Introduction
Wikipedia is the multilingual, Web-based, free encyclopedia that is produced collaboratively by volunteers. According to Alexa Traffic Rankings, Wikipedia consistently ranks among the Top Ten most-visited Web sites in the world. It hosts over 15 million articles in more than 260 languages. Every day, tens of millions of visitors learn more about their world, making nearly half a million edits and creating thousands of new entries.
Their 'Growing' Challenge
Wikipedia's growth statistics are simply amazing. The organization has faced exponential growth on many fronts since its introduction in 2001. These include:
- Monthly visitor growth from fewer than 50,000 to over 350 million
- Content growth from fewer than 100 articles to over 15 million
- Contributor growth from fewer than 100 to over 300,000
Wikipedia expects the growth in content, contributors and user base to continue on all fronts, and needs a computing infrastructure that can keep pace.
Their Scale-Out Solution
This phenomenal growth has put constant technical pressure on the performance and scalability of the system. Wikipedia is based on the LAMP stack (Linux, Apache, MySQL and PHP). It started out on a single shared server and is now a Top Ten site, with more than 20 replicated database servers delivering up-to-date content to visitors. Additionally, lightweight MySQL instances are spread across the application servers as a distributed archive solution.
Wikipedia relies on MySQL replication to scale out their database infrastructure and accommodate more visitors, more articles and more contributors. This architecture also allows them to save significantly on hardware costs. Because they add new servers only on an incremental, as-needed basis, they can delay new hardware purchases until more powerful machines drop to lower, commodity prices.
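To make the scale-out idea concrete, the following is a minimal sketch of how an application might split traffic across a replicated database tier: writes go to a single primary, and reads rotate across the replicas. It is not Wikipedia's code. The class name `ReplicatedPool`, its methods, and the host names are hypothetical, and the routing rule (only `SELECT` statements go to replicas) is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedPool:
    """Hypothetical router: one primary for writes, N replicas for reads."""

    primary: str
    replicas: list = field(default_factory=list)
    _next: int = 0  # index of the replica that serves the next read

    def add_replica(self, host: str) -> None:
        # Capacity is added incrementally, one server at a time.
        self.replicas.append(host)

    def route(self, sql: str) -> str:
        """Return the host that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Assumption: anything that is not a SELECT may modify data,
        # so it must run on the primary and replicate outward from there.
        if verb != "SELECT" or not self.replicas:
            return self.primary
        # Round-robin across replicas to spread the read load.
        host = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return host


if __name__ == "__main__":
    pool = ReplicatedPool(primary="db-primary")  # hypothetical host names
    pool.add_replica("db-replica-1")
    pool.add_replica("db-replica-2")
    for stmt in ("SELECT * FROM page", "SELECT * FROM revision",
                 "UPDATE page SET page_touched = NOW()"):
        print(stmt.split()[0], "->", pool.route(stmt))
```

Because reads far outnumber writes on a site like this, calling `add_replica` again raises read capacity without touching the primary. One caveat of this design: MySQL replication is asynchronous by default, so a replica can briefly serve slightly stale data, and a real router would need a policy for reads that must see the latest write.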