Case Study: Cloud Supercomputing from AWS Powers Yelp

      "The big thing we’re looking for is to get instant flexible stats on metrics related to engagement and usage. Then gaining the ability to, in real time, interactively play with different features to see how tweaking functionality or algorithms affects our business.”
      —Jim Blomo, Engineering Manager, Yelp


      Yelp Inc. is a data-driven company. From the early start-up days, through stellar growth and a successful IPO, the collection and analysis of data have played a central role for the San Francisco–based team. Fine-grained insight into that data enables Yelp to identify new business opportunities, release new features, and evaluate the results. This flexibility and responsiveness to changing market conditions has been a key to its success.

      Yelp provides local searches and recommendations to over 78 million visitors a month. To optimize the results users get, Yelp gleans insights from structured and unstructured data to adjust its products at both the tactical and strategic level. 

      For example, Yelp engineers routinely process 120 million unstructured site statistics a day to fine-tune their search ranking algorithms and deliver the most relevant results to their customers. Additionally, analyzing access logs allows Yelp to quickly identify new business opportunities, such as the growth in usage of mobile applications, and reallocate resources internally to build out engaging, innovative features for mobile customers.

      To build these big data models, Yelp uses Amazon Web Services, which is built on the Intel® Xeon® processor E5 family. Any engineer at the company can quickly gain access to the machines needed to process many terabytes of data at a time.

      Using Amazon Elastic MapReduce, the team is able to access a powerful Hadoop* analytics service to navigate, experiment, and dive deep into their data. This freedom to innovate has resulted in some impressive gains. One such experiment resulted in click-through rates on searches increasing by 8 percent overnight, providing a more responsive, engaging experience for their users.
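      Conceptually, the kind of log analysis Yelp runs on Elastic MapReduce follows the classic map/reduce pattern: a map phase that emits key/value pairs from raw log lines, and a reduce phase that aggregates them per key. The sketch below simulates that pattern in plain Python with no Hadoop cluster; the log lines, field layout, and page paths are purely illustrative, not Yelp's actual data.

```python
from collections import defaultdict

# Hypothetical access-log lines in the form "timestamp page event"
# (illustrative only; not Yelp's real log format).
LOG_LINES = [
    "2013-04-01T12:00:01 /biz/some-cafe click",
    "2013-04-01T12:00:02 /search view",
    "2013-04-01T12:00:03 /biz/some-cafe click",
    "2013-04-01T12:00:04 /biz/a-bar view",
]

def mapper(line):
    """Map phase: emit a (page, 1) pair for every click event."""
    _, page, event = line.split()
    if event == "click":
        yield page, 1

def reducer(pairs):
    """Reduce phase: sum the counts emitted for each page."""
    totals = defaultdict(int)
    for page, count in pairs:
        totals[page] += count
    return dict(totals)

clicks = reducer(pair for line in LOG_LINES for pair in mapper(line))
print(clicks)  # {'/biz/some-cafe': 2}
```

      On EMR itself, the same mapper/reducer pair would typically be written as a job class using mrjob, the open-source Python framework Yelp built for running MapReduce jobs locally or on Elastic MapReduce.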

      With access to simple, cost-effective supercomputing in the cloud, Yelp always has the power it needs to crunch the data and develop innovative new features for customers and business owners.


      Managing Change Key to Blending HPC, Big Data Operations...

      When it comes to finding ways to maintain an edge in the competitive storage market, HGST, a Western Digital company that originally spun out of IBM and Hitachi, is counting on the cloud. But for HGST CIO Steve Philpott, the initial challenge wasn't the technology; it was implementing the roadmap to increased flexibility and cost savings.

      Philpott led the “zero to cloud” charge at HGST in just over six months, helping the company move critical components of its daily operations, including its sales and internal business practices, to the cloud. That effort was a prelude to a much broader initiative, one that would enhance the agility of teams on the research and development front for a range of big data and high performance computing applications.

      From his experience leading another company, Amylin Pharmaceuticals, into the cloud frontier, one of the first in its field to move mission-critical, regulation-sensitive workloads into a virtualized environment, Philpott knew the speed, flexibility, and potential for optimization that would be possible for HGST's diverse teams. His toughest job, however, was convincing teams that had always relied on on-premises clusters that performance would remain strong without adding time and cost to their operations.

      “This is probably the most exciting time to be in IT because we now truly have the tools and technology to enable business to move at a pace it could never move before,” Philpott explained. Yet even with a vast new opportunity on the tools front, he notes, the technology isn't the hardest part; the change management process is the real challenge.

      For example, HGST's research division in Japan initially had a hard time seeing beyond its on-site HPC cluster. But with the relative ease of spinning up test environments to run compute-intensive molecular and fluid dynamics applications, the team saw the performance and optimization capabilities of running inside Amazon Web Services (AWS). Their conclusion: they could now focus their time on solving problems rather than managing hardware.

      Within 16 weeks, the company's HPC workloads, including its most demanding applications, were running in the cloud, eliminating the need for an outright purchase of a new cluster while bolstering the team's ability to change at the pace of new ideas. The much-needed computational resources were up and running four months earlier than if the team had waited for a new on-premises HPC cluster.

      The change management process was then extended to the daily demands of big data, particularly on the rigorous production side. With its range of devices, including new low-cost, high-capacity sealed helium-filled drives, HGST needed a feed of manufacturing data from several sites meshed with a wealth of other data from across business units. This led to the development of an entirely cloud-based big data platform, which seamlessly blends product goals and targets with practice and helps HGST remain agile enough to scale and optimize around challenges and opportunities. The platform represents the end-to-end “DNA of a hard drive,” addressed at every hop by the AWS Cloud.

      Beyond the diversity of tools handled within the big data platform, the company optimized its AWS environment on the storage side, taking full advantage of the available tiers to manage data use and storage with a hierarchical storage management (HSM) approach. “We have probably 30 plus terabytes in EBS, 300 terabytes or more in S3, and 3 to 5 petabytes in Glacier,” said Philpott.
"The key is that it's there with three years of data always on the ready, but having the tiers lets us optimize both performance and cost."While the specifics of the big data and HPC platforms themselves are fodder for articles richer in technical detail, the real story of HGST's success lies in implementing a solid change management strategy. It was hard to get users to think outside the box. But once implemented, the company is leaner, more nimble, and able to accelerate their pace of innovation.When it comes to finding ways to maintain an edge in the competitive storage market, HGST, a Western Digital company that initially spun out of IBM and Hitachi, is counting on the cloud. But for HGST CIO, Steve Philpott, the initial challenge wasn't the technology — it was implementing the roadmap to increased flexibility and cost savings.Philpott led the “zero to cloud" charge at HGST in just over six months, helping the company move critical components of its daily operations, including its sales and internal business practices, to the cloud before. It was a prelude to a much broader initiative — one that would enhance the agility of their teams on the research and development front for a range of big data and high performance computing applications.From his experiences leading another company into the cloud frontier at Amylin Pharmaceuticals, which was one of the first companies in its field to move mission-critical, regulation-sensitive workloads into a virtualized environment, Philpott knew the speed, flexibility, and potential for optimization that would be possible for the diverse teams at HGST. However, his toughest job was convincing teams that had always been reliant on on-premises clusters that performance remain strong without adding more time and cost to their operations."This is probably the most exciting time to be in IT because we now truly have the tools and technology to enable business to move at a pace it could never move before," Philpott explained. 
However, he notes that in the end, even with a vast new opportunity on the tools front, the technology isn't the hardest part — it's the change management process that is the real challenge.For example, he explained how HGST's research division in Japan initially had a hard time seeing outside of the on-site HPC cluster. But with the relative ease of spinning up test environments to run compute-intensive molecular and fluid dynamics applications, they saw the performance and optimization capabilities of running inside Amazon Web Services (AWS). Their conclusion: Now they could focus their time on solving problems versus managing hardware.Within 16 weeks, the company's HPC division, including demanding high performance computing applications, were handled in the cloud, eliminating the need for an outright purchase of a new cluster while bolstering the team's ability to change at the pace of new ideas.Further, the much-needed computational resources were up and running four months earlier than if the team had waited for a new on-premises HPC cluster.The change management process was extended to include the daily demands of big data — particularly on the rigorous production side. With its range of devices, including new low-cost, high-capacity sealed helium-filled drives, HGST needed a feed of manufacturing data from several sites meshed with a wealth of other data from across business units. This led to the development of an entirely cloud-based big data platform as pictured below, which was able to seamlessly blend and orient product goals and targets with practice — and to help HGST remain agile enough to scale and optimize around challenges and opportunities. What you see below represents the end-to-end “DNA of a hard drive" as addressed at all hops by the AWS Cloud. 
Aside from the diversity of tools handled within the big data platform, the company optimized their AWS environment on the storage side, taking full advantage of the range of tiers available to manage data use and storage using an HSM approach."We have probably 30 plus terabytes in EBS, 300 terabytes or more in S3, and 3 to 5 petabytes in Glacier," said Philpott. "The key is that it's there with three years of data always on the ready, but having the tiers lets us optimize both performance and cost."While the specifics of the big data and HPC platforms themselves are fodder for articles richer in technical detail, the real story of HGST's success lies in implementing a solid change management strategy. It was hard to get users to think outside the box. But once implemented, the company is leaner, more nimble, and able to accelerate their pace of innovation.
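      The S3-to-Glacier portion of the tiering Philpott describes is typically automated with an S3 lifecycle rule. The sketch below builds such a rule as a plain dictionary in the shape boto3's S3 client expects; the rule ID, prefix, bucket name, and day thresholds are hypothetical, chosen to echo the three-year retention window he mentions.

```python
# Hypothetical S3 lifecycle rule: transition data under a prefix to
# Glacier after 90 days and expire it after three years. The structure
# matches what boto3's put_bucket_lifecycle_configuration accepts;
# all specific values here are illustrative assumptions.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-manufacturing-data",       # hypothetical rule name
            "Filter": {"Prefix": "manufacturing/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 3 * 365},  # three-year retention window
        }
    ]
}

# With boto3 installed and AWS credentials configured, applying the rule
# would look like this (bucket name is hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-drive-data",
#       LifecycleConfiguration=lifecycle_config,
#   )
print(lifecycle_config["Rules"][0]["Transitions"][0]["StorageClass"])  # GLACIER
```

      Keeping hot data in EBS for attached-instance workloads, warm data in S3, and cold archives in Glacier is what lets the tiers trade off performance against cost, as the quote above describes.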