What is a data lake?
Data lakes are next-generation hybrid data management solutions that can meet big data challenges and drive new levels of real-time analytics. Their highly scalable environment can support extremely large data volumes and accept data in its native format from a wide variety of data sources. Data lakes can help break down silos, enabling organizations to gain 360-degree views of information and conduct analytics across departments, offices or regions. They also enable adoption of modern technologies such as artificial intelligence (AI) and the Internet of Things (IoT).
IBM and Cloudera
IBM and Cloudera, better together
Improve data discovery, testing, and ad hoc and near real-time queries to support predictive and prescriptive analytics for today’s AI. Use a single ecosystem of products and services that benefits from the combined IBM and Cloudera collaboration and investment in the open source community.
Ladder to AI with IBM and Red Hat
Build your enterprise-grade, open AI data and analytics platform, harnessing machine learning and disparate data to make better data-driven decisions. Benefit from industry-leading security and portability across your hybrid and multicloud environment when accessing, storing and exploring data.
Data lake industry use cases
Retail
• Determine what a customer is likely to purchase online and provide recommendations
• Identify a customer’s “path to purchase” to understand buying patterns and conduct micro-targeted marketing
• Predict or proactively identify fraudulent activity from both inside and outside the organization
Banking
• Predict the success or failure of discounts
• Pinpoint the “next product to buy” and promote that product to customers
• Identify which customers are likely to decrease their bank business and employ proactive marketing activities
Hospitality and travel
• Track and predict customer preferences to guide proactive selling
• Improve the customer experience and boost brand loyalty through customization and personalization
• Conduct real-time pricing and analysis
Data lake capabilities
Streamline data preparation and access
Reduce the time and cost of data preparation with a data lake that stores data in its original format. Use semi-structured and unstructured data, and give users the tools for real-time, self-service access needed to drive AI and IoT.
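As a rough illustration of storing data in its original format, the Python sketch below keeps raw JSON records untouched and applies a schema only when the data is read. The sample records, field names and the read_with_schema helper are hypothetical and are not part of any IBM product.

```python
import json

# Hypothetical raw records as they might land in a data lake: one JSON
# document per line, stored as-is, with fields that vary between records.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": 21.5, "ts": "2019-01-05T10:00:00"}',
    '{"device": "sensor-2", "ts": "2019-01-05T10:00:05", "note": "no reading"}',
    '{"device": "sensor-1", "temp_c": "22.1", "ts": "2019-01-05T10:01:00"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields and coerce their types.

    Missing or uncoercible fields become None; the raw lines are not modified.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        rows.append(row)
    return rows


# The schema is chosen by the consumer at query time, not at ingestion.
readings = read_with_schema(RAW_LINES, {"device": str, "temp_c": float})
valid = [r["temp_c"] for r in readings if r["temp_c"] is not None]
average_temp = sum(valid) / len(valid)
```

Because the raw lines are never rewritten, a different team could read the same records with a different schema, for example one that keeps the ts and note fields instead.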
Reduce IT and warehouse costs
Use commodity hardware when building your data lake to scale out cost-effectively and decrease capital expenditures. Save further by using the data lake as a repository for older data that would otherwise take up capacity in a more expensive data warehouse.
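One simple way to picture the offloading of older data is a date cutoff that decides which rows stay in the warehouse and which move to the lake. The sketch below assumes a hypothetical order_date field and cutoff. A real policy would depend on your own retention and query patterns.

```python
from datetime import date

# Hypothetical warehouse rows; order_date is an assumed field name.
ORDERS = [
    {"order_id": 1, "order_date": date(2016, 3, 14)},
    {"order_id": 2, "order_date": date(2018, 11, 2)},
    {"order_id": 3, "order_date": date(2019, 1, 20)},
]


def split_by_age(rows, cutoff):
    """Return (recent, older): rows on or after the cutoff, and rows before it."""
    recent = [r for r in rows if r["order_date"] >= cutoff]
    older = [r for r in rows if r["order_date"] < cutoff]
    return recent, older


# Rows before the cutoff are candidates for the lower-cost data lake tier.
keep_in_warehouse, move_to_lake = split_by_age(ORDERS, date(2018, 1, 1))
```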
Improve data-driven decisions
Federate and analyze data from more sources for deeper insights and more accurate results. Data lake governance features help ensure data is relevant and trustworthy. Coupled with real-time analytics and AI capabilities, the data lake allows your organization to seize new opportunities as they unfold.
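To make the idea of federating sources concrete, the sketch below joins two differently formatted sources on a shared customer ID and totals sales by region. The CSV and JSON samples, the field names and the revenue_by_region helper are all hypothetical. The sketch shows the concept only and does not reflect how any IBM product implements federation.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from a sales system.
SALES_CSV = """customer_id,amount
c1,120.00
c2,35.50
c1,60.00
"""

# Hypothetical source 2: JSON profiles from a CRM application.
CRM_JSON = '[{"id": "c1", "region": "EMEA"}, {"id": "c2", "region": "APAC"}]'


def revenue_by_region(sales_csv, crm_json):
    """Join two sources on customer ID and total the sales per region."""
    region_of = {p["id"]: p["region"] for p in json.loads(crm_json)}
    totals = {}
    for row in csv.DictReader(io.StringIO(sales_csv)):
        # Customers missing from the CRM source are grouped as UNKNOWN.
        region = region_of.get(row["customer_id"], "UNKNOWN")
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


totals = revenue_by_region(SALES_CSV, CRM_JSON)
```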
IBM products for leveraging data lakes
IBM Db2 Big SQL
An enterprise-grade, hybrid, ANSI-compliant SQL engine, Db2 Big SQL delivers massively parallel processing (MPP) and advanced data querying. Benefit from low latency, high performance, security, SQL compatibility and federation capabilities for ad hoc and complex queries.
IBM Big Replicate
IBM Big Replicate provides fast, easy migration between Hadoop distributions (Cloudera and Hortonworks). It can also replicate data across geographic locations, business application environments or cloud storage providers.
Data governance
Data governance solutions from IBM provide the overall management of data availability, relevancy, usability, integrity and security for the enterprise.
Data lake resources
Ebook: Build a better data lake
Learn about best practices and potential pitfalls when integrating a data lake in your existing data infrastructure. Understand the importance of enterprise-grade security and governance when using a growing diversity of data.
Infographic: Connect more data from sources with a data lake
Discover the new types and sources of data that can be used by integrating data lakes into your existing hybrid data management strategy. Data lakes allow you to tap into unstructured data and generate insights from real-time ad hoc queries and analysis.
Blog: Hortonworks/Cloudera merger
The January 2019 merger of Hortonworks and Cloudera is expected to shape the future market for big data and analytics. Read how the continued strategic partnership between Cloudera/Hortonworks and IBM can benefit our mutual customers.
Engage with an expert
Schedule a no-cost, one-on-one call with an experienced IBM expert
Learn about the IBM products, solutions and services available to help you build and grow a successful data lake.