What is / Introduction, Advantages
A data lake fits into a modern data architecture. A centralized repository that allows you to store all your data in their raw formats (structured, semi structured, and unstructured) at any scale. Providing you the ability to run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.
Data lakes are also good for organizations who want to gain insights into their data but doesn’t have a clear objective just yet. However, they would like to start their journey and have a repository to hold their information in a single and safe location until they understand what they need it for and therefore create a solution that meet their needs.
It is affordable, scalable, and secure.
Different types of analytics on your data such as SQL queries, big data analytics, full text search, real-time analytics, and machine learning can be used to uncover valuable insights.
Use Cases / why the need
Data lakes allow you to run event driven data strategies. Because data can be easily and quickly ingested, different analytical roles in your organization like data scientists, data developers, and business analysts can access the data and use it for optimal business purposes with their choice of analytics tools and frameworks
It reduces the constraints in data movement. Data can be imported at any rate in real-time, on-schedule or both. This process allows you to scale to data of any size, while saving time of defining data structures, schema, and transformations.
Data cataloging and security. With data lakes, you can store relational data from operational databases and data from line of business applications, and non-relational data coming from mobile apps, IoT devices, and social media.You can leverage advance analytics and machine learning to generate varying insights including reporting on historical data, build forecasting models, and provide predictive and prescriptive analysis.
Challenges
Data Swamp: These can easily occur because raw data is stored with no oversight for contents, lack of adequate data governance and data quality. Making it usable require defined mechanisms to catalog and secure so that data is not lost. Otherwise acquired data becomes untrustworthy and useless.
Performance: Performance can be hindered in a data lake when the size of data increases significantly and overtime without proper metadata management, data partitioning, and others.
Benefits
Data lakes are ideal data analytics platforms in the cloud because the cloud provides performance, scalability, reliability, availability, a diverse set of analytic engines, and massive economies of scale. Top reasons customers are migrating to cloud data lakes includes better security, faster time to deployment, better availability, more frequent feature/functionality updates, more elasticity, more geographic coverage, and costs linked to actual utilization.