Hello everyone. This is my first article about data lakes, “Introduction to Data Lake”, from the learning series “Learn Data Lake”.

In 2018, almost everyone knows or has heard about Big Data. Big Data comes into play when we discuss huge amounts of data that arrive at high speed, in many varieties, and without a fixed data structure. But the main question that comes to mind is: “Does big data generate meaningful information and value for us?” The answer is no, at least not on its own.

The Complexity of Big Data Projects

As discussed above, data is generated at high speed, from several sources, and without a fixed structure, so it becomes hard to gain deep insight and extract meaningful information. To handle this issue, we started Big Data projects that use multiple tools such as Hadoop, Kafka, Spark, NoSQL databases, and MySQL, which gradually made things even more complex.

To handle all of this, a new technology, or rather a new term, emerged among Big Data players: the “Data Lake”.

Introduction to Data Lake

A Data Lake is an enterprise-wide, hyper-scale storage repository that holds a huge amount of data in structured, semi-structured, or unstructured form. A Data Lake lets users ingest data from multiple sources, of multiple types, and at different ingestion speeds into one single place for operational and exploratory analytics.
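To make the definition more concrete, here is a minimal sketch of the idea in Python. It is only an in-memory illustration, not how a real data lake product is implemented: the `RawObject` and `DataLake` classes, the source names, and the sample payloads are all hypothetical. The point it tries to show is that the three forms of data sit side by side in one place, each kept in its raw form.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RawObject:
    """One item in the lake, kept exactly as it arrived (hypothetical type)."""
    source: str     # label for where the data came from
    kind: str       # "structured", "semi-structured" or "unstructured"
    payload: bytes  # raw bytes; no schema is enforced when writing


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for a lake's storage layer."""
    objects: List[RawObject] = field(default_factory=list)

    def ingest(self, source: str, kind: str, payload: bytes) -> None:
        # Append the raw payload without transforming it.
        self.objects.append(RawObject(source, kind, payload))

    def count_by_kind(self) -> Dict[str, int]:
        # Tally how many objects of each form the lake currently holds.
        counts: Dict[str, int] = {}
        for obj in self.objects:
            counts[obj.kind] = counts.get(obj.kind, 0) + 1
        return counts


lake = DataLake()
# Structured: rows and columns, e.g. a CSV export from a database.
lake.ingest("orders_db", "structured", b"id,amount\n1,9.99\n2,24.50\n")
# Semi-structured: self-describing but flexible, e.g. a JSON event.
lake.ingest("clickstream", "semi-structured",
            json.dumps({"user": 7, "page": "/home"}).encode("utf-8"))
# Unstructured: free text, images, audio and so on.
lake.ingest("support_mail", "unstructured",
            "The app crashed after login.".encode("utf-8"))

print(lake.count_by_kind())
```

A real data lake would typically sit on distributed or object storage rather than a Python list, but the shape of the idea is the same: one container, many forms of data.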

Major Principles for Data Lake

  1. Ingest Data
    Ingestion means collecting all kinds of data, such as “marketing”, “social”, “logs”, and “database” data, from various sources into a single system.
  2. Store Data
    Storing means keeping this huge amount of data, whether structured, semi-structured, or unstructured, in a single place, so that all of it is available and ready for other processes to use.
  3. Analyze Data
    Analyzing means gaining insight into this huge amount of data and extracting meaningful information that users can apply to purposes such as reporting or machine learning.
  4. Visualize Data
    Visualization means presenting results as reports in the form of charts, histograms, and similar formats to give an overview of the data.
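
The four principles above can be sketched end to end in a few lines of Python. This is a toy example under stated assumptions, not a production design: a temporary local directory stands in for the lake, the `ingest`, `analyze`, and `visualize` function names, the file names, and the `channel` field are all made up for illustration, and only the standard library is used.

```python
import csv
import io
import json
import tempfile
from collections import Counter
from pathlib import Path


def ingest(lake_dir: Path, source: str, filename: str, raw: str) -> Path:
    """Ingest + store: write the raw data unchanged under lake_dir/<source>/."""
    target = lake_dir / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(raw, encoding="utf-8")
    return target


def analyze(lake_dir: Path) -> Counter:
    """Analyze: count records per channel, parsing each file only at read time."""
    counts: Counter = Counter()
    for path in sorted(lake_dir.rglob("*")):
        if path.suffix == ".csv":
            text = path.read_text(encoding="utf-8")
            for row in csv.DictReader(io.StringIO(text)):
                counts[row["channel"]] += 1
        elif path.suffix == ".jsonl":
            for line in path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    counts[json.loads(line)["channel"]] += 1
    return counts


def visualize(counts: Counter) -> str:
    """Visualize: render the counts as a plain-text bar chart."""
    return "\n".join(
        f"{name:<8} {'#' * n} ({n})" for name, n in sorted(counts.items())
    )


# A temporary directory plays the role of the lake (assumption for the demo).
with tempfile.TemporaryDirectory() as tmp:
    lake_dir = Path(tmp)
    ingest(lake_dir, "marketing", "campaign.csv",
           "channel,clicks\nemail,10\nsocial,4\n")
    ingest(lake_dir, "logs", "events.jsonl",
           '{"channel": "social"}\n{"channel": "web"}\n')
    counts = analyze(lake_dir)
    chart = visualize(counts)

print(chart)
```

Note that the data is stored exactly as it arrived, and a structure is applied only when `analyze` reads it. In a real project each step would usually be handled by dedicated tools rather than a single script.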


In this article, we introduced the data lake in a simple way. In upcoming articles, we will go through other aspects of data lakes.

