Hello everyone! In this article, we will discuss how to migrate data from an RDBMS to Elasticsearch using Logstash.

Before jumping into the concept, let's start with a short introduction to RDBMS, Elasticsearch, and Logstash.

Introduction to RDBMS

  • A relational database management system (RDBMS) is a database management system (DBMS) based on the relational model invented by Edgar F. Codd at IBM’s San Jose Research Laboratory.
  • An RDBMS is currently the most widely used kind of database management system in the IT industry.

Introduction to Elasticsearch

  • Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. 
  • Elasticsearch lets you perform and combine many types of searches — structured, unstructured, geo, metric.
  • Elasticsearch is really fast. 
  • For detailed information about Elasticsearch, please visit the Elasticsearch website.

Introduction to Logstash

  • Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash”, such as Elasticsearch or a number of other destinations.
  • Logstash modules orchestrate a turnkey ingest-to-visualize experience with popular data sources like ArcSight and Netflow.
  • For detailed information about Logstash, please visit the Logstash website.

Now, let’s get into the actual process of transferring data from an RDBMS to Elasticsearch using Logstash.


Migrate data from RDBMS to Elasticsearch using Logstash

There are mainly three ways to achieve this:

  1. Using the Logstash JDBC input plugin – To ingest data from RDBMS to Elasticsearch directly.
  2. Using Kafka with JDBC connector – To ingest data in a streaming condition.
  3. Using Elasticsearch JDBC input plugin – For direct and simple data ingestion.

Before going to the next step, let’s check the prerequisites for this process.

Prerequisites

  1. Elasticsearch should be installed on the system and the Elasticsearch service should be running.
  2. Logstash should be installed on the system.
  3. MySQL should be installed and running (a quick way to verify all three is shown below).
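
Before moving on, you can confirm everything is up with a few quick commands. This is just a sanity-check sketch; the host, port, and credentials are the defaults used later in this article, so adjust them to your setup.

# Elasticsearch should answer on its HTTP port (9200 by default)
curl -X GET "http://localhost:9200"

# MySQL should respond to a ping (replace root with your user; you will be prompted for the password)
mysqladmin -u root -p ping

# Logstash should report its version (run from the Logstash installation directory)
bin/logstash --version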

Understand the structure of a Logstash config file

input {
  jdbc {
    # Provide the path to the JDBC driver library
    jdbc_driver_library => ""
    # Provide the JDBC driver class name
    jdbc_driver_class => ""
    # Provide the JDBC connection string
    jdbc_connection_string => ""
    # Provide the JDBC username
    jdbc_user => ""
    # Provide the JDBC user's password.
    # If there is no password, just leave it as ""
    jdbc_password => ""
  }
}

output {
  elasticsearch {
    # Provide the Elasticsearch host address
    hosts => "localhost:9200"
    # Give the index name into which the data will be indexed
    index => ""
    # Give the document type that comes under the index
    document_type => "employee"
    # Provide the Elasticsearch username. If you don't have one, remove this line.
    user => ""
    # Provide the Elasticsearch user's password. If you don't have one,
    # remove this line.
    password => ""
  }
}

Let’s understand this configuration file first:

A Logstash config file usually consists of three sections:

  1. The input section, where we provide information about our input table.
  2. The filter section, where we can transform the data, for example with the mutate plugin. This section is optional (a small example is shown right after this list).
  3. The output section, where we provide Elasticsearch information such as host, index name, etc.
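
For reference, a minimal filter section using the mutate plugin might look like the sketch below. The field names user_name and internal_flag are placeholders, not columns from the example table used later in this article.

filter {
  mutate {
    # Rename a column before it is indexed
    rename => { "user_name" => "username" }
    # Drop a field you don't want to store in Elasticsearch
    remove_field => ["internal_flag"]
  }
}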

Step 1: Create a Logstash config file

First of all, we need to create a Logstash config file, which will instruct Logstash to fetch data from the RDBMS and ingest it into Elasticsearch. I am going to create a file named logstash-mysql.conf.

Note:

  • If you are using Windows, create this file in the Logstash directory (the folder you extracted from the ZIP archive).
  • If you are using Linux, please create the file in the /etc/logstash/conf.d directory.

Step 2: Provide information about Input Section

In the input section of the Logstash config file, we have to provide the JDBC driver details as well as some other information.


input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "nandan"
    statement => "SELECT * FROM users WHERE row_update_time > :sql_last_value"
    tracking_column_type => "timestamp"
    schedule => "* * * * *"
  }
}
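
For reference, this example assumes a users table in the testdb database with an id primary key and a row_update_time column that MySQL refreshes on every change. A hypothetical schema could look like this; your real table will differ.

CREATE TABLE users (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100),
  email VARCHAR(255),
  -- automatically refreshed whenever the row is updated
  row_update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);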

Detailed information about this input section

  • jdbc_driver_library: the path to the JDBC driver JAR. Here it is the MySQL connector; if you are using Oracle, you have to provide the Oracle JDBC driver instead. I put the driver JAR in the same location as my Logstash config file, so I didn’t provide a full path.
  • jdbc_driver_class: the JDBC driver class name. Here it is the MySQL driver class; if you are using Oracle, the class name has to be prefixed with Java:: (for example Java::oracle.jdbc.driver.OracleDriver).
  • jdbc_connection_string: the JDBC connection URL for the database. Here testdb is my database name.
  • jdbc_user: the JDBC username.
  • jdbc_password: the JDBC user’s password. If you don’t have one, you can remove this line.
  • schedule: used to run the query repeatedly on a schedule, in cron syntax. Since I used "* * * * *", it will run every minute automatically.
  • statement: the SQL statement whose result you want to ingest into the Elasticsearch index. I am ingesting all rows from the users table that changed since the last run.
  • tracking_column_type: timestamp. This is the correct way to index data when your MySQL data gets updated from time to time, so that only the updated rows are sent to the Elasticsearch index. You can ignore it if your data is never going to be updated. By default :sql_last_value holds the time of the previous run; if you want it to track the row_update_time column itself, see the sketch right after this list.
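
Column-based tracking is a small variation on the input above: instead of comparing against the time of the last run, the jdbc input plugin can remember the highest row_update_time it has seen and use that as :sql_last_value. A sketch, assuming the same table and credentials as before:

input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "nandan"
    statement => "SELECT * FROM users WHERE row_update_time > :sql_last_value"
    # Track the row_update_time column itself instead of the last run time
    use_column_value => true
    tracking_column => "row_update_time"
    tracking_column_type => "timestamp"
    schedule => "* * * * *"
  }
}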

Step 3: Provide information about Output Section

In the output section of the Logstash config file, we have to provide information about the output. Since I am ingesting data into Elasticsearch, I have to provide the Elasticsearch details.


output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testdb"
    document_type => "user"
    document_id => "%{id}"
    user => "elk-1"
    password => "elk-1"
  }
}

Detailed information about this output section

  • hosts: the address of the server where Elasticsearch is installed. As I am running it on localhost, I provide localhost:9200.
  • index: the index name. Here testdb is my index name.
  • document_type: the document type. Here user is my document type.
  • document_id: here I assign my primary key column id as the document ID, so each row maps to exactly one document and re-ingesting a row updates it instead of creating a duplicate. This is generally the recommended approach.
  • user: the Elasticsearch username. If you don’t have one, simply remove this line.
  • password: the Elasticsearch user’s password. If you don’t have one, simply remove this line.

Step 4: Run Logstash config file

Now your config file is ready and all your services are running, so simply run the config file with Logstash from the Logstash directory.

  • In case of Windows OS, run it like below:
bin\logstash -f logstash-mysql.conf
  • In case of Linux OS, run it like below (with a package install, the logstash binary lives in /usr/share/logstash/bin):
logstash -f logstash-mysql.conf

And there you go: your MySQL data starts being ingested into Elasticsearch according to your query.
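
From another terminal you can quickly verify the ingestion with a few curl calls (add -u elk-1:elk-1, or your own credentials, if security is enabled on your cluster):

# List the index and its document count
curl -X GET "http://localhost:9200/_cat/indices/testdb?v"
curl -X GET "http://localhost:9200/testdb/_count?pretty"

# Look at a couple of the ingested documents
curl -X GET "http://localhost:9200/testdb/_search?size=2&pretty"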


Complete logstash-mysql.conf file code


input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "nandan"
    statement => "SELECT * FROM users WHERE row_update_time > :sql_last_value"
    tracking_column_type => "timestamp"
    schedule => "* * * * *"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testdb"
    document_type => "user"
    document_id => "%{id}"
    user => "elk-1"
    password => "elk-1"
  }
  stdout {
    codec => rubydebug
  }
}


Conclusion

I hope you have learned how to migrate data from an RDBMS to Elasticsearch using Logstash. In case of any confusion, you can comment below and I will try my best to reply as soon as possible.

For other tutorials related to Elasticsearch, Logstash, the ELK Stack, and other big data technologies, please visit my website.
