Python

How to generate DDL(create statement) with columns using Python[code snippets]

How to generate DDL(create statement) with columns using Python[code snippets]

Data loading is the initial step in Big Data Analytics world, you are supposed to push all the data to Hadoop first and then you can start working on analytics. When loading data to Hadoop environment, in some cases you will be getting data in the form of flat files.

Once the data is loaded, if you want to view data or query this data we need to create HIVE table on top of that data. So it is obvious to create DDL if you want to create hive table. In real time, you have to check the file get the column names and then you have to create DDL manually. This tutorial helps you to get rid of manual work and you can create DDLs dynamically in a single click with Python.

Let’s say we have the incoming data file as shown below –

Name|ID|ContactInfo|Date_emp
Michael|100|547-968-091|2014-01-29
Will|101|527-948-090|2013-10-02
Steven|102|647-968-598|2012-11-03
Lucy|103|577-928-094|2010-01-03

The first line of this file will be the header or schema required to create table.

Below is the code to get the columns from first line of this file and generate DDL file with Python.


Usage of above script –

Save this script into a file with some name, here createDDL.py

python createDDL.py <input file name with path> <table name>

e.g., – python createDDL.py home/user/emp.txt EMPLOYEETABLE

This generates a hql file with tablename you passed.

Hope this helps!!

Share This Post

An Ambivert, music lover, enthusiast, artist, designer, coder, gamer, content writer. He is Professional Software Developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, php, html,css. Know more about him at www.24tutorials.com/sai

Lost Password

Register

24 Tutorials