How to flatten whole JSON containing ArrayType and StructType in it? In order to flatten a JSON completely we don’t have any predefined function in Spark. We can write our own function that will flatten out JSON completely. We will write a function that will accept DataFrame. For each field in the DataFrame we will get the DataType. If the field is of ArrayType we will create new column with exploding the ArrayColumn using Spark explode_outer function. If the field is of StructType we will create new column with parentfield_childfield for each field in the StructType Field. This is a recursive function. Once the function doesn’t find any ArrayType or StructType. It will return the flattened DataFrame. Otherwise, It will it iterate through the schema to completely flatten out the JSON...
Ways to create DataFrame in Apache Spark – DATAFRAME is the representation of a matrix but we can have columns of different datatypes or similar table with different rows and having different types of columns (values of each column will be same data type). When working with Spark most of the times you are required to create Dataframe and play around with it. DATAFRAME is nothing but a data structure which is stored in memory and can be created by following ways – 1)Using Case Class 2)Using createDataFrame method 3)Using SQL method 4)Using read..load methods i) From flat files(JSON, CSV) ii) From RDBMS Databases 1)Using Case Class val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.implicits._ case class Employee(name: String, sal: Int) Below is the sample...