How to flatten a whole JSON document containing ArrayType and StructType fields? Spark has no predefined function to flatten a JSON completely. We ca…Read More
There is no direct library for creating a DataFrame on an HBase table the way we read a Hive table with Spark SQL. This post shows a way to create a DataFrame on top of …Read More
You may sometimes need to add a serial number to a Spark DataFrame. This can be done with the Spark function monotonically_increasing_id(). It generates a …Read More
How Spark Jobs Are Executed – A Spark application is a set of processes running on a cluster. All these processes are coordinated by the driver program. The driv…Read More
When your Spark application performs joins, you might want to know how long a particular join takes to complete. The code snippet below might …Read More
scala> val inputDF = sc.parallelize(Seq((1,"oclay",400,"2015-01-01 00:00:00"),(1,"oclay",800,"2018-01-01 00:00:00"))).toDF("pid","pname","price","last_mod") …Read More
Transformations and Actions – Spark defines transformations and actions on RDDs. Transformations – Return new RDDs as results. They are lazy. Their …Read More
Ways to create a DataFrame in Apache Spark – A DataFrame is the representation of a matrix, but we can have columns of different datatypes or similar table wit…Read More