Author: Suriya

How to create Spark DataFrame from different sources

Creating DataFrame from Scala List or Sequence

In some cases, to test our business logic, we need a DataFrame, and in most cases we would create one from a sample file. Instead, we can build a List of our sample data and convert it to a DataFrame. Note: spark.implicits._ is available in spark-shell by default. If we want to test in an IDE, we sh...
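A minimal sketch of this approach; the sample data and column names here are hypothetical, and the explicit import of spark.implicits._ is what an IDE-run application needs (it is implicit only in spark-shell):

import org.apache.spark.sql.SparkSession

object SampleDataFrame extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("sample-df")
    .getOrCreate()

  // Required for the toDF() conversion when running outside spark-shell
  import spark.implicits._

  // Hypothetical sample data standing in for a sample file
  val employees = Seq(
    ("Suriya", "Engineering", 1000),
    ("Arun", "Finance", 2000)
  )

  val df = employees.toDF("name", "department", "salary")
  df.show()
}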

Reusable Spark Scala application to export files from HDFS/S3 into Mongo Collection

Application Flow: How does this application work?

When the user invokes the application using spark-submit, the application first parses and validates the input options. It then instantiates a new SparkSession with the Mongo config spark.mongodb.output.uri. Depending on the input options provided by the user, a DataFrame is created for the source data file. If the user provided a transformation SQL, a temporary view will be...
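A rough sketch of that flow under stated assumptions: the option values below are hard-coded stand-ins for the arguments the real application parses, and the write uses the "mongo" format name from the MongoDB Spark connector (v2/v3), which reads the output URI from spark.mongodb.output.uri:

import org.apache.spark.sql.SparkSession

object HdfsToMongoExporter {
  def main(args: Array[String]): Unit = {
    // Hypothetical parsed options; the real app validates these from args
    val sourcePath = "hdfs:///data/input.csv"
    val mongoUri   = "mongodb://localhost:27017/mydb.mycollection"
    val sql        = Some("SELECT name, salary FROM source WHERE salary > 1000")

    val spark = SparkSession.builder()
      .appName("hdfs-to-mongo")
      .config("spark.mongodb.output.uri", mongoUri)
      .getOrCreate()

    val source = spark.read.option("header", "true").csv(sourcePath)

    // If a transformation SQL is given, run it against a temporary view
    val result = sql match {
      case Some(query) =>
        source.createOrReplaceTempView("source")
        spark.sql(query)
      case None => source
    }

    result.write.format("mongo").mode("append").save()
    spark.stop()
  }
}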

Scala Companion Object Explained

What is a companion object?

A companion object is simply a singleton object that has the same name as its class and holds all the static methods and members. Since Scala doesn't support defining static methods inside a class, we create a companion object instead. The class and its companion object must have the same name and must be defined in the same source ...
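A small illustration of the pattern; the Circle class and its Pi member are made up for this example. Note that a class and its companion can access each other's private members:

// Class and companion object share a name and live in the same source file
class Circle(val radius: Double) {
  def area: Double = Circle.Pi * radius * radius
}

object Circle {
  // Plays the role of a static member
  private val Pi = 3.14159

  // Factory method; apply is the idiomatic use of a companion object
  def apply(radius: Double): Circle = new Circle(radius)
}

// Usage: Circle(2.0).area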

How to flatten JSON in Spark Dataframe

How to flatten a whole JSON document containing ArrayType and StructType fields?

Spark has no predefined function that flattens a JSON structure completely, so we can write our own. The function accepts a DataFrame; for each field in the DataFrame's schema, it inspects the DataType. If the field is of ArrayType, we will create a new column with explo...
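One possible recursive implementation of the approach described above, not necessarily the article's exact code: it assumes simple column names without special characters, and the use of explode_outer plus the parent_child renaming scheme are choices made for this sketch:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode_outer}
import org.apache.spark.sql.types.{ArrayType, StructType}

def flattenDataFrame(df: DataFrame): DataFrame = {
  val fields = df.schema.fields
  // Find the first nested field, flatten one level, then recurse
  fields.find(f => f.dataType.isInstanceOf[ArrayType] ||
                   f.dataType.isInstanceOf[StructType]) match {
    case Some(field) =>
      field.dataType match {
        // Arrays: one row per element (keeping rows with null/empty arrays)
        case _: ArrayType =>
          flattenDataFrame(df.withColumn(field.name, explode_outer(col(field.name))))
        // Structs: promote members to top-level columns named parent_child
        case st: StructType =>
          val others   = fields.map(_.name).filter(_ != field.name)
          val expanded = st.fieldNames.map(n => s"${field.name}.$n as ${field.name}_$n")
          flattenDataFrame(df.selectExpr(others ++ expanded: _*))
      }
    case None => df // no nested fields left; fully flattened
  }
}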
