Spark

How to Create a Spark RDD?


RDDs can be created in two ways:
1) Transforming an existing RDD.
2) From a SparkContext or SparkSession object.

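– Transforming an existing RDD:
When map is called on a Scala List, it returns a new List. Similarly, many higher-order functions defined on RDD, such as map and filter, return a new RDD. RDDs are immutable, so the RDD you call the function on is left unchanged.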
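The parallel between List and RDD can be sketched as follows. This is only an illustration: it assumes Spark 2.x or later on the classpath and a local-mode session, and the names TransformExample and doubleAll are made up for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TransformExample {
  // Hypothetical helper: map returns a new RDD and leaves `numbers` untouched.
  def doubleAll(numbers: RDD[Int]): RDD[Int] = numbers.map(_ * 2)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, used here only so the sketch runs on one machine.
    val spark = SparkSession.builder()
      .appName("transform-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // map on a List returns a new List; `list` itself is unchanged.
    val list = List(1, 2, 3)
    val doubledList = list.map(_ * 2)

    // map on an RDD likewise returns a new RDD.
    val numbers = sc.parallelize(list)
    val doubled = doubleAll(numbers)

    println(doubledList.mkString(", "))       // 2, 4, 6
    println(doubled.collect().mkString(", ")) // 2, 4, 6
    println(numbers.collect().mkString(", ")) // 1, 2, 3

    spark.stop()
  }
}
```

Transformations such as map are lazy, so nothing is computed until an action like collect is called.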

– From a SparkContext (or SparkSession) object:
The SparkContext object can be thought of as your handle to the Spark cluster. It represents the connection between the Spark cluster and your running application. Since Spark 2.0, SparkSession is the unified entry point; it wraps a SparkContext, which is available as spark.sparkContext. SparkContext defines a handful of methods which can be used to create and populate a new RDD:

a) parallelize: converts a local Scala collection to an RDD.
ex:- val rdd = sc.parallelize(Seq("1", "2", "3"))
b) textFile: reads a text file from HDFS or a local file system and returns an RDD of String, one element per line.
ex:- val rdd = sc.textFile("/users/guest/read.txt")
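Both methods can be tried together in a small self-contained program. The sketch below assumes Spark 2.x or later and a local-mode session. It writes a temporary file to stand in for /users/guest/read.txt so that it can run anywhere. The object and helper names are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CreateRddExample {
  // a) parallelize: distribute a local Scala collection as an RDD.
  def fromCollection(sc: SparkContext): RDD[String] =
    sc.parallelize(Seq("1", "2", "3"))

  // b) textFile: one RDD element per line of the file.
  // Assumption: a temp file stands in for a real HDFS or local path.
  def fromTempFile(sc: SparkContext): RDD[String] = {
    val path = Files.createTempFile("read", ".txt")
    Files.write(path, "first line\nsecond line\n".getBytes("UTF-8"))
    // A file:// URI avoids resolving the path against a default HDFS.
    sc.textFile(path.toUri.toString)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-rdd-example")
      .master("local[*]") // assumption: local mode for illustration
      .getOrCreate()
    // The SparkContext is obtained from the SparkSession.
    val sc = spark.sparkContext

    println(fromCollection(sc).count()) // 3
    println(fromTempFile(sc).count())   // 2

    spark.stop()
  }
}
```

Note the camel-case method name textFile. In spark-shell, the spark and sc values are already created for you, so the builder lines are only needed in a standalone application.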

An ambivert, music lover, enthusiast, artist, designer, coder, gamer, and content writer. He is a professional software developer with hands-on experience in Spark, Kafka, Scala, Python, Hadoop, Hive, Sqoop, Pig, PHP, HTML, and CSS. Know more about him at www.24tutorials.com/sai

24 Tutorials