RDDs can be created in two ways:
1) Transforming an existing RDD.
2) From a SparkContext (or SparkSession) object.
– Transforming an existing RDD:
When map is called on a List, it returns a new List. Similarly, many higher-order functions defined on RDDs (map, filter, flatMap, etc.) return a new RDD.
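For example (a minimal sketch; wordList is an ordinary Scala List, and lines is assumed to be an existing RDD[String]):

val wordList = List("spark", "rdd", "scala")
val upper    = wordList.map(_.toUpperCase)  // returns a new List: List("SPARK", "RDD", "SCALA")
val upperRdd = lines.map(_.toUpperCase)     // returns a new RDD[String], evaluated lazily on the cluster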
– From a SparkContext (or SparkSession) object:
The SparkContext object can be thought of as your handle to the Spark cluster; it represents the connection between the Spark cluster and your running application. (Since Spark 2.0, the SparkSession object is the preferred entry point; it wraps a SparkContext, which remains available as spark.sparkContext.) It defines a handful of methods that can be used to create and populate a new RDD:
a) parallelize: convert a local Scala collection to an RDD.
ex: val rdd = sc.parallelize(Seq("1", "2", "3"))
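A slightly fuller sketch of where sc comes from and how the resulting RDD can be used (the application name and local master are illustrative assumptions for this sketch):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RDDExamples")      // illustrative application name
  .master("local[*]")          // run locally on all cores; an assumption for this sketch
  .getOrCreate()
val sc = spark.sparkContext    // the SparkContext wrapped by the session

val rdd  = sc.parallelize(Seq("1", "2", "3"))
val ints = rdd.map(_.toInt)    // transformation: returns a new RDD[Int]
println(ints.sum())            // action: triggers the computation, prints 6.0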
b) textFile: read a text file from HDFS or a local file system and return an RDD of String.
ex: val rdd = sc.textFile("/users/guest/read.txt")
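As a follow-up sketch (reusing the path above; the counts naturally depend on the file's contents), the returned RDD[String] holds one element per line of the file:

val lines    = sc.textFile("/users/guest/read.txt")
val nonEmpty = lines.filter(_.nonEmpty)         // transformation: drop blank lines
println(nonEmpty.count())                       // action: number of non-empty lines
val words = nonEmpty.flatMap(_.split("\\s+"))   // transformation: split each line into whitespace-separated tokens
println(words.count())                          // action: total token count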