1.What is the version of spark you are using? Check the spark version you are using before going to Interview. As per 2020, the latest version of spark is 2.4.x 2.Difference between RDD, Dataframe, Dataset? RDD – RDD is Resilient Distributed Dataset. It is the fundamental data structure of Spark and is immutable collection of records partitioned across nodes of cluster. It allows us to perform in-memory computations on large clusters in a fault-tolerant manner. Compared with DF and DS, RDD will not hold the schema. It holds only the data. If user want to implement schema over the RDD, User have to create a case class and have to implement the schema over the data. We will use RDD for the below cases: -When our data is unstructured, A streams of text or media streams. -When we donR...
Q1) CASE Classes: A case class is a class that may be used with the match/case statement. Case classes can be pattern matched Case classes automatically define hashcode and equals Case classes automatically define getter methods for the constructor arguments. Case classes can be seen as plain and immutable data-holding objects that should exclusively depend on their constructor arguments. Case classes contain a companion object which holds the apply method. This fact makes possible to instantiate a case class without the new keyword. Q2) Pattern Matching Scala has a built-in general pattern matching mechanism. It allows to match on any sort of data with a first-match policy object MatchTest1 extends App { def matchTest(x: Int): String = x match { case 1 => “one” case 2 =>...