There are situations where you need to filter a Spark DataFrame based on keys that are already available in a Scala collection. Let's see how we can achieve this in Spark. You need to use a Spark UDF for this.

Step -1: Create a DataFrame using the parallelize method with some sample data.

scala> val df = sc.parallelize(Seq((2,"a"),(3,"b"),(5,"c"))).toDF("id","name")
df: org.apache.spark.sql.DataFrame = [id: int, name: string]

Step -2: Create a UDF which concatenates columns of the DataFrame. The UDF below accepts a sequence of column values and a delimiter, drops any null values, and returns the remaining values concatenated into a single string separated by that delimiter.

scala> val concatKey = udf( (xs: Seq[Any], sep: String) => xs.filter(_ != null).mkString(sep))
concatKey: org.apache.spark.sql.UserDefinedFunction = UserDefinedFu...
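To see how these pieces fit together, here is a minimal sketch of the filtering step itself: build a composite key column with concatKey, then keep only the rows whose key appears in the Scala collection. The collection name keys, the "-" separator, and the use of isin here are illustrative assumptions, not part of the steps above.

scala> import org.apache.spark.sql.functions.{array, col, lit}

scala> // keys already available in a Scala collection (illustrative values)
scala> val keys = Seq("2-a", "5-c")

scala> // cast id to string so both columns fit into a single array, then build the key column
scala> val withKey = df.withColumn("key", concatKey(array(col("id").cast("string"), col("name")), lit("-")))

scala> // keep only rows whose composite key is present in the Scala collection
scala> withKey.filter(col("key").isin(keys: _*)).show()   // keeps rows (2,"a") and (5,"c")

Passing the separator as lit("-") works because the UDF receives the literal value as a plain String at runtime; the cast on id is needed since array() requires all elements to share the same type.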