Author: Sai Kumar

GREP command in Unix/Linux with examples

grep – Global Regular Expression Print. It is used to search for a pattern in one or more files.

Command 1: Search for a pattern in a file
grep hello file.txt
grep sai file.txt file2.txt

Command 2: Search for a pattern in all files with the .txt extension in the current folder
grep 1000 *.txt

Command 3: Search in all files in the current folder
grep 1000 *.*

Command 4: Search ignoring case [-i]
grep "Sai" file.txt (case sensitiv...
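To make the case-insensitive search concrete, a minimal sketch (file names are hypothetical):

# -i matches Sai, sai, SAI, ...
grep -i "sai" file.txt
# -n prefixes each match with its line number; -c prints only the count of matching lines
grep -in "sai" file.txt
grep -ic "sai" file.txt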

SED command in Unix/Linux with examples

SED – Stream Editor. Used to display and edit data; the editing operations are insertion, updating, and deletion. It supports two types of addressing:
– Line addressing
– Context addressing

Line addressing:

Command 1: Display a line multiple times (by default sed prints every input line, so '2p' prints line 2 twice; -n suppresses the default output so only the addressed line is printed)
sed '2p' file.txt
sed -n '3p' file.txt (specific line => -n)
sed -n '5p' file.txt

Command 2: Display...
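A few more line-addressing forms, as a minimal sketch (file name hypothetical):

# print only lines 2 through 4
sed -n '2,4p' file.txt
# delete line 3 from the output (the file itself is unchanged without -i)
sed '3d' file.txt
# substitute the first occurrence of "old" with "new" on every line
sed 's/old/new/' file.txt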

All about AWK command in Unix – Part 1

AWK – select column data
– Search data in a file and print it on the console
– Extract data from specific columns
– Format output data
– Used on files with bulk data for searching, conditional execution, updating, and filtering

Command 1: Print specific columns (by default awk splits fields on whitespace, i.e. spaces and tabs)
awk '{print $1}' file.txt
awk '{print $1 "--" $2}' file.txt

Command 2: ...
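When the data is delimited by something other than whitespace, the -F option sets the field separator; a minimal sketch over a hypothetical pipe-delimited file:

# print columns 1 and 3 of a "|"-separated file
awk -F "|" '{print $1, $3}' file.txt
# built-in variables: NR is the current line number, NF the number of fields on it
awk -F "|" '{print NR, NF}' file.txt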

All about AWK command in Unix – Part 2

Command 11: Find text at the start of a field [ ^ ] (the ~ operator matches a field against a regex)
awk -F "|" '$2 ~ /^s/{print $0}' tabfile.txt

Command 12: Find text at the end of a field [ $ ]
awk -F "|" '$2 ~ /n$/{print $0}' file1.txt

Command 13: Perform a condition check using if
awk -F "|" '{if ($3>2000) print $0;}' file2.txt

Command 14: ...
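Regex matches and conditions can be combined; a minimal sketch (file and column meanings hypothetical):

# print rows where column 2 starts with "s" AND column 3 exceeds 2000
awk -F "|" '$2 ~ /^s/ && $3 > 2000 {print $0}' file2.txt
# !~ negates the match: rows where column 2 does NOT end in "n"
awk -F "|" '$2 !~ /n$/ {print $0}' file2.txt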

How to write current method name to log in Scala [Code Snippet]

Your application framework will have many methods, and if you want to trace and log the current method name, the code below will be helpful.

def getCurrentMethodName: String = Thread.currentThread.getStackTrace()(2).getMethodName

def test {
  println("you are in - " + getCurrentMethodName)
  println("this is doing some functionality")
}

test

Output:
you are in - test
this is doing so...
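The stack-trace index 2 is what makes this work: frame 0 is getStackTrace itself, frame 1 is getCurrentMethodName, and frame 2 is its caller. If you wrap the call in another helper, the index shifts by one. A hypothetical log helper illustrating this (names are assumptions, and the exact frame layout can vary with how the code is compiled or run, so verify in your environment):

object LogDemo extends App {
  // index 3 here because the extra log() frame pushes the caller one level deeper:
  // 0 = getStackTrace, 1 = callerName, 2 = log, 3 = the method that called log
  def callerName: String = Thread.currentThread.getStackTrace()(3).getMethodName

  def log(msg: String): Unit = println(s"[$callerName] $msg")

  def process(): Unit = log("starting work")

  process()  // prints something like: [process] starting work
}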

How to calculate total time taken for a particular method in Spark [Code Snippet]

In cases where you apply joins in a Spark application, you might want to know how long a particular join takes to complete. The code snippet below might come in handy.

import java.util.Date

val current = new Date().getTime
println(current)
Thread.sleep(30000)   // stand-in for the join or any other work being timed
val end = new Date().getTime
println(end)
println("time taken " + (end - current).toFloat / 60000 + " mins")

Output:
import java....
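The same idea can be wrapped in a reusable helper; a minimal sketch (the time function is a hypothetical name, and System.nanoTime is used for better resolution):

def time[T](label: String)(block: => T): T = {
  val start = System.nanoTime()
  val result = block            // run the code being measured
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(s"$label took $elapsedMs ms")
  result
}

// usage, e.g. around a join followed by an action that forces execution:
// val rows = time("join") { df1.join(df2, "id").count() }

Note that Spark transformations are lazy: to time a join you must include an action (count, write, etc.) inside the timed block, or you will only measure plan construction.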

How to write current date timestamp to log file in Scala [Code Snippet]

Scala doesn't have its own library for dates and timestamps, so we need to depend on Java libraries. Here is a quick method to get the current date-timestamp and format it as required. Note that all the code is in Scala and can be used while writing a Scala application.

import java.sql.Timestamp

def getCurrentdateTimeStamp: Timestamp = {
  val today: java.util.Date ...
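The snippet above is cut off; a minimal self-contained sketch along the same lines (the format string is an assumption, adjust to your required format):

import java.sql.Timestamp
import java.text.SimpleDateFormat

def getCurrentdateTimeStamp: Timestamp = {
  val today = new java.util.Date()
  // the pattern must stay parseable by Timestamp.valueOf: yyyy-MM-dd HH:mm:ss[.SSS]
  val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
  Timestamp.valueOf(fmt.format(today))
}

println(getCurrentdateTimeStamp)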

Common issues with Apache Spark

Tricky deployment: Once you're done writing your app, you have to deploy it, right? That's where things can get a little out of hand. Although there are many options for deploying a Spark app, the simplest and most straightforward approach is standalone deployment. Spark also supports Mesos and YARN, and if you're not familiar with those it can become quite difficult to understand what's going on. You m...
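For reference, submitting to a standalone cluster typically looks like this (host, class, and jar names are hypothetical; the scripts ship with the Spark distribution, and start-slave.sh was renamed start-worker.sh in Spark 3.x):

# start a standalone master and attach a worker to it
./sbin/start-master.sh
./sbin/start-slave.sh spark://master-host:7077

# submit the application to the standalone master
./bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  my-app.jar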

Comparison between Apache Spark and Apache Hadoop

Spark vs. Hadoop comparison: Below is a comparison between Spark and Hadoop. They do different things: Hadoop and Apache Spark are both big-data frameworks, but they don't really serve the same purposes. Hadoop is essentially a distributed data infrastructure: it distributes massive data collections across multiple nodes within a cluster of commodity servers, which means you don't need to buy ...

Version wise features of Apache Spark

Spark Release 2.1.0: The Apache Spark 2.1.0 release makes significant strides in the production readiness of Structured Streaming, with added support for event-time watermarks and Kafka 0.10. In addition, this release focuses on usability, stability, and polish, resolving over 1200 tickets. Below is the list of high-level changes. Core and Spark SQL: This version supports from json an...
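As an illustration of the event-time watermark feature mentioned above, a minimal Structured Streaming sketch (names are assumptions; note the "rate" demo source used here for input only arrived in Spark 2.2, so substitute a socket or file source on 2.1):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.appName("watermark-demo").getOrCreate()
import spark.implicits._

// the rate source emits rows of (timestamp: Timestamp, value: Long)
val events = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

// drop events more than 10 minutes late, count per 5-minute event-time window
val counts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"))
  .count()

counts.writeStream.outputMode("append").format("console").start().awaitTermination()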

All about Spark Dataset API

Dataset API: The Dataset API, released as an API preview in Spark 1.6, aims to provide the best of both worlds: the familiar object-oriented programming style and compile-time type safety of the RDD API, but with the performance benefits of the Catalyst query optimizer. Datasets also use the same efficient off-heap storage mechanism as the DataFrame API. When it comes to serializing data, the Datase...
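A minimal sketch of the type-safe style this enables (assumes an existing SparkSession named spark; data is made up):

case class Person(name: String, age: Int)

import spark.implicits._
val ds = Seq(Person("Andy", 32), Person("Ravi", 25)).toDS()

// the lambda is checked at compile time: comparing age to a String would not compile
ds.filter(_.age > 30).show()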

Advantages and Downsides of Spark DataFrame API

DataFrame API: Spark 1.3 introduced a new DataFrame API as part of the Project Tungsten initiative, which seeks to improve the performance and scalability of Spark. The DataFrame API introduces the concept of a schema to describe the data, allowing Spark to manage the schema and pass only the data between nodes, in a much more efficient way than using Java serialization. There are also advantages when p...
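A minimal sketch of the schema-aware style (assumes an existing SparkSession named spark and a hypothetical people.json file):

val df = spark.read.json("people.json")
df.printSchema()   // Spark infers and manages the schema itself
df.select("name", "age").filter(df("age") > 21).show()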
