Find the average of all contiguous subarrays of a given size in an array

Given an array, find the average of all contiguous subarrays of size 'k' in it.
Array: [1, 3, 2, 6, -1, 4, 1, 8, 2], k=5
Output: [2.2, 2.8, 2.4, 3.6, 2.8]
Solution: the Sliding Window algorithm can be used to solve this.
Time Complexity: O(n), Space Complexity: O(1)
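A minimal sketch of the sliding-window approach in Python (the function and variable names here are illustrative, not taken from the original solution):

[code lang="python"]
def find_averages_of_subarrays(k, arr):
    # Maintain a running window sum instead of re-summing each subarray
    result = []
    window_sum, window_start = 0.0, 0
    for window_end in range(len(arr)):
        window_sum += arr[window_end]        # add the element entering the window
        if window_end >= k - 1:              # the window has reached size k
            result.append(window_sum / k)    # record the average of the current window
            window_sum -= arr[window_start]  # remove the element leaving the window
            window_start += 1                # slide the window forward
    return result

print(find_averages_of_subarrays(5, [1, 3, 2, 6, -1, 4, 1, 8, 2]))
# [2.2, 2.8, 2.4, 3.6, 2.8]
[/code]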

Find a pair in the array whose sum is equal to the given target

Given an array of sorted numbers and a target sum, find a pair in the array whose sum equals the given target. Write a function that returns the indices of the two numbers (i.e. the pair) that add up to the target.
Example 1:
Input: [1, 2, 3, 4, 6], target=6
Output: [1, 3]
Explanation: The numbers at index 1 and 3 add up to 6: 2+4=6
Solution: the Two Pointers approach can be used to solve this.
Time Complexity: O(n), Space Complexity: O(1)
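A minimal sketch of the two-pointers approach in Python (names are illustrative):

[code lang="python"]
def search_pair_with_target_sum(arr, target_sum):
    # Start one pointer at each end of the sorted array
    left, right = 0, len(arr) - 1
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target_sum:
            return [left, right]
        if current_sum < target_sum:
            left += 1    # sum too small: move the left pointer right
        else:
            right -= 1   # sum too large: move the right pointer left
    return [-1, -1]      # no pair adds up to the target

print(search_pair_with_target_sum([1, 2, 3, 4, 6], 6))  # [1, 3]
[/code]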

How to Retrieve a Password from a JCEKS File in Spark

In the data-ingestion stage into Hadoop from RDBMS sources, a password is often required to hit the source tables in the RDBMS databases. Passing a hard-coded password directly is highly unsafe and bad practice in real-world applications, so the password can be encrypted by creating a JCEKS file. A JCEKS file is a keystore saved in the Java Cryptography Extension KeyStore (JCEKS) format; it is used as an alternative to the Java KeyStore (JKS) format for the Java platform and stores encoded keys. When working on a Spark application that deals with RDBMS sources, the JCEKS file needs to be decrypted to query the source tables. Below is a handy function to retrieve the password from a JCEKS file.
Using PySpark
Using Scala
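A sketch of such a helper in PySpark, assuming the password was stored in the JCEKS file under a known alias (the function name and the jceks_path and alias parameters, including the example path in the comment, are illustrative):

[code lang="python"]
def extract_password_from_jceks(spark, jceks_path, alias):
    # Point the Hadoop configuration at the JCEKS credential provider,
    # e.g. jceks_path = "jceks://hdfs/user/etl/db_creds.jceks" (example path)
    conf = spark.sparkContext._jsc.hadoopConfiguration()
    conf.set("hadoop.security.credential.provider.path", jceks_path)
    # getPassword() returns the decrypted password as a Java char array
    raw = conf.getPassword(alias)
    if raw is None:
        raise ValueError("Alias not found in the JCEKS file: " + alias)
    password = ""
    for i in range(len(raw)):
        password += str(raw[i])
    return password
[/code]

The returned string can then be supplied as the "password" option when reading the source table over JDBC.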

How to Add a Serial Number to a Spark DataFrame

You may sometimes need to add a serial number to a Spark DataFrame. This can be done with the Spark function monotonically_increasing_id(), which generates a new column with a unique 64-bit monotonic index for each row. But the result is not sequential, because the values depend on the partition; in short, numbers that are out of sequence will be assigned. If the goal is to add a true serial number to the DataFrame, you can use the zipWithIndex method available on RDDs. Below is how you can achieve the same on a DataFrame.
[code lang="python"]
from pyspark.sql.types import LongType, StructField, StructType

def dfZipWithIndex(df, offset=1, colName="rowId"):
    '''
    Enumerates dataframe rows in native order, like rdd.zipWithIndex(),
    but on a dataframe and preserves a ...
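Since the excerpt above is cut off, here is a minimal sketch of how the complete helper might look; the body after the docstring is a reconstruction and assumes an active SparkSession:

[code lang="python"]
from pyspark.sql import SparkSession
from pyspark.sql.types import LongType, StructField, StructType

def dfZipWithIndex(df, offset=1, colName="rowId"):
    '''
    Enumerates dataframe rows in native order, like rdd.zipWithIndex(),
    but on a dataframe and preserves the original schema.
    '''
    spark = SparkSession.builder.getOrCreate()
    # Prepend the new serial-number column to the existing schema
    new_schema = StructType([StructField(colName, LongType(), True)] + df.schema.fields)
    # zipWithIndex() pairs each row with its position; shift by the offset and flatten
    new_rdd = df.rdd.zipWithIndex().map(lambda pair: [pair[1] + offset] + list(pair[0]))
    return spark.createDataFrame(new_rdd, new_schema)
[/code]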

How to Use a Python For Loop?

In this article, you'll learn how to use the Python for loop with a range, a string, and other collections.
Using a Python for loop on a range:
Using a Python for loop on a string:
Using a Python for loop on collections:
For any queries or doubts, ask questions in the 24Tutorials Forum.
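A few short examples of each case (the sample values are made up for demonstration):

[code lang="python"]
# For loop over a range
for i in range(5):
    print(i)  # prints 0, 1, 2, 3, 4

# For loop over a string (iterates character by character)
for ch in "spark":
    print(ch)

# For loop over a collection such as a list
languages = ["python", "scala", "java"]
for lang in languages:
    print(lang)
[/code]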

How to Generate DDL (CREATE Statement) with Columns Using Python [Code Snippets]

Data loading is the initial step in the Big Data analytics world: you are supposed to push all the data to Hadoop first, and then you can start working on analytics. When loading data into a Hadoop environment, in some cases you will be getting data in the form of flat files. Once the data is loaded, if you want to view or query it, you need to create a Hive table on top of that data, so a DDL is required to create the Hive table. In practice, you have to check the file, get the column names, and then create the DDL manually. This tutorial helps you get rid of that manual work so you can create DDLs dynamically in a single click with Python. Let's say we have the incoming data file as shown below:
Name|ID|ContactInfo|Date_emp
Michael|100|547-968-091|2014...
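A minimal sketch of such a DDL generator in Python; the file path, table name, and the choice to default every column to STRING are assumptions for illustration, since the excerpt is truncated:

[code lang="python"]
def generate_hive_ddl(file_path, table_name, delimiter="|"):
    # Read only the header line to get the column names
    with open(file_path) as f:
        header = f.readline().strip()
    columns = header.split(delimiter)
    # Default every column to STRING; adjust types as needed
    cols_ddl = ",\n  ".join("`{}` STRING".format(col) for col in columns)
    ddl = (
        "CREATE TABLE IF NOT EXISTS {tbl} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delim}'\n"
        "STORED AS TEXTFILE;"
    ).format(tbl=table_name, cols=cols_ddl, delim=delimiter)
    return ddl

print(generate_hive_ddl("employees.txt", "employees"))
[/code]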
