What is the input split size for a 64 MB block if the minimum input split size is 32 MB and the maximum is 128 MB?
64kb
67mb
127mb
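Hadoop's FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)), where minSize and maxSize come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. A minimal sketch of that rule shows that with a 64 MB block, a 32 MB minimum, and a 128 MB maximum, the split size equals the block size:

```python
# Sketch of the split-size rule in Hadoop's FileInputFormat.computeSplitSize:
#   splitSize = max(minSize, min(maxSize, blockSize))
MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Return the input split size in bytes for the given block and limits."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    size = compute_split_size(block_size=64 * MB, min_size=32 * MB, max_size=128 * MB)
    print(size // MB, "MB")  # 64 MB
```

Raising the minimum above the block size produces larger splits, and lowering the maximum below it produces smaller ones.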
How to change the replication factor in HDFS
How to get the size of each file in the HDFS path user/hdfs
What are the default partitioner and combiner?
What is the output of an inner join, left join, right join, and full outer join on these tables?
Customer
1 A
2 B
2 B
4 C
5 D
Transaction
8 200
2 100
2 100
9 200
6 200
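For the join question above, a small plain-Python sketch makes the row counts easy to check. The notes only list the rows, so the column names (Customer id/name, Transaction id/amount, joined on id) are assumptions. The duplicate key 2 appears twice on each side, so every join multiplies it into 2 x 2 = 4 rows.

```python
# Assumed columns: Customer(id, name) and Transaction(id, amount), joined on id.
customer = [(1, "A"), (2, "B"), (2, "B"), (4, "C"), (5, "D")]
transaction = [(8, 200), (2, 100), (2, 100), (9, 200), (6, 200)]


def inner_join(left, right):
    # Every matching pair is emitted, so duplicate keys multiply.
    return [(l[0], l[1], r[1]) for l in left for r in right if l[0] == r[0]]


def left_join(left, right):
    rows = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            rows.extend((l[0], l[1], r[1]) for r in matches)
        else:
            rows.append((l[0], l[1], None))  # None stands for NULL
    return rows


def right_join(left, right):
    rows = []
    for r in right:
        matches = [l for l in left if l[0] == r[0]]
        if matches:
            rows.extend((r[0], l[1], r[1]) for l in matches)
        else:
            rows.append((r[0], None, r[1]))  # key shown from the right side
    return rows


def full_join(left, right):
    # Matched rows, plus unmatched rows from both sides.
    left_keys = {l[0] for l in left}
    right_keys = {r[0] for r in right}
    rows = inner_join(left, right)
    rows += [(l[0], l[1], None) for l in left if l[0] not in right_keys]
    rows += [(r[0], None, r[1]) for r in right if r[0] not in left_keys]
    return rows


if __name__ == "__main__":
    for name, fn in [("inner", inner_join), ("left", left_join),
                     ("right", right_join), ("full", full_join)]:
        result = fn(customer, transaction)
        print(name, len(result), result)
```

The inner join returns 4 rows, the left and right joins return 7 each, and the full outer join returns 10. The join key is shown from whichever side has it, similar to a COALESCE on the two id columns.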
How to compare two files
How to get IDs of all jobs
How to run a process in the background, bring it to the foreground, and kill the job
How to get the 50th line of a text file
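In the shell, the 50th line is usually printed with `sed -n '50p' file` or `head -n 50 file | tail -n 1`. The same idea as a small Python sketch reads lines lazily and stops at line n, so a large file is never loaded whole:

```python
from itertools import islice
from typing import Iterable, Optional


def nth_line_from(lines: Iterable[str], n: int) -> Optional[str]:
    """Return the n-th line (1-based) without its newline, or None if there are fewer lines."""
    line = next(islice(lines, n - 1, n), None)
    return None if line is None else line.rstrip("\n")


def nth_line(path: str, n: int) -> Optional[str]:
    # A file object yields lines lazily, so only the first n lines are read.
    with open(path, encoding="utf-8") as f:
        return nth_line_from(f, n)
```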
Read a file line by line, copy the lines after the 10th line, and write them to sales.txt
What is $@ in a shell script?
How to import all tables while excluding two of them?
How to skip the special characters \n and \r in an RDBMS table when importing it into HDFS?
Can Spark Streaming be stopped without stopping the SparkContext?
Is zlib a file format or not?
HDFS: data locality, rack awareness, speculative execution, NameNode, DataNode, replication factor
How to add/remove nodes
How to increase/decrease the replication factor
What happens to the data if we reduce the replication factor?
How to upload files from one server to another server without an FTP client (syntax and command)
Scenario in Hive: a table with the columns name, salary, age
How do you decide the partition column of this table?
Why?
On what basis do you decide the partition?
What are the optimization techniques used in Hive?
When is bucketing used?
On what basis?
Spark SQL basics
Why DataFrames?
What is the difference between Hive and Spark SQL?
Write a program using Spark SQL
1) Project flow
2) My role
3) Hive UDFs
4) Pig basics
5) Sqoop incremental load
6) Handling column data with quoted text
Who decides data partitions in Spark?
How are they split?
How does a shell script run on data?
If Spark runs in memory and MapReduce also runs in memory, how can you say Spark is faster because of in-memory processing?
What are your daily activities?
Write a Spark program
How does a Spark program work on data, and how is the data processed?
Spark architecture
MapReduce architecture
Project configuration
RAM and ROM, in depth
How do Unix scripts know where to launch?
How does a JAR program run from a script?
What are you currently working on?
Difference between MapReduce processing and Spark data processing
Sqoop vs. Flume
Hive SerDe
Pig basics
MapReduce sorting and shuffling
Partitioning and bucketing
In an employee table partitioned by deptid and bucketed by location, how do we handle this scenario?
Explain bucketing.
Explain dynamic partitioning.
When do you use Pig, and when do you use Hive?
Data locality in MapReduce
HBase components
Modes in HBase
Scenario-based questions on Hive
Difference between JSON and Parquet
What does the content of the input files look like (e.g., JSON, Parquet)?
Write a program to pick the longest word from a 1 TB file
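For a 1 TB file, the key point is that no step needs more than one candidate per split in memory: each mapper keeps the longest word in its own split, and one reduce step compares those candidates (the same shape as a flatMap followed by a reduce in Spark). A plain-Python sketch of that shape, assuming splits are modeled as iterables of lines and a word is a run of \w characters:

```python
import re
from functools import reduce
from typing import Iterable, Optional

WORD = re.compile(r"\w+")  # assumption: a word is a run of letters, digits, or underscores


def longest_in_split(lines: Iterable[str]) -> Optional[str]:
    """Map side: the longest word within one split, streamed line by line."""
    best = None
    for line in lines:
        for word in WORD.findall(line):
            if best is None or len(word) > len(best):
                best = word
    return best


def pick_longer(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Reduce side: keep the longer candidate (on a tie, keep the first)."""
    if a is None:
        return b
    if b is None:
        return a
    return b if len(b) > len(a) else a


def longest_word(splits: Iterable[Iterable[str]]) -> Optional[str]:
    return reduce(pick_longer, (longest_in_split(s) for s in splits), None)
```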
What is the difference between Hive, Hive on Tez, and Spark SQL?
How does Sqoop use MapReduce?
In Hive, write a query that adds an extra column holding, as an array, the salaries greater than each emplid's salary
How does flatMap work?
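The difference between map and flatMap can be shown with plain Python lists (this models the semantics only, not the Spark API): map produces exactly one output per input, while flatMap lets each input produce zero or more outputs and flattens them into one collection.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_(f: Callable[[T], U], items: Iterable[T]) -> List[U]:
    """One output element per input element."""
    return [f(x) for x in items]


def flat_map(f: Callable[[T], Iterable[U]], items: Iterable[T]) -> List[U]:
    """f returns an iterable per element; the results are flattened into one list."""
    return [y for x in items for y in f(x)]


lines = ["hello world", "hi"]
print(map_(str.split, lines))      # [['hello', 'world'], ['hi']]
print(flat_map(str.split, lines))  # ['hello', 'world', 'hi']
```

Because an empty line splits into an empty list, flatMap can also shrink a dataset, which map never does.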
Word count in MapReduce: detailed explanation
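The word count phases can be walked through with a plain-Python simulation. This is a simplified model with a single reducer, not Hadoop code: the mapper emits (word, 1), the shuffle groups the values by key and sorts the keys (as the framework does before the reduce phase), and the reducer sums the values for each word.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle and sort: group values by key and order the keys,
    # so each reduce call receives (key, list of values).
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: add up the 1s for this word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```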
How does Hive use MapReduce?
Partitioning in Spark
Job flow in Spark
Explanation of stages in Spark
Accumulators and broadcast variables: detailed explanation
Cache
Distributed cache in MapReduce
Can an accumulator be called from an executor?
Can a broadcast variable be called from an executor?
Which properties of a file should be considered when you store it in HDFS?
How does MapReduce take in the file?
What properties must a table have to be imported using Sqoop?
If there is no primary key, how does Sqoop import the data?
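Sqoop runs a map-only MapReduce job and uses the primary key, or the column given with --split-by, to query the MIN and MAX values and divide that range among the mappers (-m / --num-mappers, 4 by default). With no primary key and no --split-by, the import stops and asks for one of them or for a sequential import with -m 1. The range splitting can be sketched as below; this is a simplified integer-only model, not Sqoop's exact splitter arithmetic, and the column name emp_id is hypothetical.

```python
from typing import List, Tuple


def split_ranges(min_val: int, max_val: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide [min_val, max_val] into up to num_mappers contiguous, inclusive ranges.

    Simplified model of split-by range splitting (integer columns only)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    span = max_val - min_val + 1
    size, extra = divmod(span, num_mappers)
    ranges = []
    lo = min_val
    for i in range(num_mappers):
        hi = lo + size + (1 if i < extra else 0) - 1
        if hi < lo:
            break  # fewer distinct values than mappers
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def where_clauses(col: str, ranges: List[Tuple[int, int]]) -> List[str]:
    # Each mapper would read only the rows in its own range.
    return [f"{col} >= {lo} AND {col} <= {hi}" for lo, hi in ranges]


print(where_clauses("emp_id", split_ranges(1, 100, 4)))
```

Splitting by value range assumes the values are spread evenly; a skewed split-by column gives some mappers far more rows than others.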
How many instances do you have?
Where do you do development?
How do you migrate to production?
How do you verify whether your results are correct?
Sanity check?
Bedrock version?
Interview questions:
Mu Sigma
Performance optimization for Hive queries?
Can we decide how the input files are split?
Sqoop commands?
RDBMS vs. Hadoop?
Row-oriented DB vs. columnar DB?
How to process web logs?
Can we decide the number of processes for running a job?
Infosys
Partitioning and bucketing: if the data is still huge even after partitioning, what do we need to do?
What types of input file formats are we getting?
Features of RDDs and Spark SQL
How to change details in a file
How to display only the file names abc1, abc2, abc3, abc4 among abc1 to abc100
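The shell answer is the glob `ls abc[1-4]`: the character class [1-4] matches exactly one character, so abc10 through abc49 are not matched. Python's fnmatch uses the same glob rules, which makes the matching easy to check:

```python
import fnmatch

names = [f"abc{i}" for i in range(1, 101)]
# [1-4] matches a single character, and the pattern must match the whole name.
selected = fnmatch.filter(names, "abc[1-4]")
print(selected)  # ['abc1', 'abc2', 'abc3', 'abc4']
```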
How to check high-consuming processes in Linux
Difference between Spark and MapReduce
IBM
IBM interview questions held on 18-02-2017
Hive variables
Object inspector
Consolidation in Hive
Difference between MapReduce and YARN
Difference between Spark and MapReduce
RDDs and DataFrames in Spark
Sqoop import
Hive views
Hive external vs. managed tables
Differences between HBase and Hive
ORDER BY, SORT BY, and CLUSTER BY
Speculative execution
ALTER column command in Hive
IBM interview questions on 01-03-2017
What is lazy evaluation in Pig?
What are dynamic and static partitions in Hive?
What is the use of partitions and bucketing in Hive?
Explain the flow of a MapReduce program.
What is the default partitioner in MapReduce, and how can we override it?
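Hadoop's default partitioner is HashPartitioner, which assigns a record to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. To override it, you subclass Partitioner, implement getPartition, and register the class with job.setPartitionerClass(...). A Python sketch of the same arithmetic takes the key's Java hashCode as input (for an IntWritable key that is the int value itself); range_partition is a hypothetical custom rule for illustration only.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def hash_partition(key_hash: int, num_reduce_tasks: int) -> int:
    """HashPartitioner arithmetic: (hashCode & Integer.MAX_VALUE) % numReduceTasks.

    key_hash is assumed to be a 32-bit signed Java hashCode; masking the sign
    bit keeps the result non-negative even for negative hash codes."""
    return (key_hash & INT_MAX) % num_reduce_tasks


def range_partition(key: int, num_reduce_tasks: int) -> int:
    """Hypothetical custom partitioner: keys below 100 go to reducer 0,
    and all other keys are hashed across the remaining reducers."""
    if num_reduce_tasks == 1 or key < 100:
        return 0
    return 1 + hash_partition(key, num_reduce_tasks - 1)
```

Every record with the same key lands on the same reducer, which is what lets a reducer see all values for its keys.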
What is the difference between the key class and the value class in MapReduce?
What level of subquery nesting does Hive support?
What are transformations and actions in Spark?
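The distinction between transformations and actions comes down to lazy evaluation: a transformation only records a step, and an action runs the recorded chain. ToyRDD below is a hypothetical teaching model, not the Spark API; the traced function shows that nothing executes until collect() is called.

```python
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class ToyRDD:
    """Toy model of lazy evaluation: transformations record, actions execute."""

    def __init__(self, data: List[Any], steps: Optional[List[Step]] = None):
        self._data = data
        self._steps: List[Step] = steps or []

    # Transformations: return a new ToyRDD with one more recorded step.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._data, self._steps + [("map", f)])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._data, self._steps + [("filter", pred)])

    # Actions: run the whole recorded chain and return a result.
    def collect(self) -> List[Any]:
        out = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def count(self) -> int:
        return len(self.collect())


seen = []


def trace(x):
    seen.append(x)
    return x * 10


rdd = ToyRDD([1, 2, 3, 4]).map(trace).filter(lambda x: x > 15)
print(seen)           # [] because no action has run yet
print(rdd.collect())  # [20, 30, 40]
```

Each action recomputes the chain from the start, which is why caching matters when the same result is used by several actions.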
Wipro
How does Hive read from Spark?
Executors and cores
How do we allocate memory?
RDD vs. DataFrame
Partitions on a DataFrame?
How do you do Sqoop incremental loading?
Write the commands in Sqoop
Hive SerDe
Static and dynamic partitioning
A SerDe (Serializer/Deserializer) is the interface that tells Hive how to read and write the data
Hive bucketing
Then write commands in Unix so that we can pass parameters
Shell scripting
Command to sort data in Unix based on datetime
Some basic commands
Scheduling jobs in Unix
Write a program in Spark that can run Hive queries
Pig commands to load data; write the script
Then they asked about SQL:
All joins
Structure of an SQL query
GROUP BY
HAVING clause
ORDER BY clause
How to debug a script in Unix
Which command can you use to debug without echo?
SerDe: write one with an example
MERGE in SQL, and the equivalent in Hive
Spark architecture
For a GROUP BY query executed by Hive, which tasks are executed by the mapper and which by the reducer?
How to truncate a table and load data into it
Cust2 properties
Complex data types
UDFs in Hive
Convert rows into columns or columns into rows in Hive
Download HDP for VirtualBox (it's 2 GB)
1. Volume of the data
2. Domain of the data
3. Brief idea of the data (some metadata)
4. Are the schema and relations defined in the data?
5. How well does big data handle relationships, schemas, and values?
6. Customer transaction data: financial transactions from multiple channels (bank statements, credit card statements) mapped to unique customers across all demographics. We have some rules for fraudulent transactions and a set of potentially fraudulent vendors. Transactions similar to these should be marked as fraud. The data includes customer details, bank statements, credit card transactions, and debit card transactions. Design a solution to detect fraud. How does the data fit into big data? Which framework would you use?
7. Framework used (have a JAR file): how will you do this? Where is the data stored, what is the relationship, and how do you hold the relationship? Do you use MapReduce, Pig, or Hive to process it? What do you use for processing?
8. Transfer the data from local to HDFS (the streamlined movement where all the components fit in: which components are used in each step, such as ETL, storage, processing, reporting, and data evaluation)
9. If the data is in Hive or in HDFS, what is the difference?
10. Difference between Sqoop and Flume
I have a transaction table in Hive. I want to get the fraudulent transactions, i.e., those where the customer ID, amount, and recipient name are all the same; list all transaction IDs that are fraudulent, and the number of transactions made by each customer.
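The duplicate rule in this question is a grouping on (customer_id, amount, recipient_name) where the group has more than one row, the same logic a HiveQL GROUP BY on those columns with HAVING COUNT(*) > 1 expresses, plus a per-customer count. A plain-Python sketch of that logic follows; the column names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Txn(NamedTuple):
    txn_id: str          # hypothetical column names
    customer_id: int
    amount: float
    recipient_name: str


def flag_duplicates(txns: List[Txn]) -> List[str]:
    """IDs of transactions whose (customer_id, amount, recipient_name) occurs more than once."""
    groups: Dict[Tuple[int, float, str], List[str]] = defaultdict(list)
    for t in txns:
        groups[(t.customer_id, t.amount, t.recipient_name)].append(t.txn_id)
    return sorted(tid for ids in groups.values() if len(ids) > 1 for tid in ids)


def txns_per_customer(txns: List[Txn]) -> Dict[int, int]:
    """Number of transactions made by each customer."""
    return dict(Counter(t.customer_id for t in txns))


sample = [  # hypothetical sample rows
    Txn("t1", 1, 500.0, "X Corp"),
    Txn("t2", 1, 500.0, "X Corp"),
    Txn("t3", 1, 20.0, "Y Ltd"),
    Txn("t4", 2, 500.0, "X Corp"),
]
print(flag_duplicates(sample))    # ['t1', 't2']
print(txns_per_customer(sample))  # {1: 3, 2: 1}
```

Note that t4 is not flagged: it has the same amount and recipient as t1 and t2, but a different customer ID.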
11. Benefits of creating a view
12. What operations can I perform on a view (create, delete)?
13. How is a view stored in a relational database?
14. What problems did you face, and how did you overcome them?
15. Project architecture
16. How are roles distributed among your team?
17. Project duration
18. Performance tuning: Hive, Pig, HBase
Lineage
DAG scheduler
groupByKey
reduceByKey
Tail recursion
Recursion
DAG
Block
combineByKey
Sqoop incremental
mapPartitions
reduceByKey vs. groupByKey
map vs. flatMap
SQL
DataFrame
Execution plan in SQL
Query optimization
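For groupByKey, reduceByKey, and combineByKey in the list above, the practical difference is where values are combined. Spark's groupByKey shuffles every record, while reduceByKey (built on combineByKey) first combines values inside each partition, so fewer records cross the network. The plain-Python simulation below models partitions as lists of (key, value) pairs and reports how many records would be shuffled; it illustrates the idea and is not Spark code.

```python
from collections import defaultdict
from operator import add


def group_by_key(partitions):
    """groupByKey-style: every (k, v) record is shuffled, then grouped."""
    shuffled = [kv for part in partitions for kv in part]
    grouped = defaultdict(list)
    for k, v in shuffled:
        grouped[k].append(v)
    return dict(grouped), len(shuffled)


def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """combineByKey-style: combine within each partition first (map side),
    then shuffle one combiner per key per partition and merge them."""
    per_partition = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = merge_value(local[k], v) if k in local else create_combiner(v)
        per_partition.append(local)
    shuffled = [(k, c) for local in per_partition for k, c in local.items()]
    result = {}
    for k, c in shuffled:
        result[k] = merge_combiners(result[k], c) if k in result else c
    return result, len(shuffled)


def reduce_by_key(partitions, func):
    """reduceByKey is combineByKey where the combiner has the same type as the values."""
    return combine_by_key(partitions, lambda v: v, func, func)


def average_by_key(partitions):
    sums, _ = combine_by_key(
        partitions,
        lambda v: (v, 1),                         # createCombiner: first value -> (sum, count)
        lambda acc, v: (acc[0] + v, acc[1] + 1),  # mergeValue: fold in another value
        lambda a, b: (a[0] + b[0], a[1] + b[1]),  # mergeCombiners: merge partial results
    )
    return {k: s / c for k, (s, c) in sums.items()}


parts = [[("a", 1), ("a", 1), ("b", 1)], [("a", 1), ("b", 1), ("b", 1)]]
print(group_by_key(parts))        # 6 records shuffled
print(reduce_by_key(parts, add))  # same totals, 4 records shuffled
```

The average example shows why combineByKey exists: the combiner type (sum, count) differs from the value type, which reduceByKey cannot express.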
Lateral view
File formats
combineByKey
Explain the wordcount program step by step
Partitioning and bucketing
Transformations and actions
map vs. flatMap
SQLContext and HiveContext
Write Hive code to create a table and insert data into it
Technique behind dividing data into partitions
Difference between a partition and a partitioner
Partitioner
How to save data into Hive from a Spark RDD
If there is reduceByKey, why is there also groupByKey?
What are the performance optimization techniques in Hive?
How do you do data modelling in Hive?
How to load Parquet files into Hive
How do you decide the number of buckets in Hive bucketing?
How to change a value in a Hive table
How to convert a DataFrame to an RDD
How to change a MySQL job to an Oracle job in Sqoop
Optimization techniques you use in a Spark program
Spark modules
How to convert an RDD to a DataFrame
If we save a file as .scala, how does it detect the Spark libraries?