Hadoop MapReduce Interview Questions and Answers
- Explain the usage of Context Object.
The Context Object allows the mapper (or reducer) to interact with the rest of the Hadoop system. It can be used to update counters, report progress, and provide application-level status updates. The Context Object also holds the configuration details for the job and exposes the interfaces used to emit output.
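As a minimal sketch of how a mapper uses the Context object (the class name, counter group, and commented-out parameter key below are illustrative assumptions, not part of any standard API):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical word-count style mapper; names are illustrative only.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The Context also exposes the job configuration if needed, e.g.:
        // String param = context.getConfiguration().get("some.job.parameter");

        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) {
                // Update a custom counter through the Context.
                context.getCounter("TokenMapper", "EMPTY_TOKENS").increment(1);
                continue;
            }
            word.set(token);
            // Emit intermediate output through the Context.
            context.write(word, ONE);
        }
    }
}
```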
- What are the core methods of a Reducer?
The 3 core methods of a reducer are –
1) setup() – This method is called once at the start of the reduce task and is used for one-time configuration, such as reading job parameters or files from the distributed cache.
Function Definition – protected void setup(Context context)
2) reduce() – This is the heart of the reducer; it is called once per key with the list of values associated with that key.
Function Definition – protected void reduce(Key key, Iterable<Value> values, Context context)
3) cleanup() – This method is called only once, at the end of the reduce task, for clearing up temporary files and resources.
Function Definition – protected void cleanup(Context context)
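A minimal sketch of a reducer that overrides all three methods, following the org.apache.hadoop.mapreduce.Reducer API (the class name, the summing logic, and the configuration key are illustrative assumptions):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sum reducer; names and logic are illustrative only.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private int minimumCount;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One-time configuration, e.g. reading a job parameter (key is made up).
        minimumCount = context.getConfiguration().getInt("sum.reducer.minimum.count", 0);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key with all values associated with that key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        if (sum >= minimumCount) {
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once at the end of the task; release resources, delete temp files, etc.
    }
}
```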
- Explain about the partitioning, shuffle and sort phase
Shuffle Phase – Once the first map tasks complete, the nodes continue to run the remaining map tasks while also transferring the intermediate map outputs to the reducers as required. This process of moving the intermediate output of the map tasks to the reducers is referred to as shuffling.
Sort Phase – Hadoop MapReduce automatically sorts the intermediate keys on a single node before they are given as input to the reducer.
Partitioning Phase – The process that determines which reducer instance will receive each intermediate key-value pair is referred to as partitioning. The destination partition is the same for a given key irrespective of the mapper instance that generated it.
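For concreteness, Hadoop's default HashPartitioner derives the destination partition purely from the key's hash code, which is why a given key always lands on the same reducer regardless of which mapper emitted it. A sketch of that hash-based logic (the class name here is illustrative, not the actual Hadoop class):

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of hash-based partitioning in the style of Hadoop's default HashPartitioner.
public class HashStylePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the result is non-negative, then map
        // the key's hash into one of the numReduceTasks partitions.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```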
- How to write a custom partitioner for a Hadoop MapReduce job?
Steps to write a Custom Partitioner for a Hadoop MapReduce Job-
- A new class must be created that extends the pre-defined Partitioner Class.
- The getPartition() method of the Partitioner class must be overridden.
- The custom partitioner can be added to the job as a config file in the wrapper that runs Hadoop MapReduce, or it can be set on the job programmatically by calling the setPartitionerClass() method on the Job object, as shown in the sketch below.
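A minimal sketch of these steps, assuming Text/IntWritable intermediate types and a purely illustrative rule that partitions by the first character of the key:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Steps 1 and 2: extend the pre-defined Partitioner class and override getPartition().
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0; // No reduce tasks configured; everything maps to partition 0.
        }
        String k = key.toString();
        char first = k.isEmpty() ? ' ' : k.charAt(0);
        // Route keys to a reducer based on their first character (illustrative rule).
        return first % numReduceTasks;
    }
}
```

Step 3 is then done in the driver, for example with job.setPartitionerClass(FirstCharPartitioner.class) alongside job.setNumReduceTasks(...).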
- What is the relationship between Job and Task in Hadoop?
In Hadoop, a job is the complete unit of work submitted by a client, and a single job is broken down into one or more tasks (map tasks and reduce tasks), each of which processes a portion of the data.
- Is it important for Hadoop MapReduce jobs to be written in Java?
It is not necessary to write Hadoop MapReduce jobs in Java. Users can write MapReduce jobs in any desired programming language, such as Ruby, Perl, Python, R, or Awk, through the Hadoop Streaming API.
- What is the process of changing the split size if there is limited storage space on Commodity Hardware?
If there is limited storage space on the commodity hardware, the split size can be changed by implementing a custom splitter, and the call to the custom splitter can be made from the main (driver) method.
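As a related, hedged illustration (not necessarily the custom splitter approach described above): when the job uses the standard FileInputFormat, the maximum split size can also be capped from the driver's main method. The driver below is a sketch with assumed job name and args-based paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver; the job name and the args-based paths are assumptions.
public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-example");
        job.setJarByClass(SplitSizeDriver.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Cap each input split at 64 MB so every task handles a smaller slice of data.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

        // Mapper, reducer and output types would be configured here as usual.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```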
- What are the primary phases of a Reducer?
The 3 primary phases of a reducer are –
1)Shuffle
2)Sort
3)Reduce
- What is a TaskInstance?
The actual Hadoop MapReduce tasks that run on each slave node are referred to as task instances. By default, every task instance runs in its own JVM process, and a new JVM is spawned for each task instance.
- Can reducers communicate with each other?
Reducers always run in isolation and they can never communicate with each other as per the Hadoop MapReduce programming paradigm.
We have further categorized Hadoop MapReduce Interview Questions for Freshers and Experienced-
- Hadoop Interview Questions and Answers for Freshers – Q.Nos- 2,5,6
- Hadoop Interview Questions and Answers for Experienced – Q.Nos- 1,3,4,7,8,9,10