Q21. Which object can be used to get the progress of a particular Job?
Ans: The Job object itself: once a job has been submitted, its mapProgress() and reduceProgress() methods report how far the map and reduce phases have progressed.
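A minimal sketch of polling progress from the client side, assuming the org.apache.hadoop.mapreduce API (the job configuration itself is elided):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ProgressCheck {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "progress-demo");
        // ... set mapper, reducer, input/output paths here ...
        job.submit();                      // non-blocking submission
        while (!job.isComplete()) {
            // mapProgress()/reduceProgress() return a float in [0.0, 1.0]
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);            // poll every five seconds
        }
    }
}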
Q22. What is the next step after the Mapper?
Ans: The output of the Mapper is sorted, and partitions are created for that output. The number of partitions equals the number of Reducers; each partition is fetched by exactly one Reducer.
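For reference, the default partitioning is done by org.apache.hadoop.mapreduce.lib.partition.HashPartitioner, whose logic boils down to the following (simplified from Hadoop's own implementation):

public int getPartition(K key, V value, int numReduceTasks) {
    // Mask off the sign bit so negative hash codes still map to a valid partition
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}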
Q23. How can we control which Reducer a particular key goes to?
Ans: Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.
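For example, a minimal custom Partitioner (the class name and routing rule are purely illustrative) that sends keys starting with a-m to reducer 0 and everything else to reducer 1:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String s = key.toString();
        if (numReduceTasks < 2 || s.isEmpty()) {
            return 0; // single reducer (or empty key): everything in partition 0
        }
        char first = Character.toLowerCase(s.charAt(0));
        return (first <= 'm') ? 0 : 1;
    }
}

It is registered on the job with job.setPartitionerClass(AlphabetPartitioner.class).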
Q24. What is the use of Combiner?
Ans: The Combiner is an optional component. It can be specified via Job.setCombinerClass(ClassName) to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
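A sketch of wiring a Combiner into a word-count style job (TokenizerMapper and IntSumReducer are the class names used in the standard Hadoop WordCount example); reusing the Reducer as the Combiner is safe here only because summing is associative and commutative:

Job job = Job.getInstance(conf, "wordcount");
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);  // local, map-side aggregation
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);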
Q25. How many maps are there in a particular Job?
Ans: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files. Generally it is around 10-100 maps per node. Task setup takes a while, so it is best if the maps take at least a minute to execute. For example, if you expect 10TB of input data and have a block size of 128MB, you'll end up with 82,000 maps. To control the number of maps you can use the mapreduce.job.maps parameter (which only provides a hint to the framework). Ultimately, the number of tasks is controlled by the number of splits returned by the InputFormat.getSplits() method (which you can override).
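For illustration (the value 500 is arbitrary), the hint can be set on the job's Configuration, and the block arithmetic above can be checked directly:

Configuration conf = new Configuration();
// Only a hint; the real task count follows InputFormat.getSplits().
conf.setInt("mapreduce.job.maps", 500);

long inputBytes = 10L * 1024 * 1024 * 1024 * 1024;  // 10 TB
long blockBytes = 128L * 1024 * 1024;               // 128 MB
System.out.println(inputBytes / blockBytes);        // 81920, i.e. ~82,000 maps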
Q26. What is the Reducer used for?
Ans: Reducer reduces a set of intermediate values which share a key to a (usually smaller) set of values. The number of reduces for the job is set by the user via Job.setNumReduceTasks(int).
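For example (the count of 4 is arbitrary):

job.setNumReduceTasks(4);  // setting this to 0 yields a map-only job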
Q27. Explain the core methods of the Reducer?
Ans: The API of Reducer is very similar to that of Mapper: there's a run() method that receives a Context containing the job's configuration, as well as interfacing methods that return data from the reducer itself back to the framework. The run() method
calls setup() once, reduce()
once for each key associated with the reduce task, and cleanup()
once at the end. Each of these methods
can access the
job's configuration data by using Context.getConfiguration().
As in Mapper, any or all of these methods can be overridden with custom implementations. If none of these methods are overridden, the default reducer operation is the identity function; values are passed through without further processing.
The heart of Reducer is its reduce() method. This is called once per key; the second argument is an Iterable of all the values associated with that key.
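A sketch of a summing Reducer that overrides all three lifecycle methods (modeled on the IntSumReducer from the standard WordCount example):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void setup(Context context) {
        // Called once before any reduce(); the job's configuration is
        // reachable via context.getConfiguration().
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {  // all values sharing this key
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    @Override
    protected void cleanup(Context context) {
        // Called once after the last reduce() call.
    }
}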
Q28. What are the primary phases of the Reducer?
Ans: Shuffle, Sort and Reduce.
Q29. Explain the Reducer's Shuffle phase?
Ans: Input to the Reducer is the sorted
output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
Q30. Explain the Reducer’s Sort phase?
Ans: The framework
groups Reducer inputs by keys (since different
mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously; while map-outputs are being fetched, they are merged (this is similar to merge-sort).