Q11. What does the Mapper do?
Ans: Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs.
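As a plain-Java illustration (not the Hadoop API), the hypothetical `WordSplitMap.map` below takes one input pair, a byte offset and a line of text, and returns intermediate (word, 1) pairs. The output types differ from the input types, and a blank line produces zero output pairs while a longer line produces several.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a map function; not Hadoop's Mapper class.
// Input pair:   (Long byte offset, String line)
// Output pairs: (String word, Integer 1), zero or more per input pair.
class WordSplitMap {
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        // offset is unused here; it only shows the input key type.
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            // A blank line splits into a single empty token, which is skipped,
            // so it yields zero output pairs.
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }
}
```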
Q12. What is an InputSplit in MapReduce?
Ans: An InputSplit is a logical representation of a unit (a chunk) of input work for a map task, e.g. a filename and a byte range within that file to process, or a row set in a text file.
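The byte-range case can be sketched as below. `ByteRangeSplit` and its `cut` helper are hypothetical names; the class loosely resembles what a file-based split records (path, start offset, length), but real input formats apply further rules (for example block size and minimum/maximum split sizes) that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a file-based InputSplit: a filename plus a byte range.
// It describes work to do; it does not hold any of the file's data.
final class ByteRangeSplit {
    final String path;
    final long start;   // offset of the first byte in the split
    final long length;  // number of bytes in the split

    ByteRangeSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    // Cuts a file of fileLength bytes into consecutive splits of at most
    // splitSize bytes; the last split may be shorter.
    static List<ByteRangeSplit> cut(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) throw new IllegalArgumentException("splitSize must be positive");
        List<ByteRangeSplit> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new ByteRangeSplit(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    @Override
    public String toString() {
        return path + ":" + start + "+" + length;
    }
}
```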
Q13. What is the InputFormat?
Ans: The InputFormat is responsible for enumerating (itemising) the InputSplits and for producing a RecordReader, which turns those logical work units into actual physical input records.
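A minimal in-memory sketch of both responsibilities, assuming the input is a list of text lines and each split is a row set. The method names `getSplits`, `createRecordReader` and `nextKeyValue` echo Hadoop's, but the classes (`RowInputFormat`, `RowSplit`, `RowRecordReader`) and their simplified signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical logical split: the row range [firstRow, endRow), endRow exclusive.
final class RowSplit {
    final int firstRow, endRow;
    RowSplit(int firstRow, int endRow) { this.firstRow = firstRow; this.endRow = endRow; }
}

// Turns one logical RowSplit into physical (row number, line) records.
final class RowRecordReader {
    private final List<String> lines;
    private final RowSplit split;
    private int current; // index of the current record

    RowRecordReader(List<String> lines, RowSplit split) {
        this.lines = lines;
        this.split = split;
        this.current = split.firstRow - 1; // positioned before the first record
    }
    boolean nextKeyValue() { return ++current < split.endRow; }
    int getCurrentKey() { return current; }
    String getCurrentValue() { return lines.get(current); }
}

final class RowInputFormat {
    private final List<String> lines;
    private final int rowsPerSplit;

    RowInputFormat(List<String> lines, int rowsPerSplit) {
        if (rowsPerSplit <= 0) throw new IllegalArgumentException("rowsPerSplit must be positive");
        this.lines = lines;
        this.rowsPerSplit = rowsPerSplit;
    }

    // Responsibility 1: enumerate the splits (row ranges only, no data).
    List<RowSplit> getSplits() {
        List<RowSplit> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += rowsPerSplit) {
            splits.add(new RowSplit(i, Math.min(i + rowsPerSplit, lines.size())));
        }
        return splits;
    }

    // Responsibility 2: produce a reader that yields the records of one split.
    RowRecordReader createRecordReader(RowSplit split) {
        return new RowRecordReader(lines, split);
    }
}
```

The split only stores row indices; the reader is what turns them into actual (row number, line) records, mirroring the logical/physical distinction in the answer.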
Q14. Where do you specify the Mapper implementation?
Ans: The Mapper implementation is generally specified in the Job itself, e.g. with Job.setMapperClass().
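In real Hadoop code this is a call such as `job.setMapperClass(MyMapper.class)` in the driver. The runnable sketch below models the idea with hypothetical stand-ins (`SketchJob`, `SketchMapper`, `UpperCaseMapper`, and the key `sketch.mapper.class`): the job records which mapper class to use as a configuration entry rather than holding a mapper object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Hadoop classes.
abstract class SketchMapper {
    abstract String map(String value);
}

final class UpperCaseMapper extends SketchMapper {
    @Override
    String map(String value) { return value.toUpperCase(); }
}

final class SketchJob {
    // Hypothetical configuration key for the chosen mapper class.
    static final String MAPPER_KEY = "sketch.mapper.class";
    private final Map<String, String> conf = new HashMap<>();

    // Stores only the class name, so the choice can travel with the job's
    // configuration to whichever machine runs the map task.
    void setMapperClass(Class<? extends SketchMapper> cls) {
        conf.put(MAPPER_KEY, cls.getName());
    }

    String getMapperClassName() { return conf.get(MAPPER_KEY); }
}
```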
Q15. How is the Mapper instantiated in a running job?
Ans: The Mapper itself is instantiated in the running job, and will be passed a MapContext object which it can use to configure itself.
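Because the running task only knows the mapper's class, the instance is created reflectively. The sketch below assumes hypothetical classes (`MapperFactory`, `ReflectMapper`, `EchoTwiceMapper`, `SketchMapContext`); it shows why a mapper class needs an accessible no-argument constructor and how the freshly created instance is then handed a context object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context: just an output sink for this sketch.
class SketchMapContext {
    final List<String> output = new ArrayList<>();
    void write(String record) { output.add(record); }
}

abstract class ReflectMapper {
    abstract void map(String value, SketchMapContext context);
}

class EchoTwiceMapper extends ReflectMapper {
    public EchoTwiceMapper() {} // no-arg constructor used for reflective creation

    @Override
    void map(String value, SketchMapContext context) {
        context.write(value);
        context.write(value);
    }
}

final class MapperFactory {
    // Creates a mapper from its class name via its no-argument constructor.
    static ReflectMapper instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (ReflectMapper) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```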
Q16. What are the methods in the Mapper class?
Ans: The Mapper contains the run() method, which calls its own setup() method only once, then calls the map() method once for each input record, and finally calls its cleanup() method. All of these methods can be overridden in your code.
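The call order described above is a template method. Here is a simplified, hypothetical `LifecycleMapper` (plain Java, no Hadoop types; records come from an `Iterator` instead of a context) that logs each call. This sketch runs `cleanup` in a `finally` block so it runs even if `map` throws. The hypothetical `CountingMapper` overrides only `map`, so it inherits the default `setup` and `cleanup`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the Mapper life cycle; not Hadoop's Mapper class.
class LifecycleMapper {
    final List<String> calls = new ArrayList<>();

    protected void setup() { calls.add("setup"); }
    protected void map(String record) { calls.add("map:" + record); }
    protected void cleanup() { calls.add("cleanup"); }

    // setup() once, map() once per record, cleanup() once at the end.
    public void run(Iterator<String> records) {
        setup();
        try {
            while (records.hasNext()) {
                map(records.next());
            }
        } finally {
            cleanup();
        }
    }
}

// Overrides only map(); setup() and cleanup() keep their default behaviour.
class CountingMapper extends LifecycleMapper {
    int count;

    @Override
    protected void map(String record) { count++; }
}
```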
Q17. What happens if you don’t override the Mapper methods and keep them as they are?
Ans: If you do not override any methods (leaving even map() as-is), the Mapper acts as the identity function, emitting each input record as a separate output.
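A minimal sketch of that default, assuming a hypothetical generic `DefaultMapper` whose `write` collects pairs in a list instead of a real context: because `map` is not overridden, every input pair is written out unchanged.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the default behaviour; not Hadoop's Mapper class.
class DefaultMapper<K, V> {
    final List<Map.Entry<K, V>> emitted = new ArrayList<>();

    // Stand-in for emitting through a context.
    protected void write(K key, V value) { emitted.add(new SimpleEntry<>(key, value)); }

    // Default map(): identity, so each input pair becomes exactly one output pair.
    protected void map(K key, V value) { write(key, value); }
}
```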
Q18. What is the Context object used for?
Ans: The Context object allows the mapper to interact with the rest of the Hadoop system. It includes configuration data for the job, as well as interfaces that allow it to emit output.
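To make that concrete, the sketch below uses a hypothetical `MiniContext` that bundles a copy of the job configuration with an output sink, and a `FilterMapper` that reads a hypothetical `min.length` setting from it and emits through it. Hadoop's real `Context` exposes far more, and its `write` takes typed key/value objects rather than a `String` and an `int`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal context: configuration lookup plus an output sink.
final class MiniContext {
    private final Map<String, String> configuration;
    private final List<String> output = new ArrayList<>();

    MiniContext(Map<String, String> configuration) {
        this.configuration = new HashMap<>(configuration); // private copy
    }

    String getConf(String key, String defaultValue) {
        return configuration.getOrDefault(key, defaultValue);
    }

    void write(String key, int value) { output.add(key + "\t" + value); }

    List<String> getOutput() { return output; }
}

final class FilterMapper {
    // Emits (value, length) only for values at least "min.length" characters long.
    void map(String value, MiniContext context) {
        int min = Integer.parseInt(context.getConf("min.length", "0"));
        if (value.length() >= min) {
            context.write(value, value.length());
        }
    }
}
```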
Q19. How can you make arbitrary key-value pairs available in your mapper?
Ans: You can set arbitrary (key, value) pairs of configuration data in your Job, e.g. with Job.getConfiguration().set("myKey", "myVal"), and then retrieve this data in your mapper with Context.getConfiguration().get("myKey"). This lookup is typically done in the Mapper's setup() method.
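The pattern can be modelled in plain Java as below, assuming a hypothetical `SketchConfiguration` in place of Hadoop's `Configuration`, with the same `set`/`get` shape used in the answer. The hypothetical `PrefixMapper` reads `myKey` once in `setup()` and reuses the cached value for every record, which is why setup() is the usual place for such lookups; the `configReads` counter exists only to show the lookup happens once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration.
final class SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

final class PrefixMapper {
    private String prefix; // cached once per task in setup()
    int configReads;       // counts configuration lookups, for illustration

    void setup(SketchConfiguration conf) {
        prefix = conf.get("myKey");
        configReads++;
    }

    // Uses the cached value; no configuration lookup per record.
    String map(String value) { return prefix + ":" + value; }
}
```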
Q20. How does the Mapper’s run() method work?
Ans: After calling setup(), the Mapper.run() method calls map(KeyInType, ValInType, Context) for each key/value pair in the InputSplit for that task, and finally calls cleanup().