Hadoop Interview questions



What is DAGScheduler and how does it work?

DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling: after an RDD action is called, it becomes a job, which DAGScheduler transforms into a set of stages that are submitted as TaskSets for execution.
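The job-to-stages-to-TaskSets flow can be pictured with a small toy model. This is only a sketch in plain Python, not Spark's API: the function names `split_into_stages` and `make_task_sets`, the set `SHUFFLE_OPS`, and the task labels are all hypothetical, and it assumes a simple linear chain of transformations in which a new stage begins at each shuffle operation and each stage gets one task per partition.

```python
# Toy model only: none of these names come from Spark itself.
# Illustrative subset of transformations that require a shuffle.
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "repartition"}


def split_into_stages(transformations):
    """Cut a linear chain of transformation names into stages.

    Assumption: a new stage starts at every shuffle transformation.
    """
    stages = [[]]
    for op in transformations:
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: open a new stage
        stages[-1].append(op)
    return stages


def make_task_sets(stages, num_partitions):
    """Build one hypothetical task label per partition for every stage."""
    return [
        [f"stage{i}-task{p}" for p in range(num_partitions)]
        for i in range(len(stages))
    ]


# A word-count style lineage; calling an action on it would create the job.
lineage = ["textFile", "flatMap", "map", "reduceByKey", "map"]
stages = split_into_stages(lineage)
task_sets = make_task_sets(stages, num_partitions=2)

for stage, tasks in zip(stages, task_sets):
    print(stage, "->", tasks)
```

In this sketch the lineage is cut once, at `reduceByKey`, so the job yields two stages and two task sets. The real scheduler works on a graph of RDD dependencies rather than a flat list.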

DAGScheduler uses an event queue architecture in which any thread can post DAGSchedulerEvent events (e.g. a new job or stage being submitted), which DAGScheduler reads and processes sequentially.
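The event queue pattern described above can be sketched as follows. This is a minimal illustration in plain Python, not Spark code: the class `ToyEventLoop` and its methods are hypothetical, and the event tuples are illustrative labels only. It assumes a single consumer thread that takes events off a thread-safe queue one at a time.

```python
import queue
import threading


class ToyEventLoop:
    """Hypothetical single-consumer event loop (not a Spark class)."""

    def __init__(self):
        self._events = queue.Queue()  # thread-safe FIFO queue
        self.handled = []             # events in the order they were processed
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def post(self, event):
        """May be called from any thread; only enqueues the event."""
        self._events.put(event)

    def stop(self):
        """Enqueue a sentinel and wait until earlier events are processed."""
        self._events.put(None)
        self._thread.join()

    def _run(self):
        # Only this thread handles events, so they run strictly one by one.
        while True:
            event = self._events.get()
            if event is None:
                break
            self.handled.append(event)


loop = ToyEventLoop()
loop.start()
posted = [("JobSubmitted", 0), ("MapStageSubmitted", 0), ("JobSubmitted", 1)]
for event in posted:
    loop.post(event)
loop.stop()
print(loop.handled)
```

Because a single thread drains the queue, the handling code needs no locking of its own, even when many threads post events at once.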



