Hadoop Interview questions

Why use Spark when Hadoop already exists?

Spark addresses several cases that Hadoop MapReduce handles poorly:

Iterative algorithms: MapReduce is generally a poor fit for iterative algorithms such as graph processing and machine learning. These algorithms run the same steps over the same data again and again, so they perform best when the data stays in memory between steps. Fewer saves to disk and fewer transfers over the network mean better performance.
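The cost of re-reading input on every iteration can be sketched in plain Python. This is a toy model, not Spark or Hadoop code: the function names (`load_points`, `run_reloading`, `run_cached`) and the gradient-descent step are illustrative assumptions. One loop reloads its input from disk for each iteration, as a chain of separate MapReduce jobs would. The other loads the data once and iterates on it in memory.

```python
import os
import tempfile

def load_points(path, counter):
    """Read one number per line from disk and record that a disk read happened."""
    counter["disk_reads"] += 1
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def step(points, estimate, rate=0.5):
    """One gradient-descent step that moves the estimate towards the mean."""
    gradient = sum(estimate - p for p in points) / len(points)
    return estimate - rate * gradient

def run_reloading(path, iterations):
    """MapReduce-style loop (assumed model): every iteration re-reads its input."""
    counter = {"disk_reads": 0}
    estimate = 0.0
    for _ in range(iterations):
        points = load_points(path, counter)
        estimate = step(points, estimate)
    return estimate, counter["disk_reads"]

def run_cached(path, iterations):
    """Spark-style loop (assumed model): load once, keep the data in memory."""
    counter = {"disk_reads": 0}
    points = load_points(path, counter)
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(points, estimate)
    return estimate, counter["disk_reads"]

def demo(iterations=10):
    """Run both loops on the same small input file and return their results."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "points.txt")
        with open(path, "w") as f:
            f.write("\n".join(str(v) for v in [1.0, 2.0, 3.0, 4.0]))
        return run_reloading(path, iterations), run_cached(path, iterations)

reloading, cached = demo()
print("re-reading each iteration (estimate, disk reads):", reloading)
print("cached in memory (estimate, disk reads):", cached)
```

Both loops compute the same estimate. The only difference is the number of disk reads, which grows with the iteration count in the first loop and stays at one in the second.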

In-memory processing: MapReduce writes intermediate data to disk and reads it back from disk, which is too slow for fast processing. Spark keeps data in memory (this is configurable), which saves a lot of time by avoiding the disk reads and writes that happen between steps in Hadoop MapReduce.
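As a rough illustration of intermediate data, here is a two-stage word count in plain Python. It is a sketch under assumptions, not either framework's API: `pipeline_via_disk` and `pipeline_in_memory` are hypothetical names, and a JSON file stands in for the intermediate output that MapReduce would write between stages.

```python
import json
import os
import tempfile

def tokenize(lines):
    """Stage 1: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def sum_counts(pairs):
    """Stage 2: add up the counts per word."""
    totals = {}
    for word, count in pairs:
        totals[word] = totals.get(word, 0) + count
    return totals

def pipeline_via_disk(lines, workdir):
    """Write the intermediate pairs to a file, then read them back for stage 2."""
    path = os.path.join(workdir, "intermediate.json")
    with open(path, "w") as f:
        json.dump(tokenize(lines), f)
    bytes_on_disk = os.path.getsize(path)
    with open(path) as f:
        pairs = json.load(f)  # JSON turns tuples into lists; unpacking still works
    return sum_counts(pairs), bytes_on_disk

def pipeline_in_memory(lines):
    """Hand the intermediate pairs straight to stage 2; nothing touches disk."""
    return sum_counts(tokenize(lines)), 0

def demo(lines):
    """Run both pipelines on the same input and return their results."""
    with tempfile.TemporaryDirectory() as tmp:
        return pipeline_via_disk(lines, tmp), pipeline_in_memory(lines)

via_disk, in_memory = demo(["spark and hadoop", "Spark in memory"])
print("via disk (counts, intermediate bytes):", via_disk)
print("in memory (counts, intermediate bytes):", in_memory)
```

The word counts are identical either way. The disk variant pays for serializing, writing, and re-reading the intermediate pairs, and that cost is what keeping the data in memory avoids.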

Near real-time data processing: Spark also supports near real-time streaming workloads through the Spark Streaming application framework.
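Spark Streaming reaches near real-time behavior by cutting the incoming stream into small batches and processing each one as a short job. The plain-Python sketch below imitates that micro-batch idea only. It does not use the Spark Streaming API, and `micro_batches`, `process_stream`, and the sample events are illustrative assumptions.

```python
from collections import Counter

def micro_batches(events, interval):
    """Group (timestamp, value) events into consecutive windows of `interval` seconds.

    Windows that received no events are skipped in this simplified model.
    """
    batches = {}
    for timestamp, value in events:
        batch_id = int(timestamp // interval)
        batches.setdefault(batch_id, []).append(value)
    return [batches[batch_id] for batch_id in sorted(batches)]

def process_stream(events, interval):
    """Count the values in each micro-batch, one small job per batch."""
    return [dict(Counter(batch)) for batch in micro_batches(events, interval)]

# Hypothetical event stream: (seconds since start, event name).
events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (1.5, "a"), (3.0, "c")]
for counts in process_stream(events, interval=1):
    print(counts)
```

A shorter interval gives lower latency at the cost of more, smaller jobs, which is why this style of processing is described as near real-time.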