Nov 13, 2013 11:00 PM
Apache Hadoop has become the de facto platform for big data processing. It consists mainly of two components: (1) the Hadoop Distributed File System (HDFS), a distributed file system running on a cluster of commodity machines, and (2) MapReduce, a programming model for large-scale processing of data residing on HDFS. In this talk, I'll introduce Hadoop and its main components, and what it is about Hadoop that is revolutionizing data processing. I'll quickly explain how users write applications on top of Hadoop, and round it off with introductions to various Hadoop ecosystem projects like Pig, Hive, HBase, etc.
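To give a flavor of the programming model the talk covers, here is a minimal, Hadoop-free Python sketch of the classic word-count job: the mapper emits (word, 1) pairs, the framework groups them by key (the "shuffle"), and the reducer sums the counts per word. On a real cluster these functions would be submitted through Hadoop's MapReduce or Streaming APIs; this is just the model in miniature.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts collected for a single word.
    return (word, sum(counts))

def run_job(lines):
    # Shuffle: group mapper output by key, as the framework would do
    # across the cluster before handing each group to a reducer.
    groups = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            groups[word].append(count)
    # One reducer call per distinct key.
    return dict(reducer(w, c) for w, c in groups.items())

print(run_job(["the quick brown fox", "the lazy dog"]))
```

The appeal of the model is that the mapper and reducer are the only pieces the user writes; partitioning, grouping, and fault tolerance are the framework's job.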
Secondly, Apache Hadoop MapReduce has undergone a revolution to emerge as Apache Hadoop YARN, a generic compute platform. This part of the talk will cover the details of Apache Hadoop YARN: its architecture, applications, and real-life usage. I'll conclude with how YARN is turning Hadoop into a general-purpose processing platform by enabling users to run batch, stream-processing, graph workloads, and more, all on the same cluster resources.