Learn how to use Apache Hive on YARN to allow Hadoop to support more varied processing approaches and a broader array of applications.
YARN is a software rewrite that decouples MapReduce’s resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying and streaming data applications simultaneously with MapReduce batch jobs. In this blog, we will run Apache Hive on YARN. Let’s get started!
Add the file yarn-site.xml inside your /usr/local/hadoop/etc/hadoop folder with the following content:
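A minimal single-node yarn-site.xml often needs only a few properties. The sketch below assumes that kind of setup. It enables the mapreduce_shuffle auxiliary service, which MapReduce jobs need when they run on YARN, and it turns on log aggregation, which the yarn logs command relies on. The helper name write_yarn_site is hypothetical. The property names are standard YARN settings, but your cluster may need more than these.

```shell
#!/usr/bin/env bash
# write_yarn_site (hypothetical helper): writes a minimal single-node
# yarn-site.xml into the Hadoop configuration directory passed as $1.
# Warning: this overwrites any existing yarn-site.xml in that directory.
write_yarn_site() {
  local conf_dir="$1"
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Let NodeManagers serve map output to reducers (MapReduce shuffle) -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Collect container logs after an app finishes so yarn logs can read them -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
EOF
}

# Usage (assuming the install path used in this post):
# write_yarn_site /usr/local/hadoop/etc/hadoop
```

Hadoop ships with a mostly empty yarn-site.xml in that folder. Back it up before overwriting it.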
First, start the DFS with the following command:
Next, start the YARN Resource Manager with the command yarn resourcemanager start:
Then, start the YARN Node Manager with the command yarn nodemanager start:
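The three start-up steps above can also be scripted together. This sketch assumes Hadoop lives under /usr/local/hadoop, which you can override with HADOOP_HOME. It replaces the foreground yarn resourcemanager and yarn nodemanager invocations with the Hadoop 3.x daemon form, yarn --daemon start. On Hadoop 2.x the equivalent is sbin/yarn-daemon.sh start resourcemanager. After starting everything, the script uses the JDK's jps tool to report any expected daemon that is not running. The function names start_cluster and missing_daemons are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Hadoop is installed under /usr/local/hadoop unless HADOOP_HOME says otherwise.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# missing_daemons (hypothetical helper): reads jps output on stdin and prints
# each expected Hadoop daemon that does not appear in it, one per line.
missing_daemons() {
  local jps_out d
  jps_out="$(cat)"
  for d in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <MainClass>"; -w keeps NameNode from matching SecondaryNameNode
    printf '%s\n' "$jps_out" | grep -qw "$d" || echo "$d"
  done
}

# start_cluster (hypothetical helper): starts HDFS, then the YARN daemons.
start_cluster() {
  "$HADOOP_HOME/sbin/start-dfs.sh"
  # Hadoop 3.x daemon syntax; on Hadoop 2.x use
  # "$HADOOP_HOME/sbin/yarn-daemon.sh" start resourcemanager (and nodemanager)
  "$HADOOP_HOME/bin/yarn" --daemon start resourcemanager
  "$HADOOP_HOME/bin/yarn" --daemon start nodemanager
  # Report anything that failed to come up; empty output means all four are running
  jps | missing_daemons
}
```

Running the daemons in the background means a single terminal is enough. The trade-off is that their output goes to the log files under $HADOOP_HOME/logs instead of the console.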
Start your Hive CLI and fire an INSERT INTO query, since an insert runs as a MapReduce job:
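You can also write the query to a script instead of typing it into the CLI. The table name demo_dst is made up for illustration. The two SET lines are an assumption: they pin Hive to the classic MapReduce engine and make sure the job goes to YARN rather than to the local job runner. INSERT ... VALUES requires Hive 0.14 or later.

```shell
#!/usr/bin/env bash
# write_demo_hql (hypothetical helper): writes a small HiveQL script to the path in $1.
# demo_dst is a made-up table used only for illustration.
write_demo_hql() {
  cat > "$1" <<'EOF'
-- Assumption: use the classic MapReduce engine and submit to YARN, not the local runner
SET hive.execution.engine=mr;
SET mapreduce.framework.name=yarn;
CREATE TABLE IF NOT EXISTS demo_dst (id INT, name STRING);
-- An INSERT is compiled into a MapReduce job, which YARN then schedules
INSERT INTO TABLE demo_dst VALUES (1, 'alice'), (2, 'bob');
EOF
}

# Usage (Hive writes job progress to stderr, so capture both streams):
# write_demo_hql /tmp/demo.hql && hive -f /tmp/demo.hql 2>&1 | tee hive-output.log
```

If you prefer to work interactively, you can paste the same statements into the Hive CLI one at a time. When the job is submitted, Hive typically prints a Starting Job line that contains the job ID and a tracking URL.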
Now, why does this job fail? To find out, check the application logs. There are two ways to view them. One is the command yarn logs -applicationId, followed by the application ID:
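The yarn logs command needs an application ID. You can copy one from Hive's output, or look one up with yarn application -list -appStates FAILED. The sketch below assumes the Hive output was saved to a file and extracts the first job or application ID it finds. This works because a MapReduce job ID such as job_<timestamp>_<seq> shares its numbers with the YARN application ID application_<timestamp>_<seq>. The helper first_app_id is hypothetical. Note also that yarn logs reads aggregated logs, so it generally works only when yarn.log-aggregation-enable is true and the application has finished.

```shell
#!/usr/bin/env bash
# first_app_id (hypothetical helper): reads Hive console output on stdin and
# prints the YARN application ID of the first MapReduce job mentioned in it.
# A job ID job_<clusterTimestamp>_<seq> maps to application_<clusterTimestamp>_<seq>.
first_app_id() {
  grep -oE '(job|application)_[0-9]+_[0-9]+' | head -n 1 | sed -E 's/^job_/application_/'
}

# Usage, assuming the Hive output was saved to hive-output.log:
# yarn logs -applicationId "$(first_app_id < hive-output.log)"
# Or list failed applications and copy an ID from there:
# yarn application -list -appStates FAILED
```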
…and the other is to follow the job's tracking URL from the YARN UI to its logs.
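If you already know the application ID, you can reach its ResourceManager page directly. This sketch assumes the ResourceManager web UI listens on its default address, port 8088 (set by yarn.resourcemanager.webapp.address). The /cluster/app/ page and the /ws/v1/cluster/apps/ REST endpoint belong to the ResourceManager's web UI and REST API. The helpers rm_app_page and rm_app_rest and the RM_WEBAPP variable are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the ResourceManager web UI listens on its default port 8088
# (yarn.resourcemanager.webapp.address); override RM_WEBAPP otherwise.
RM_WEBAPP="${RM_WEBAPP:-http://localhost:8088}"

# rm_app_page (hypothetical helper): web UI page for an application ID.
rm_app_page() { printf '%s/cluster/app/%s\n' "$RM_WEBAPP" "$1"; }

# rm_app_rest (hypothetical helper): ResourceManager REST record for the same application.
rm_app_rest() { printf '%s/ws/v1/cluster/apps/%s\n' "$RM_WEBAPP" "$1"; }

# Usage with an example ID; the JSON's "diagnostics" field often says why the app failed:
# curl -s "$(rm_app_rest application_1500000000000_0001)"
```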
Now, if you navigate to the YARN UI, you can see that the query was successful. That’s it! I hope this blog is helpful for those starting out with Apache Hive and YARN.