
Experiments: Running the MapReduce code (Linux)

Burra Abhishek edited this page Jun 4, 2021 · 10 revisions

The following experiments were conducted on Linux Mint 20 Cinnamon with 2 processors, 5.5 GB RAM, and 70 GB storage. For each experiment, the commands are agnostic of the platform on which Hadoop was set up, unless mentioned otherwise. In addition, each of these experiments makes the following assumptions:

  • The username is burraabhishek
  • The present working directory is ~/src. The entire src directory of this repository was cloned to the home directory on the Linux machine.
  • All the directories in the Hadoop Distributed File System differ across various development environments

These values differ across development environments; replace them wherever necessary.
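One way to make the replacement easy is to capture the assumed values in shell variables at the start of a session. This is only a sketch: the variable names are hypothetical, and the values shown are the assumptions listed above, to be replaced with your own.

```shell
# Sketch: collect the per-machine values in one place so the commands
# in these experiments can be reused on a different setup.
# Replace both values with your own before running anything.
HADOOP_USER="burraabhishek"   # assumed username from this page
WORK_DIR="$HOME/src"          # assumed working directory (the cloned src/)

echo "Running the experiments as $HADOOP_USER from $WORK_DIR"
```

Later commands can then refer to `$HADOOP_USER` and `$WORK_DIR` instead of hard-coded values.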

NOTE: A Hadoop development environment is required to run these experiments. If you do not have one, this guide can help you get started.

Starting Hadoop Distributed File System

Run each of these commands to start HDFS:

start-dfs.sh
start-yarn.sh

For these experiments, it is recommended to open the Terminal from the present working directory and then run the above commands.

start-dfs.sh: Starts the Distributed File System. This starts the following:

  • namenode (on localhost, unless otherwise specified)
  • datanodes
  • secondary namenodes

start-yarn.sh: Starts Hadoop YARN (Yet Another Resource Negotiator). YARN manages computing resources in clusters. Running this command starts the following:

  • resourcemanager
  • nodemanagers

To check the status of the Hadoop daemons, run the jps command. jps is the Java Virtual Machine Process Status Tool. For example:

$ jps
2487 DataNode
2632 SecondaryNameNode
3193 Jps
2397 NameNode
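A check like the one above can also be scripted. The sketch below greps jps-style output for each daemon started by start-dfs.sh; the check_daemon helper is hypothetical (not part of Hadoop), and the sample output is the one shown above. In a live session, replace the hard-coded text with the real output of jps.

```shell
# Hypothetical helper: report whether a daemon name appears in jps output.
check_daemon() {
  # $1 = captured jps output, $2 = daemon class name
  if printf '%s\n' "$1" | grep -qw "$2"; then
    echo "$2 is running"
  else
    echo "$2 is NOT running"
  fi
}

# Using the sample jps output shown above; in a live session,
# capture it instead with: JPS_OUTPUT="$(jps)"
JPS_OUTPUT="2487 DataNode
2632 SecondaryNameNode
3193 Jps
2397 NameNode"

# Check each daemon that start-dfs.sh is expected to launch:
for daemon in NameNode DataNode SecondaryNameNode; do
  check_daemon "$JPS_OUTPUT" "$daemon"
done
```

The same loop can be extended with ResourceManager and NodeManager to verify that start-yarn.sh succeeded as well.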