HADOOP AUTOMATION SINGLE NODE CLUSTER (PSEUDO DISTRIBUTED MODE) ON VIRTUAL MACHINE


The goal of this blog is to automate the installation and configuration of a Hadoop single-node cluster on your Linux machine (Ubuntu and its variants). If that's what you want, then read on.

The Hadoop version we will be using is 1.2.1 and the operating system is Lubuntu (lightweight Ubuntu); you can use Ubuntu or any of its flavours (https://wiki.ubuntu.com/UbuntuFlavors). The virtualisation software is VMPlayer.

Download Links for VMPlayer and Lubuntu:

VMPlayer : https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0|PLAYER-605|product_downloads

Lubuntu 32-Bit: http://cdimage.ubuntu.com/lubuntu/releases/14.10/release/lubuntu-14.10-desktop-i386.iso

Lubuntu 64-Bit: http://cdimage.ubuntu.com/lubuntu/releases/14.10/release/lubuntu-14.10-desktop-amd64.iso

We will use a Bash script to automate the setup of the Hadoop single-node cluster.

Two versions of the script are linked below. Check whether your operating system is 32-bit or 64-bit and download the matching one; the only difference between the two scripts is the Java home path.

Bash Script for 32-Bit OS: https://drive.google.com/file/d/0B2T8Pye0P7e5OFh6TjhJTi00WTA/view?usp=sharing

Bash Script for 64-Bit OS: https://drive.google.com/file/d/0B2T8Pye0P7e5MjExT1hwRjdPUkk/view?usp=sharing
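If you are not sure which of the two scripts matches your system, the check can be sketched as below. This is a minimal sketch, assuming a Linux system where the getconf command is available; the function name os_bits is made up for illustration and is not part of the downloadable script.

```shell
#!/usr/bin/env bash
# Report whether the installed OS is 32-bit or 64-bit,
# so you know which of the two scripts to download.
os_bits() {
    # getconf LONG_BIT prints the size of a C "long" in bits: 32 or 64.
    getconf LONG_BIT
}

bits="$(os_bits)"
echo "This system is ${bits}-bit; download the ${bits}-bit script."
```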

After downloading the Bash script, you need to edit two things in it:

1. Replace every occurrence of “your-username” with the username under which you will be using Hadoop.

2. Replace the single occurrence of “your-hostname” at line 25 with your current hostname.
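The two replacements can also be done from the command line instead of a text editor. The following is a minimal sketch, assuming GNU sed (the default on Ubuntu) and a username and hostname that contain no "/" characters; the function name fill_placeholders is hypothetical. Because “your-hostname” occurs only once, the sketch does not rely on it being at line 25.

```shell
#!/usr/bin/env bash
# Fill in the two placeholders in a downloaded copy of the script.
# Usage: fill_placeholders FILE USERNAME HOSTNAME
fill_placeholders() {
    local file="$1" user="$2" host="$3"
    # Replace every occurrence of the username placeholder (g = all matches per line).
    sed -i "s/your-username/${user}/g" "$file"
    # Replace the hostname placeholder on whichever line it appears.
    sed -i "s/your-hostname/${host}/" "$file"
}
```

For example, fill_placeholders hadoopscript "$(whoami)" "$(hostname)" fills in the current user and the current hostname. It is worth keeping a backup copy of the file before editing it in place.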

Here is the video on how to do everything explained in this blog:

Please read the entire blog and then watch the video.

The Bash script does the following:

1. Update apt-get

2. Download the SSH server

3. Download Java version 7

4. Download Hadoop

5. Configure .bashrc

6. Configure Hadoop

7. Configure the hostname: change the hostname to nn, and edit the /etc/hosts file to replace your hostname with nn.

The new hostname will be nn. This minimizes the number of edits required. If you do not want nn as the new hostname, you will have to replace “nn” with the desired hostname everywhere in the script. If you miss any occurrence, your cluster might not start.

8. Configure passwordless SSH login.
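To give an idea of what the "Configure Hadoop" step involves, here is a rough sketch of the three configuration files a Hadoop 1.x pseudo-distributed setup typically needs. It is not the code from the downloadable script: the function name write_pseudo_conf is hypothetical, and the ports 9000 and 9001 are assumed values that the real script may or may not use. The property names fs.default.name, dfs.replication and mapred.job.tracker are the Hadoop 1.x ones.

```shell
#!/usr/bin/env bash
# Sketch only: writes minimal Hadoop 1.x pseudo-distributed config files.
# The ports 9000 and 9001 are assumed values, not taken from the blog's script.
# Usage: write_pseudo_conf CONF_DIR [HOSTNAME]   (HOSTNAME defaults to nn)
write_pseudo_conf() {
    local conf_dir="$1" host="${2:-nn}"
    mkdir -p "$conf_dir"

    # core-site.xml: the URI that clients use to reach the NameNode.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${host}:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: with a single DataNode, each block can have only one replica.
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: the host and port where the JobTracker listens.
    cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${host}:9001</value>
  </property>
</configuration>
EOF
}
```

Running write_pseudo_conf /tmp/hadoop-conf-preview nn writes the files into a scratch directory, so you can inspect them without touching a real Hadoop conf directory. It also shows why the hostname matters: nn appears inside the configuration, so a hostname that does not match can keep the daemons from finding each other.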

After downloading everything, these are the steps to follow:

STEP 1: Creating a Bash Script File in Linux and making it executable

Create a text file named hadoopscript: vi hadoopscript

Copy and paste the entire text of the downloaded Bash script into it. Save and exit.

Make it executable: chmod +x hadoopscript
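Since the script is pasted in by hand, a quick syntax check can catch an incomplete paste before anything is installed. Below is a minimal sketch; the function name prepare_script is made up for illustration. It relies on bash -n, which parses a file without executing any of it.

```shell
#!/usr/bin/env bash
# Make a script executable, but only after bash confirms that it parses.
# Usage: prepare_script FILE
prepare_script() {
    local file="$1"
    # bash -n reads the file and checks its syntax without running any command.
    if ! bash -n "$file"; then
        echo "Syntax error in $file; check that the paste was complete." >&2
        return 1
    fi
    chmod +x "$file"
    echo "$file is ready to run."
}
```

For example, prepare_script hadoopscript. A clean syntax check does not prove the script is correct; it only rules out problems such as a missing closing quote or a cut-off last line.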

STEP 2: Running the Bash Script.

Run the bash script: ./hadoopscript

Wait until the script finishes execution.
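The script downloads and configures several things, so it can help to keep a copy of its output in case something fails halfway. Here is a minimal sketch; the function name run_logged and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Run a command, showing its output on screen and saving a copy to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local log="$1"
    shift
    # 2>&1 merges error messages into normal output;
    # tee prints everything and also writes it to the log file.
    "$@" 2>&1 | tee "$log"
}
```

For example, run_logged hadoopscript.log ./hadoopscript. Afterwards, grep -i error hadoopscript.log is a quick way to look for steps that went wrong.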

STEP 3: Start using Hadoop Single-Node Cluster.

In this step we will start the Hadoop daemons and begin exploring Hadoop.
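One way to confirm that the cluster is up is to compare the output of jps (the JDK tool that lists running Java processes) against the five daemons a Hadoop 1.x single-node cluster runs. The following is a small sketch; the function name missing_daemons is hypothetical, and the commands in the trailing comments assume the Hadoop bin directory is on your PATH, which the .bashrc step of the script presumably takes care of.

```shell
#!/usr/bin/env bash
# The five daemons of a Hadoop 1.x single-node cluster.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Read jps output on stdin and print each expected daemon that is not listed.
missing_daemons() {
    local jps_output daemon
    jps_output="$(cat)"
    for daemon in $EXPECTED_DAEMONS; do
        # -w matches whole words only, so "NameNode" is not
        # satisfied by a line that says "SecondaryNameNode".
        printf '%s\n' "$jps_output" | grep -qw "$daemon" || echo "$daemon"
    done
}

# Typical use once the setup script has finished:
#   start-all.sh             # Hadoop 1.x helper that starts the HDFS and MapReduce daemons
#   jps | missing_daemons    # prints nothing when all five daemons are running
#   hadoop fs -ls /          # list the root of HDFS
```

If the NameNode is reported missing on a first start, one common cause is that HDFS has not been formatted yet (hadoop namenode -format). Whether the setup script already does this is not stated here, so check before formatting, because formatting erases existing HDFS data.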

Hadoop Shell commands: http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#cat

To go further, search online for use cases of a single-node Hadoop cluster.

If you get stuck anywhere in the tutorial, just drop a comment below and I will help you resolve the issue.

I hope this blog was informative for you. Thank you for reading it.

-Mohammad Yusuf Ghazi

This was originally posted here.
