Setting Up Hadoop 2 on Linux in 3 Easy Steps
When the time came to pick a distro for our Hadoop students, I decided to use Ubuntu. The Hortonworks VM already uses CentOS (love it), so a second distribution rounds out the student experience a bit.

Hadoop 2.2.0 is still about Java. If you are thinking production, be sure to use Oracle Java 1.6.

STEP 01: Source Install (OPTIONAL)

Tracking an LTS release is the obvious choice. If you want to set up the Hadoop source code for spelunking on Linux (first-timers will want to skip this step!), do the following:
sudo bash
# Refresh the package index and upgrade before installing the build tools
apt-get update
apt-get upgrade
apt-get install maven
apt-get install git
mkdir /hadoop9000
cd /hadoop9000
git clone git://
cd hadoop-common
# Build Hadoop without running the test suite
mvn install -DskipTests
# Generate Eclipse project files, including sources and Javadocs
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
# Hand the tree to your working user, recursively (guest here, or whoever)
chown -R guest /hadoop9000

The above will build the latest Hadoop on your machine. To keep your production work moving along on the Hadoop 2 LCD (lowest common denominator), also consider using the official binary install.

DEVELOPER NOTE: If you plan to use your install for software development, the above step is not optional. Why? Because the native-mode libraries (as well as the rest of the lot) need to be generated for your R&D platform. Depending upon the version of Linux you have, you may also need to install projects like Protocol Buffers (etc.) to compile Hadoop's C/C++ JNI underpinnings. Once the build is done, just drop the 50-or-so resulting libraries into /hadoop9000/lib. Why? Because you will want to use those JARs in your IDE (Eclipse, NetBeans, etc.) from a single standard location.

STEP 02: Official Binary Install (REQUIRED)

If rebuilding from source is not what you want to do, you can simply download and unpack the Hadoop tarball under /hadoop9000.
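As a rough sketch of that unpack step, a small helper along these lines may be handy. The function name unpack_hadoop and the tarball file name in the usage comment are illustrative assumptions, not part of the Hadoop distribution; it also assumes a tar that supports --strip-components (GNU tar does).

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a gzipped Hadoop tarball into an install directory.
# Usage: unpack_hadoop <tarball> <target-dir>
unpack_hadoop() {
    local tarball="$1" target="$2"
    # Refuse to continue if the download is not where we expect it.
    [ -f "$tarball" ] || { echo "missing tarball: $tarball" >&2; return 1; }
    mkdir -p "$target" || return 1
    # --strip-components=1 drops the top-level folder inside the archive,
    # so bin/, etc/, lib/ and friends land directly under the target.
    tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Example (the file name is an assumption; use whatever you downloaded):
# unpack_hadoop ~/Downloads/hadoop-2.2.0.tar.gz /hadoop9000
```

Stripping the top-level folder is a design choice here: it lets HADOOP_HOME point at /hadoop9000 itself rather than at a versioned subdirectory.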

Note that if gzip compression is on the radar, we also need to provide the proper 32-bit or 64-bit build of Hadoop's native libraries. (Step 01 can build those native libraries for us, too. Use: mvn compile -Pnative)
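One way to check whether a native library is a 32-bit or 64-bit build is to read its ELF header. The sketch below is only an illustration: the function names elf_bits and native_matches_system are made up for this example, and the library path in the usage comment is an assumption about where your build or download placed libhadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 32 or 64 for an ELF file, "unknown" otherwise.
elf_bits() {
    local magic class
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic="$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')"
    [ "$magic" = "7f454c46" ] || { echo unknown; return 1; }
    # The fifth byte (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones.
    class="$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')"
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}

# Succeeds when the library's word size matches the running userland.
native_matches_system() {
    [ "$(elf_bits "$1")" = "$(getconf LONG_BIT)" ]
}

# Example (the path is an assumption; adjust to your layout):
# native_matches_system /hadoop9000/lib/native/libhadoop.so || echo "word size mismatch"
```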

STEP 03: Hadoop Environment Variables (REQUIRED)

Next, the Hadoop environment variables need to be wired into your .bashrc. (The emboldened ones are required; the rest are optional.)
export JAVA_HOME="/usr/lib/jvm/java-6-oracle"
export HADOOP_HOME="/hadoop9000"

export HIVE_INSTALL="$HADOOP_HOME/hive/hive-0.12.0-bin"
# Can also place into
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
Yes, Oracle Java (/usr/lib/jvm/java-6-oracle) is officially endorsed as the best real-world way to go on Hadoop.
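After reloading your .bashrc, a quick sanity check can catch a mistyped path early. This is a minimal sketch; check_hadoop_env is a hypothetical helper name, and it assumes JAVA_HOME and HADOOP_HOME are the two variables you care about.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify JAVA_HOME and HADOOP_HOME are set and
# point at directories that actually exist.
check_hadoop_env() {
    local status=0 name value
    for name in JAVA_HOME HADOOP_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points at a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
# . ~/.bashrc && check_hadoop_env && echo "environment looks sane"
```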

Chasing the Moths

When debugging, do not forget to edit the

This should get you started. If you need to learn more, consider signing up for our next week of virtual training on Hadoop.



