Setting-up Hadoop 2 on Linux in 3 Easy Steps 
When the time came to pick a distro for our Hadoop students, I decided to use Ubuntu: the Hortonworks VM was already using CentOS (love it), so this rounds out the student experience a bit.



Hadoop 2.2.0 is still about Java. If you are thinking production, be sure to use Oracle Java 1.6.
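
Before going further, it is worth a quick sanity check on what Java the box already has (how you install Oracle Java varies by Ubuntu release, so treat the JVM paths below as assumptions to adapt):
java -version
update-alternatives --list java
ls /usr/lib/jvm/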

STEP 01: Source Install (OPTIONAL)


Assuming you are tracking an LTS release, when you want to set up the Hadoop source code for spelunking on Linux (first-timers will want to skip this step!), do the following:
sudo bash
apt-get update                      # refresh the package lists first
apt-get upgrade
apt-get install maven
apt-get install git
mkdir /hadoop9000
cd /hadoop9000
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
mvn install -DskipTests
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
chown -R guest /hadoop9000          # or whoever your day-to-day user is
sync
exit

The above will build the latest Hadoop on your machine. To keep your production work moving along on the Hadoop 2 lowest common denominator, also consider using the official binary install (Step 02).

DEVELOPER NOTE: If you are looking to use your install for software development, note that the above step is not optional. Why? Because the native-mode libraries (as well as the rest of the lot) need to be generated for your R&D platform. Depending upon the version of Linux you have, you may also need to install projects like Protocol Buffers (etc.) to compile Hadoop's C/C++ JNI underpinnings. Once created, just copy the lot of the 50-or-so libraries into /hadoop9000/lib, as sketched below. Why? Because you will want to use those JARs in your IDE (Eclipse, NetBeans, etc.) from a single standard location.
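
A minimal sketch of that copy, assuming the Maven build above completed and that /hadoop9000/hadoop-common is where the JARs landed (adjust the find filters to taste):
# gather the freshly-built JARs into one standard location
mkdir -p /hadoop9000/lib
find /hadoop9000/hadoop-common -name "*.jar" \
    -not -name "*-tests.jar" -not -name "*-sources.jar" \
    -exec cp {} /hadoop9000/lib/ \;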

STEP 02: Official Binary Install (REQUIRED)


If rebuilding from source is not what you want to do, then you can simply download and unpack the Hadoop tarball under /hadoop9000.
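
For example, for the 2.2.0 release (the mirror URL and the --strip-components trick are my own habits; substitute whatever the current stable tarball is):
mkdir -p /hadoop9000
cd /hadoop9000
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
tar -xzf hadoop-2.2.0.tar.gz --strip-components=1   # unpack straight into /hadoop9000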

Note that if gzip compression is on the radar, then we will also need to be sure to provide the proper 32-bit / 64-bit rendition of Hadoop's native libraries. (Step 01 can build those native libraries for us, too. Use: mvn compile -Pnative )
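
A quick way to check which rendition you actually have (the path below assumes the layout used throughout this article):
file /hadoop9000/lib/native/libhadoop*   # should report ELF 32-bit or 64-bit, matching your JVM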

STEP 03: Hadoop Environment Variables (REQUIRED)


Next, those Hadoop environment variables need to be wired into your .bashrc (the emboldened ones are required - the rest are optional):
export JAVA_HOME="/usr/lib/jvm/java-6-oracle"
export HADOOP_HOME="/hadoop9000"
export HADOOP_PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

export PIG_INSTALL="$HADOOP_HOME/pig"
export HIVE_INSTALL="$HADOOP_HOME/hive/hive-0.12.0-bin"
export PATH="$PATH:$HADOOP_PATH:$PIG_INSTALL/bin:$HIVE_INSTALL/bin"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native/"
# Can also place into hadoop-env.sh
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
Yes, Oracle Java (/usr/lib/jvm/java-6-oracle) is officially endorsed as the best real-world way to go on Hadoop.
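
Once the .bashrc edits are in, a quick smoke test (assuming the binary install from Step 02 is sitting under /hadoop9000):
source ~/.bashrc
echo $JAVA_HOME $HADOOP_HOME
hadoop version   # should print the Hadoop 2.x build you unpacked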

Chasing the Moths


When debugging, do not forget to edit hadoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5123"
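
With suspend=y, the next Hadoop command will pause until a debugger attaches on port 5123. For example, from another shell (jdb shown here, but any IDE's remote-debug configuration pointed at localhost:5123 works the same way):
jdb -attach localhost:5123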

This should get you started. If you need to learn more, then consider signing-up for our next week of virtual training on Hadoop.

Enjoy,

-Rn




Linux Disk Dump (35MB/Sec) 
Last week I noticed that my main drive was showing some classic boot-up warning signs. Rather than risk another 4-hour re-install session (followed by a 6-hour content restoration), we decided to get a 2TB Toshiba backup drive.

It arrived skewed. After checking around, lots of my friends at the local computer repair places were saying that Toshiba drives have even worse failure rates than Western Digital. (So much for saving $20!)

So it was back to the Internet to get yet another 2TB Seagate Barracuda: tried and true, the 7200 RPM drive is well worth an extra $20 (lol). Better still, by matching the existing drive, our confidence factor was absolute for the anticipated manual swap-out.

DeeDee


Well under 20 hours later (I was out Christmas shopping when the `dd` completed), the note to self is to use the POSIX `time` command next time around.

More officially, however, here are the stats for disk-dumping 2TB worth of data between two "identical" hard drives:

dd if=/dev/sda of=/dev/sdb
3907029168+0 records in
3907029168+0 records out
2000398934016 bytes (2.0 TB) copied, 57876.7 s, 34.6 MB/s

(Yea, sounds a bit off to me, too. But we were running both drives live & hot.)

Regardless, the next time I Dee-Dee this locus, we might update this entry with the actual `time` (man time) results. Yet for now there is enough info above (round to 35 MB/sec - we were not that hot) for others to do the math to support their own `dd` ops, as sketched below.
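
For the curious, the `time` wrapper and the math would look something like this (the bs value is my own suggestion; a larger block size usually helps throughput):
time dd if=/dev/sda of=/dev/sdb bs=64K
# 2,000,398,934,016 bytes / 57,876.7 s ~= 34.6 MB/s (decimal megabytes)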

Cheers!

-Rn


F-Word: Our Greatest Happiness? 
LinkedIn: HELP! I Got Blamed for my Co-Worker's Mistake

Life is a long song. I like to remember that success is not Final; Failure, seldom Fatal.

Through it all, I try to remember that peace is only obtainable by remembering another F-word. (No, not THAT one ...;)

The happiest folks we will ever meet have always learned how to both grant, as well as to get "Forgiveness" :)

