Man -v- Pig 
Whenever something is slow, unwieldy, and wallowing in muck, colloquial English has an animal catcall reserved for us to use... Never quite as responsive as stand-alone MapReduce (in production, at least!), such is the legacy of Hadoop Pig.

Yet while Pig is indeed getting faster with each and every release, piggy-job slowdowns can still get downright scary. Indeed, those who use Hue and/or the Hortonworks VM via the Pig page may feel we have hit a new performance low.

Not to worry though: when working from the VM command line, those 26- and 48-second, three-line Pig scripts will get back to running in well under 3 seconds. (Really!)

The fix is easy: simply log into your VM via the console prompt and execute Pig directly:
pig -x local
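For instance, a toy three-line script of the sort mentioned above (the file names here are made-up for illustration) runs entirely against the local filesystem, no cluster required:
cat > wordcount.pig <<'EOF'
lines = LOAD 'input.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
DUMP words;
EOF
pig -x local wordcount.pig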

Of course, once Pig is running, the sheer volume of INFO and WARN messages can seem positively ludicrous.

Here again, in the Man-versus-Pig struggle at Hadoop Acres, a local configuration file is all we need.

Useful for both R&D as well as impromptu production spelunking, here be the content for log4j.prop:
log4j.logger.org.apache.pig=ERROR
log4j.logger.org.apache.hadoop=ERROR

The cut-and-paste fodder for starting up our quiet-pig pen is simply:
pig -x local -4 log4j.prop

You will still get those all-important error messages when something genuinely breaks.

Finally, simply leave off that -4 option when you want all of that glorious detail back again.


Enjoy the Journey!

-Rn

New Project: Hadoop9000 
For the benefit of folks who will either be taking our Hadoop training or are interested in having a copy of the latest stable Hadoop (a copy they can debug, tweak, and rebuild on their own), we have just created the "Hadoop 9000" project.

Weighing in at 3.5GB, the VirtualBox Virtual Machine (OVA Appliance) will be uploaded sometime this week.
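Once it lands, importing the appliance into VirtualBox is a one-liner (the file name below is an assumption; use whatever the download is actually called):
VBoxManage import Hadoop9000.ova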

Cheers,

-Rn


Setting Up Hadoop 2 on Linux in 3 Easy Steps
When the time came to pick a distro for our Hadoop students, I decided to use Ubuntu: the Hortonworks VM already ships on CentOS (love it), so Ubuntu rounds the student experience out a bit.

Hadoop 2.2.0 is still all about Java. If you are thinking production, be sure to use Oracle Java 1.6.

STEP 01: Source Install (OPTIONAL)


Assuming you are tracking an Ubuntu LTS release, when you want to set up the Hadoop source code for spelunking on Linux (first-timers will want to skip this step!), do the following:
sudo bash
apt-get update
apt-get upgrade
apt-get install maven git
mkdir /hadoop9000
cd /hadoop9000
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
mvn install -DskipTests
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
chown -R guest /hadoop9000   # or whichever user should own the tree
sync
exit

The above will build the latest Hadoop on your machine. To keep your production work moving along on the Hadoop 2 lowest common denominator, also consider using the official binary install (Step 02).

DEVELOPER NOTE: If you are looking to use your install for software development, the above step is not optional. Why? Because the native-mode libraries (as well as the rest of the lot) need to be generated for your R&D platform. Depending upon your version of Linux, you may also need to install projects like Protocol Buffers (etc.) to compile Hadoop's C/C++ JNI underpinnings. Once created, just chum the lot of the 50-or-so JARs into /hadoop9000/lib. Why? Because you will want to use those JARs in your IDE (Eclipse, NetBeans, etc.) from a single, standard location.
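A minimal sketch of that chumming, assuming the clone lives at /hadoop9000/hadoop-common per Step 01 (adjust the paths to taste):
mkdir -p /hadoop9000/lib
# Gather every JAR the Maven build produced into one standard location
find /hadoop9000/hadoop-common -name "*.jar" -exec cp {} /hadoop9000/lib \;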

STEP 02: Official Binary Install (REQUIRED)


If rebuilding from source is not what you want to do, then you can simply download and untar the Hadoop release tarball under /hadoop9000, as sketched below.
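A minimal sketch, using the Apache archive path for the 2.2.0 release (swap in a closer mirror if you prefer):
cd /hadoop9000
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
# Unpack in place so that /hadoop9000 is the top of the Hadoop tree
tar -xzf hadoop-2.2.0.tar.gz --strip-components=1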

Note that if gzip compression is on the radar, then we will need to be sure to provide the proper 32- / 64-bit rendition of Hadoop's native libraries, as well. (Step 01 can build those native libraries for us, too. Use: mvn compile -Pnative )
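To see which native codecs your install can actually load, Hadoop 2 ships a handy checker (assuming your build includes it; run this after Step 03 wires up your PATH):
hadoop checknative -a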

STEP 03: Hadoop Environment Variables (REQUIRED)


Next, those Hadoop environment variables need to be wired into your .bashrc (the Java and Hadoop entries are required; the Pig and Hive entries only matter if you install those tools):
export JAVA_HOME="/usr/lib/jvm/java-6-oracle"
export HADOOP_HOME="/hadoop9000"
export HADOOP_PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Optional: Pig and Hive installs living under $HADOOP_HOME
export PIG_INSTALL="$HADOOP_HOME/pig"
export HIVE_INSTALL="$HADOOP_HOME/hive/hive-0.12.0-bin"
export PATH="$PATH:$HADOOP_PATH:$PIG_INSTALL/bin:$HIVE_INSTALL/bin"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native/"
# Can also place into hadoop-env.sh
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
Yes, Oracle Java (/usr/lib/jvm/java-6-oracle) is officially endorsed as the best real-world way to go on Hadoop.
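A quick sanity check after reloading .bashrc confirms the wiring (the expected version string assumes the 2.2.0 install from the steps above):
source ~/.bashrc
echo $HADOOP_HOME    # should print /hadoop9000
hadoop version       # should report 2.2.0 if the PATH plumbing is correct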

Chasing the Moths


When debugging, do not forget to edit hadoop-env.sh:
# Append to (rather than overwrite) any existing options;
# suspend=y makes each JVM wait for a debugger to attach on port 5123
export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5123"
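Any JPDA-capable debugger can then attach; with the stock jdb, for example (the port matches the address above):
jdb -attach 5123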

This should get you started. If you need to learn more, then consider signing up for our next week of virtual training on Hadoop.

Enjoy,

-Rn



