Create a Hadoop Compilation and Development Environment

If you want to read the source code of Hadoop and dig into the internals of Hadoop, then you need to know how to compile the source code and use IDEs (such as Eclipse or IntelliJ Idea) to open the projects. This blog will introduce you how to create a Hadoop build and development environment.

Environment: CentOS 6.6, Oracle JDK 1.7.0_75

1. Install Oracle JDK 7

Download jdk-7u75-linux-x64.rpm
sudo yum localinstall -y ./jdk-7u75-linux-x64.rpm

For now if you use JDK 8 to compile the source code of Hadoop, it will fail because the javadoc in Java 8 is considerably more strict than the one in earlier version, see more detailes here

2. Install Maven

wget http://mirrors.gigenet.com/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz
sudo tar -zxf apache-maven-3.2.5-bin.tar.gz -C /opt
sudo vim /etc/profile
export M2_HOME=/opt/apache-maven-3.2.5
export PATH=$M2_HOME/bin:$PATH

3. Install FindBugs(Optional)

wget http://iweb.dl.sourceforge.net/project/findbugs/findbugs/3.0.0/findbugs-3.0.0.tar.gz
sudo tar -zxf findbugs-3.0.0.tar.gz -C /opt
sudo vim /etc/profile
export FB_HOME=/opt/findbugs-3.0.0
export PATH=$FB_HOME/bin:$PATH

If you want to run mvn compile findbugs:findbugs then you need to install FindBugs.

4. Install build tools

sudo yum install -y gcc-c++ cmake autoconf automake

5. Install native libraries packages

sudo yum -y install  gcc-c++ lzo-devel  zlib-devel libtool openssl-devel

Optional:

sudo yum install -y snappy-devel
sudo yum install -y svn

During the process of compilation, it will throw out some warning messages if there is no svn command.

5. Install Protobuf

wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
tar -zxf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
sudo make install
rm -rf ./protobuf-2.5.0
rm ./protobuf-2.5.0.tar.gz

NOTE: The Protobuf version must be exactly 2.5.0, or the compilation will fail.

6. Download the source code

git clone git@github.com:apache/hadoop.git

7. Compile

Create binary distribution with native code and without documentation.

mvn clean package -Pdist,native -Dtar -DskipTests -Drequire.snappy -Drequire.openssl

8. Install IntelliJ Idea

wget http://download-cf.jetbrains.com/idea/ideaIC-14.0.3.tar.gz
sudo tar -zxf ideaIC-14.0.3.tar.gz -C /opt
Launch IntelliJ Idea, Create Desktop Entry

9. Open projects in IntelliJ Idea

First you need to run

mvn install -DskipTests

Since some submodules of Hadoop depend on other submodules, so you need to run mvn install -DskipTests to copy jars of Hadoop submodules to local $HOME/.m2 , so that Maven won’t download them from Internet. If you omit this step, Maven will try to download them from public Maven repository and will fail, because the newest version of Hadoop jars are not available in public Maven repository yet.

Then click File->Open to open the pom.xml in the root directory of Hadoop repo.

Note: Don’t use mvn idea:idea to generate IntelliJ Idea projects first , IntelliJ Idea can handle pom.xml quite well, besides, the “Maven IDEA Plugin” has already RETIRED.

10. Increase the memory for IntelliJ Idea(Optional)

There are many classes in Hadoop source code, if your IntelliJ hangs and doesn’t respond from time to time, then you can try to give more memory to IntelliJ Idea by modifying the file idea64.vmoptions in the installation directory. Example below:

-Xms128m
-Xmx4096m
-XX:MaxPermSize=1024m

Reference

In my next blog, I’ll explain how to Debug Hadoop Applications with IntelliJ

soulmachine

Machine has soul.