How to install Hadoop 2.x.x in Standalone / Pseudo-Distributed mode on a Mac with OS X Mavericks

Oct 11, 2014 · 2 minute read


Image source: http://www.javacodegeeks.com/2011/05/hadoop-soft-introduction.html

I was searching for a complete tutorial on installing Hadoop on a Mac and playing around with it. There are resources on installing Hadoop with Homebrew, the missing package manager for OS X ;). But I did not want to offload all the configuration burden to it, as I wanted to learn this from top to bottom. I played around with a few approaches, and here are the configuration steps I followed.
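(For reference, the Homebrew route is just a one-liner, assuming you already have Homebrew installed, but then it does all the configuration for you:)

$ brew install hadoop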

1) Download the Hadoop binary. I used Hadoop 2.5.1, which is the latest at the moment.

http://www.apache.org/dyn/closer.cgi/hadoop/common/

2) Extract the binary, and let's call its location HADOOP_HOME.

eg: /Users/user1/software/hadoop-2.5.1
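If you prefer to stay in the terminal, something like this works (the mirror URL here is just an example; pick one from the closer.cgi page above):

$ curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz
$ mkdir -p ~/software
$ tar -xzf hadoop-2.5.1.tar.gz -C ~/software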

3) Add HADOOP_HOME and JAVA_HOME as environment variables on your system. You can add them to
~/.bashrc or ~/.bash_profile.

You can add them by issuing the following commands.


$ vim ~/.bash_profile

Add the following entries, changing the paths to match your machine's configuration.


export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HOME=/Users/user1/software/hadoop-2.5.1
# Hadoop's executables live in bin/ and sbin/, so add both to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin



Then reload the configuration.


$ source ~/.bash_profile
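You can verify that the variables took effect; hadoop version should print 2.5.1:

$ echo $HADOOP_HOME
$ hadoop version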


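At this point Standalone mode already works: Hadoop runs as a single Java process against the local filesystem, with no further configuration. As a quick test, here is the grep example from the Apache guide linked in the references, run from $HADOOP_HOME:

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*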
(The remaining steps, 4 through 6, are only needed for Pseudo-Distributed mode. If you continue with them, you will have to add input files to HDFS and download output files from HDFS as well.)

4) Navigate to HADOOP_HOME and edit the following files as shown below.

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
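To sanity-check that Hadoop is picking up the edited files, you can read a key back with hdfs getconf (run from HADOOP_HOME):

$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000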

5) Set up passphraseless SSH

Now check that you can ssh to localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
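If ssh still asks for a password after this, a common culprit is file permissions; authorized_keys must only be writable by you:

$ chmod 0600 ~/.ssh/authorized_keys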

(You may need to enable Remote Login under System Preferences --> Sharing, if you have not enabled it already, to be able to log in through SSH.)

6) Starting Hadoop in Pseudo-Distributed mode.
 
Navigate to $HADOOP_HOME

Format the filesystem:

$ bin/hdfs namenode -format


Start NameNode daemon and DataNode daemon:

$ sbin/start-dfs.sh
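If the daemons came up cleanly, jps (which ships with the JDK) should list NameNode, DataNode and SecondaryNameNode processes:

$ jps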



The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode; by default it is available at:

NameNode - http://localhost:50070/
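Since we are now in Pseudo-Distributed mode, input files go into HDFS first. Here is the same grep example from the Apache guide, this time against HDFS, run from $HADOOP_HOME (replace user1 with your user name):

$ bin/hdfs dfs -mkdir -p /user/user1
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -get output output
$ cat output/*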

So good luck with all your MapReduce jobs. :)

References :
http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation



