Set up a Hadoop pseudo-distributed cluster on Ubuntu 16.04

This article shows how to set up Hadoop on your personal Ubuntu machine from scratch. Follow the steps in order and you will be able to start the Hadoop processes along with YARN; we will also demonstrate how to upload a file to HDFS and execute a MapReduce job using that file.

Please ensure Java is installed on the system (Hadoop 2.7 requires Java 7 or later). If it is not installed, install a JDK first.
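If you are unsure whether Java is present, a quick check like the following works; the openjdk-8-jdk package name is an assumption for Ubuntu 16.04 (any JDK 7+ will do):

```shell
# Check for a usable Java runtime; print an install hint if it is missing.
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "java not found - install it with: sudo apt-get install openjdk-8-jdk"
fi
```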

1. Install the OpenSSH server.

nitin@nitin-Satellite-C850:~$ sudo apt-get install openssh-server

2. Verify that public key authentication is enabled in the SSH config file ("/etc/ssh/sshd_config").

3. Restart ssh service

nitin@nitin-Satellite-C850:~$ sudo service ssh restart

1. Create a new group "hadoop".

nitin@nitin-Satellite-C850:~$ sudo addgroup hadoop

2. Create new user in group created.

nitin@nitin-Satellite-C850:~$ sudo adduser --ingroup hadoop hduser

1. Switch to the newly created "hduser" account.

nitin@nitin-Satellite-C850:~$ su - hduser

2. Create RSA key pair

hduser@nitin-Satellite-C850:~$ ssh-keygen -t rsa -P ""

3. Append the public key to the authorized keys file.

hduser@nitin-Satellite-C850:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

4. Test passwordless authentication; you should be able to connect to localhost without entering any password.

hduser@nitin-Satellite-C850:~$ ssh localhost

1. Download the Hadoop 2.7.7 tarball from the Apache archive.

nitin@nitin-Satellite-C850:~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

2. Untar the Hadoop tar file.

nitin@nitin-Satellite-C850:~$ tar -xzvf hadoop-2.7.7.tar.gz

3. Move the extracted Hadoop folder to "/usr/local" using sudo.

nitin@nitin-Satellite-C850:~$ sudo mv hadoop-2.7.7 /usr/local/hadoop

4. Change the ownership to hadoop user “hduser”.

nitin@nitin-Satellite-C850:/usr/local$ sudo chown -R hduser:hadoop /usr/local/hadoop

5. su to "hduser" and add the Hadoop environment variables to the ".bashrc" file via the "vi" or "nano" editor.

6. Reload “.bashrc” file.

hduser@nitin-Satellite-C850:~$ source ~/.bashrc
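Step 5 above does not list the variables themselves; a typical set for the layout used in this guide is shown below. The JAVA_HOME path is an assumption for an OpenJDK 8 install on Ubuntu, so adjust it to your JDK location:

```shell
# Hadoop environment variables for ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK path, adjust as needed
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```

Putting the bin and sbin directories on PATH is what lets later steps call start-dfs.sh, hdfs, and the other Hadoop tools without full paths.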

We will verify the Hadoop setup in local (standalone) mode first, before moving to the actual pseudo-distributed mode where YARN will also be running.

1. Create an "input" directory in the home folder.

hduser@nitin-Satellite-C850:~$ mkdir ~/input

2. Copy the test data from the Hadoop configuration directory to the "input" directory.

hduser@nitin-Satellite-C850:~$ cp /usr/local/hadoop/etc/hadoop/*.xml ~/input

3. Execute Hadoop job.

hduser@nitin-Satellite-C850:~$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep ~/input ~/output 'principal[.]*'

The Hadoop job will execute and the result will be stored in the "output" directory under the home folder from which the job was run.
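For intuition, the grep example job simply counts the occurrences of each distinct match of the given regex across the input files. A plain-shell sketch of the same computation (the sample file and its contents are purely illustrative):

```shell
# Emulate the Hadoop "grep" example locally: extract every match of the
# regex, then count how many times each distinct match occurs.
mkdir -p /tmp/grep-demo
printf 'the principal value\nkerberos.principal.name\n' > /tmp/grep-demo/sample.txt
grep -ohE 'principal[.]*' /tmp/grep-demo/*.txt | sort | uniq -c
```

This prints a count next to each distinct match ("principal" and "principal.", one occurrence each), which is essentially what the MapReduce job writes into its output directory.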

Next we will configure Hadoop as a pseudo-distributed cluster, where we can execute MapReduce jobs and perform HDFS operations.

1. Create local directories for the namenode, datanode and temporary data.

hduser@nitin-Satellite-C850:~$ mkdir -p app/hadoop/

hduser@nitin-Satellite-C850:~$ mkdir app/hadoop/namenode
hduser@nitin-Satellite-C850:~$ mkdir app/hadoop/datanode
hduser@nitin-Satellite-C850:~$ mkdir app/hadoop/tmp

2. Configure “/usr/local/hadoop/etc/hadoop/hdfs-site.xml”

Configure the replication factor as 1 (the default is 3), since we run only one datanode
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

Configure the path for name node metadata
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hduser/app/hadoop/namenode</value>
</property>

Configure the path for data node block storage (note the property key is "dfs.datanode.data.dir")
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hduser/app/hadoop/datanode</value>
</property>
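These properties all go inside the file's top-level <configuration> element; the complete "hdfs-site.xml" for this setup would look like the following (the datanode storage key is "dfs.datanode.data.dir"):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/app/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/app/hadoop/datanode</value>
  </property>
</configuration>
```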

3. Configure “/usr/local/hadoop/etc/hadoop/core-site.xml”

Configure the default file system URI (host and port) used for the Hadoop instance
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Configure the temporary directory to be used internally by the Hadoop system
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

4. Configure “/usr/local/hadoop/etc/hadoop/mapred-site.xml”

Create "mapred-site.xml" from the provided template.

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

Configure YARN as the MapReduce framework
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

5. Configure “/usr/local/hadoop/etc/hadoop/yarn-site.xml”

Configure the auxiliary shuffle service required to run MapReduce on YARN
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

6. Format the namenode (required once, before the first start).

hduser@nitin-Satellite-C850:~$ hdfs namenode -format

This formats the "namenode" directory, indicated by the line below in the output and an exit status of 0:

Storage directory /home/hduser/app/hadoop/namenode has been successfully formatted.

7. Start the HDFS daemons.

hduser@nitin-Satellite-C850:~$ start-dfs.sh

8. Start YARN.

hduser@nitin-Satellite-C850:~$ start-yarn.sh

9. Start the MapReduce job history server.

hduser@nitin-Satellite-C850:~$ mr-jobhistory-daemon.sh start historyserver

10. Verify that all processes are running via "jps".

hduser@nitin-Satellite-C850:~$ jps
30161 JobHistoryServer
28884 ResourceManager
30373 Jps
28681 SecondaryNameNode
28283 NameNode
28429 DataNode
29135 NodeManager

Now we have all the Hadoop processes running.

1. Create the home directory for "hduser" on HDFS.

hduser@nitin-Satellite-C850:~$ hdfs dfs -mkdir /user

hduser@nitin-Satellite-C850:~$ hdfs dfs -mkdir /user/hduser

2. Copy the local "input" directory to HDFS. With no destination given, it lands in the user's HDFS home directory.

hduser@nitin-Satellite-C850:~$ hdfs dfs -copyFromLocal input

3. Execute the Hadoop job using the HDFS "input" directory.

hduser@nitin-Satellite-C850:~$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'principal[.]*'

4. Check the output on HDFS.

hduser@nitin-Satellite-C850:~$ hdfs dfs -cat output/*

Hence we have created a pseudo-distributed Hadoop cluster which is ready for DFS commands and can be integrated with Sqoop and other tools.
