STEP 2 : Hadoop Cluster Installation : Getting your Virtual Machines READY
Cloudera Manager Supports :
- ◾Red Hat Enterprise Linux 5.7 and CentOS 5.7, 64-bit
- ◾Red Hat Enterprise Linux 6.2 and 6.4, and CentOS 6.2 and 6.4, 64-bit
- ◾Browsers: Firefox 11 or later, Google Chrome, or Internet Explorer 9
Supported Databases for Cloudera Manager
- ◾Cloudera Manager requires several databases.
- ◾The Cloudera Manager server stores information about configured services, role assignments, configuration history, commands, users, and running processes in a database of its own.
- ◾The Activity Monitor, Service Monitor, Report Manager, and Host Monitor also each use a database to store information.
- ◾The embedded PostgreSQL database
- ◾MySQL: 5.0, 5.1, 5.5
- ◾Oracle: 10g Release 2, 11g Release 2
- ◾PostgreSQL: 8.1, 8.3, 8.4, 9.1
CDH Version Support :
- ◾Cloudera Manager supports CDH3 Update 1 (cdh3u1) or later and CDH4.0 or later
Why PARCELS :
- ◾Upgrade to the latest CDH minor release with just a few mouse clicks
- ◾Upgrade without taking any downtime on your cluster
- ◾Requires Cloudera Manager 4.5 or later
Why Cloudera Manager :
- The Cloudera Manager Installer enables you to install Cloudera Manager and bootstrap an entire CDH cluster, requiring only that you have SSH access to your cluster’s machines, and that those machines have Internet access.
The Cloudera Manager Installer will automatically:
- Detect the operating system on the Cloudera Manager host
- Install the package repository for Cloudera Manager and the Java Runtime Environment (JRE)
- Install the JRE if it’s not already installed
- Install and configure an embedded PostgreSQL database
- Install and run the Cloudera Manager Server
Once server installation is complete, you can browse to Cloudera Manager’s web interface and use the cluster installation wizard to set up your CDH cluster.
Getting your Virtual Machines READY:
- Decide the number of Nodes required to form your cluster.
- Create that many Virtual Machines, as described in my previous blog post.
1. Edit /etc/resolv.conf
- resolv.conf is the resolver configuration file, used in various operating systems to configure the Domain Name System (DNS) resolver library.
- It contains information that determines the operational parameters of the DNS resolver.
The DNS resolver allows applications running in the operating system to translate human-friendly domain names into the numeric IP addresses that are required for access to resources on the local area network or the Internet.
- search example.com
- nameserver 172.16.1.254
A name server is a computer server that hosts a network service for providing responses to queries against a directory service.
- It maps a human-recognizable identifier to a system-internal, often numeric identification or addressing component.
- This service is performed by the server in response to a network service protocol request.
- An example of a name server is the server component of the Domain Name System (DNS), one of the two principal name spaces of the Internet.
- The most important function of these DNS servers is the translation (resolution) of human-memorable domain names and hostnames into the corresponding numeric Internet Protocol (IP) addresses.
- A domain name (for instance, “example.com”) is an identification string that defines a realm of administrative autonomy, authority or control on the Internet.
- Domain names are formed by the rules and procedures of the Domain Name System (DNS).
- Any name registered in the DNS is a domain name.
- A hostname is a label that is assigned to a device connected to a computer network and that is used to identify the device in various forms of electronic communication such as the World Wide Web, e-mail or Usenet. Hostnames may be simple names consisting of a single word or phrase, or they may be structured.
Domain Name System (DNS) is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities.
An often-used analogy is that the Domain Name System serves as the phone book for the Internet by translating human-friendly computer hostnames into IP addresses. For example, the domain name www.example.com translates to the addresses 22.214.171.124 (IPv4) and 2606:2800:220:6d:26bf:1447:1097:aa7 (IPv6). Unlike a phone book, the DNS can be quickly updated.
Note that the file to edit is /etc/resolv.conf, not /etc/resolve.conf.
This file is normally populated already; if it is not, edit it with values appropriate to your network, for example:
- domain localdomain
- search localdomain
- nameserver 192.168.85.2
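One cautious way to apply these values is to write them to a scratch file, check them, and only then copy the file into place as root. The sketch below does that; the scratch path in RESOLV_DRAFT is an assumption for illustration, and the addresses are simply the example values above.

```shell
#!/bin/sh
# Draft the resolver settings in a scratch file.
# RESOLV_DRAFT is an assumed path, not a system default.
RESOLV_DRAFT="${RESOLV_DRAFT:-/tmp/resolv.conf.draft}"

cat > "$RESOLV_DRAFT" <<'EOF'
domain localdomain
search localdomain
nameserver 192.168.85.2
EOF

# Extract the name servers the draft would configure, one per line.
nameservers=$(awk '$1 == "nameserver" { print $2 }' "$RESOLV_DRAFT")
echo "Name servers in draft: $nameservers"

# After reviewing the draft, install it as root:
#   cp "$RESOLV_DRAFT" /etc/resolv.conf
```

Keeping the copy step as a comment means nothing on the system changes until you have looked at the draft.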
2. Edit /etc/sysconfig/network
Set the HOSTNAME value in this file. The hostname is reflected in the command prompt, e.g. [root@www adminuser]# (here the hostname is www).
About sysconfig-network files : http://www.centos.org/docs/5/html/5.2/Deployment_Guide/s2-sysconfig-network.html
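As a rough illustration of the hostname change, the sketch below rewrites the HOSTNAME line in a scratch copy of the file. The scratch path NETWORK_DRAFT, the name master.localdomain, and the starting content are all assumptions; substitute your own values.

```shell
#!/bin/sh
# Work on a scratch copy; NETWORK_DRAFT and NEW_HOSTNAME are illustrative assumptions.
NETWORK_DRAFT="${NETWORK_DRAFT:-/tmp/sysconfig-network.draft}"
NEW_HOSTNAME="${NEW_HOSTNAME:-master.localdomain}"

# Stand-in for the existing /etc/sysconfig/network on a CentOS 5/6 machine.
cat > "$NETWORK_DRAFT" <<'EOF'
NETWORKING=yes
HOSTNAME=localhost.localdomain
EOF

# Replace only the HOSTNAME line, leaving every other setting untouched.
sed "s/^HOSTNAME=.*/HOSTNAME=$NEW_HOSTNAME/" "$NETWORK_DRAFT" > "$NETWORK_DRAFT.new"
mv "$NETWORK_DRAFT.new" "$NETWORK_DRAFT"

cat "$NETWORK_DRAFT"

# As root, after review: cp "$NETWORK_DRAFT" /etc/sysconfig/network
```

The value in this file is read at boot, so the prompt only shows the new name after a restart (or after setting it for the current session with the hostname command).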
3. Disable SELinux on all nodes
- SELINUX=disabled   # in /etc/selinux/config, change the value from enforcing to disabled
Check the current state using the command : sestatus
You can also change the mode live like this:
- setenforce 0   # switch to permissive mode (policy is no longer enforced)
- setenforce 1   # switch back to enforcing mode
Reboot the VM for the change in the config file to take effect
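The edit to the SELinux config can be sketched the same way, against a scratch copy rather than the live file. SELINUX_DRAFT is an assumed path, and the two-line starting content is a stand-in for the real /etc/selinux/config.

```shell
#!/bin/sh
# Edit a scratch copy of the SELinux config; SELINUX_DRAFT is an assumed path.
SELINUX_DRAFT="${SELINUX_DRAFT:-/tmp/selinux-config.draft}"

# Stand-in for the stock /etc/selinux/config.
cat > "$SELINUX_DRAFT" <<'EOF'
SELINUX=enforcing
SELINUXTYPE=targeted
EOF

# Change the mode line only; the pattern does not match SELINUXTYPE=.
sed 's/^SELINUX=.*/SELINUX=disabled/' "$SELINUX_DRAFT" > "$SELINUX_DRAFT.new"
mv "$SELINUX_DRAFT.new" "$SELINUX_DRAFT"

selinux_mode=$(sed -n 's/^SELINUX=//p' "$SELINUX_DRAFT")
echo "Configured SELinux mode: $selinux_mode"

# As root, after review: cp "$SELINUX_DRAFT" /etc/selinux/config
```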
4. Turn off the firewall on all nodes
- A Linux firewall is a software-based firewall that provides protection between your server (workstation) and damaging content on the Internet or network.
- It tries to guard your computer against both malicious users and malicious software such as viruses and worms.
Stop the running firewall:
- service iptables stop
Turn off the firewall on boot:
- chkconfig iptables off
5. Edit your hosts file
vi /etc/hosts   # add an entry for every node of the cluster to the file
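To give an idea of what the entries in /etc/hosts might look like, here is a sketch that builds a draft file and checks it for names listed twice. Every IP address and hostname in it is hypothetical, as is the HOSTS_DRAFT path; use the addresses of your own VMs.

```shell
#!/bin/sh
# HOSTS_DRAFT is an assumed scratch path; all addresses and names are made up.
HOSTS_DRAFT="${HOSTS_DRAFT:-/tmp/hosts.draft}"

# Format of each line: IP address, fully qualified name, short alias.
cat > "$HOSTS_DRAFT" <<'EOF'
127.0.0.1       localhost localhost.localdomain
192.168.85.101  master.localdomain  master
192.168.85.102  slave1.localdomain  slave1
192.168.85.103  slave2.localdomain  slave2
EOF

# A name that appears on more than one line is usually a copy-paste mistake.
duplicate_names=$(awk '!/^#/ { for (i = 2; i <= NF; i++) print $i }' "$HOSTS_DRAFT" | sort | uniq -d)

if [ -n "$duplicate_names" ]; then
    echo "Duplicate host names: $duplicate_names"
else
    echo "No duplicate host names in $HOSTS_DRAFT"
fi
```

The same file, with the same entries, goes on every node so that all machines agree on each other's names.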
6. Generate an SSH key pair
Verify SSH installation
The first step is to check whether SSH is installed on your nodes. We can easily do this
by use of the “which” UNIX command:
[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen
If you instead receive an error message such as this,
/usr/bin/which: no ssh in (/usr/bin:/bin:/usr/sbin…
install OpenSSH (www.openssh.com) via a Linux package manager or by downloading the source directly. (Better yet, have your system administrator do it for you.)
Generate SSH key pair
Having verified that SSH is correctly installed on all nodes of the cluster, we use ssh-keygen on the master node to generate an RSA key pair. Be certain to avoid entering a passphrase, or you’ll have to manually enter that phrase every time the master node attempts to access another node.
[hadoop-user@master]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.
Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
After creating your key pair, your public key will be of the form
[hadoop-user@master]$ more /home/hadoop-user/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1WS3RG8LrZH4zL2/1oYgkV1OmVclQ2OO5vRi0NdK51Sy3wWpBVHx82F3x3ddoZQjBK3uvLMaDhXvncJG31JPfU7CTAfmtgINYv0kdUbDJq4TKG/fuO5qJ9CqHV71thN2M310gcJ0Y9YCN6grmsiWb2iMcXpy2pqg8UM3ZKApyIPx99O1vREWm+4moFTgYwIl5be23ZCyxNjgZFWk5MRlT1p1TxB68jqNbPQtU7fIafS7Sasy7h4eyIy7cbLh8x0/V4/mcQsY5dvReitNvFVte6onl8YdmnMpAh6nwCvog3UeWWJjVZTEBFkTZuV1i9HeYHxpm1wAzcnf7az78jTIRQ== hadoop-user@master
and we next need to distribute this public key across your cluster.
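If you would rather not answer the prompts by hand, the key pair can also be generated non-interactively. The sketch below writes the key into a scratch directory (KEY_DIR, an assumption for illustration) so that it never touches an existing key in ~/.ssh; for a real setup you would point -f at ~/.ssh/id_rsa instead.

```shell
#!/bin/sh
# KEY_DIR is a scratch directory chosen for illustration only.
KEY_DIR="${KEY_DIR:-/tmp/hadoop-ssh-demo}"
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"

# Remove keys left by an earlier run so ssh-keygen does not ask about overwriting.
rm -f "$KEY_DIR/id_rsa" "$KEY_DIR/id_rsa.pub"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -f names the key file, -q keeps output quiet.
    ssh-keygen -q -t rsa -N "" -f "$KEY_DIR/id_rsa"
    # Show the start of the public key, the single line that goes to the other nodes.
    cut -c1-40 "$KEY_DIR/id_rsa.pub"
else
    echo "ssh-keygen not found; install OpenSSH first"
fi
```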
Distribute public key and validate logins
Albeit a bit tedious, you’ll next need to copy the public key to every slave node as well as the master node:
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
Manually log in to the target node and set the master key as an authorized key (or append to the list of authorized keys if you have others defined).
[hadoop-user@target]$ mkdir ~/.ssh
[hadoop-user@target]$ chmod 700 ~/.ssh
[hadoop-user@target]$ mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$ chmod 600 ~/.ssh/authorized_keys
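Note that the mv above replaces any authorized_keys file already on the target. For the "append" case, a small helper along the following lines could be used instead. It is only a sketch: append_key is a hypothetical function name, and the demo runs against scratch files with a placeholder key rather than a real one.

```shell
#!/bin/sh
# append_key KEY_FILE AUTH_FILE   (hypothetical helper, not part of OpenSSH)
# Adds the public key to AUTH_FILE only if that exact line is not already there.
append_key() {
    key_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir" && chmod 700 "$auth_dir"
    touch "$auth_file" && chmod 600 "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string, not a regex.
    grep -qxF -e "$(cat "$key_file")" "$auth_file" || cat "$key_file" >> "$auth_file"
}

# Demo on scratch files; the path and key text are placeholders.
DEMO_DIR="${DEMO_DIR:-/tmp/authorized-keys-demo}"
rm -rf "$DEMO_DIR" && mkdir -p "$DEMO_DIR"
echo "ssh-rsa AAAAexamplekey hadoop-user@master" > "$DEMO_DIR/master_key"

append_key "$DEMO_DIR/master_key" "$DEMO_DIR/.ssh/authorized_keys"
append_key "$DEMO_DIR/master_key" "$DEMO_DIR/.ssh/authorized_keys"   # second call adds nothing

cat "$DEMO_DIR/.ssh/authorized_keys"
```

On a target node the call would be append_key ~/master_key ~/.ssh/authorized_keys. Where the ssh-copy-id helper is installed, it does the copy and the append in one step from the master.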
After generating the key, you can verify it’s correctly defined by attempting to log in to the target node from the master:
[hadoop-user@master]$ ssh target
The authenticity of host 'target (xxx.xxx.xxx.xxx)' can't be established.
RSA key fingerprint is 72:31:d8:1b:11:36:43:52:56:11:77:a4:ec:82:03:1d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'target' (RSA) to the list of known hosts.
Last login: Sun Jan 4 15:32:22 2009 from master
After confirming the authenticity of a target node to the master node, you won’t be prompted upon subsequent login attempts.
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
7. Edit /etc/ssh/ssh_config
- Change the StrictHostKeyChecking option from yes to no, so that SSH does not stop to ask for confirmation of each new host key.
8. Update the packages of the system
- yum -y update
9. Repeat all the previous steps on every node, changing the IP address and hostname of each node to match its entry in the /etc/hosts file
10. Download cloudera-manager-installer.bin onto the machine that will act as the Master server
CM Installer Download : http://archive.cloudera.com/cm4/installer/
To be continued... Stay tuned.