A Few Scripts for Setting Up a Hadoop Cluster
In the previous post I used PXE to install the operating system on the cluster; next comes configuring each host and installing Hadoop. This post won't cover Hadoop setup and configuration itself — the official documentation and plenty of online guides explain it in detail, so just configure according to your version and situation. The shell scripts here are written for Ubuntu; making them portable would take some extra tweaking, so I'm pasting them directly in this post rather than putting them on GitHub.
Initialization Script
First is the initialization script that runs after the system install. It sets the hostname, IP, and DNS. Because the PXE install pointed the apt sources at the install server, they now need to be switched to the official Ubuntu China mirror. The script then installs sshd, generates an SSH key pair and authorizes the public key for local login, and finally downloads the script for the next step, installing Hadoop. One note: there is a 163-hosted Ubuntu mirror circulating online, but it seems to have problems at the moment — openssh-server, for instance, fails to install from it, for reasons I don't know.
The script fetches the files it needs with wget. The config files and scripts all live under the web server's conf directory — mine is at http://192.168.1.230/conf/. That directory holds the sources.list file (apt sources) and the interfaces file (NIC settings), among others.
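Pieced together from the wget calls in the scripts below, the layout on the web server looks roughly like this (the tarballs sit at the server root, everything else under conf/):

```
192.168.1.230/
├── jdk-8u77-linux-x64.tar.gz
├── hadoop-2.6.2.tar.gz
├── apache-mahout-distribution-0.11.1.tar.gz
└── conf/
    ├── sources.list
    ├── interfaces
    ├── install.sh
    └── java.sh
```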
```bash
#!/bin/bash
# init after installing system ($EUID and read -p are bash features, so bash rather than sh)
if [ $EUID -ne 0 ]; then
    echo "This script must run as root" 1>&2
    exit 1
fi

webserver="http://192.168.1.230"
basepath=$(cd `dirname $0`; pwd)

if [ ! -d "$basepath/download" ]; then
    mkdir $basepath/download
fi
rm -f $basepath/download/*

# set update sources
wget -P $basepath/download $webserver/conf/sources.list
cp $basepath/download/sources.list /etc/apt/sources.list

# set hostname
read -p "input hostname:" newname
/bin/hostname $newname
echo $newname > /etc/hostname

# set IP
read -p "input new ip: 192.168.1." newip
wget -P $basepath/download $webserver/conf/interfaces
cat $basepath/download/interfaces | sed "s/^address.*/address 192.168.1.${newip}/g" > /etc/network/interfaces

# set DNS
echo "nameserver 202.197.64.6" > /etc/resolvconf/resolv.conf.d/base
echo "nameserver 114.114.114.114" >> /etc/resolvconf/resolv.conf.d/base

/etc/init.d/networking restart
ifdown eth0
ifup eth0
resolvconf -u

# install sshd
apt-get update
apt-get -y install openssh-client
apt-get -y install openssh-sftp-server
apt-get -y install ssh-import-id
apt-get -y install libck-connector0
apt-get -y install libwrap0
apt-get -y install openssh-server
sed -i "s/^PermitRootLogin\ without-password/PermitRootLogin\ yes/g" /etc/ssh/sshd_config
service ssh restart

# set ssh: generate a key pair and authorize it for local logins
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

# download shell
if [ -f install.sh ]; then
    rm -f install.sh
fi
wget $webserver/conf/install.sh
chmod +x ./install.sh

# install JDK8 Hadoop2.6.2 Mahout0.11.1
if [ "$1" = "--all" ]; then
    ./install.sh
fi
```
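Assuming the script is saved as init.sh (a name I'm making up for illustration), a first run on a fresh node looks like:

```bash
# run as root; pass --all to continue straight into install.sh
chmod +x init.sh
./init.sh --all
```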
Below are the sources.list and interfaces files. Adjust interfaces to your own situation.
```
# deb cdrom:[Ubuntu 14.04 LTS _Trusty Tahr_ - Release amd64 (20140417)]/ trusty main restricted

deb http://cn.archive.ubuntu.com/ubuntu/ trusty main restricted
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty main restricted
deb http://cn.archive.ubuntu.com/ubuntu/ trusty-updates main restricted
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-updates main restricted

deb http://cn.archive.ubuntu.com/ubuntu/ trusty universe
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty universe
deb http://cn.archive.ubuntu.com/ubuntu/ trusty-updates universe
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-updates universe

deb http://cn.archive.ubuntu.com/ubuntu/ trusty multiverse
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty multiverse
deb http://cn.archive.ubuntu.com/ubuntu/ trusty-updates multiverse
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-updates multiverse

deb http://cn.archive.ubuntu.com/ubuntu/ trusty-security main restricted
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-security main restricted
deb http://cn.archive.ubuntu.com/ubuntu/ trusty-security universe
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-security universe
deb http://cn.archive.ubuntu.com/ubuntu/ trusty-security multiverse
deb-src http://cn.archive.ubuntu.com/ubuntu/ trusty-security multiverse
```
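The interfaces file itself isn't reproduced here. For reference, a minimal static-IP template compatible with the init script's sed replacement (which matches a line starting with `address`) might look like the sketch below; the concrete addresses are placeholders for my 192.168.1.0/24 subnet:

```
# /etc/network/interfaces -- template; the init script rewrites the address line
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 192.168.1.240
netmask 255.255.255.0
gateway 192.168.1.1
```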
Downloading and Installing the JDK, Hadoop, and Mahout
The location of the installation packages is set by the webserver variable in the script, and the package names are defined there too; all of them can be changed. The script also needs java.sh, the environment-variable setup script, which it downloads automatically and places in /etc/profile.d/.
```bash
#!/bin/bash
# install JDK Hadoop Mahout
if [ $EUID -ne 0 ]; then
    echo "This script must run as root" 1>&2
    exit 1
fi

basepath=$(cd `dirname $0`; pwd)
webserver="http://192.168.1.230"
JDK_PACKET="jdk-8u77-linux-x64.tar.gz"
HADOOP_PACKET="hadoop-2.6.2.tar.gz"
MAHOUT_PACKET="apache-mahout-distribution-0.11.1.tar.gz"

if [ ! -d "$basepath/download" ]; then
    mkdir $basepath/download
fi

# JDK (paths anchored to $basepath so the script works from any working directory)
wget -P $basepath/download $webserver/$JDK_PACKET
if [ ! -d "/opt/java" ]; then
    mkdir /opt/java
fi
tar -xzf $basepath/download/$JDK_PACKET -C /opt/java

# Hadoop
wget -P $basepath/download $webserver/$HADOOP_PACKET
if [ ! -d "/opt/hadoop" ]; then
    mkdir /opt/hadoop
fi
tar -xzf $basepath/download/$HADOOP_PACKET -C /opt/hadoop

# Mahout
wget -P $basepath/download $webserver/$MAHOUT_PACKET
if [ ! -d "/opt/mahout" ]; then
    mkdir /opt/mahout
fi
tar -xzf $basepath/download/$MAHOUT_PACKET -C /opt/mahout

# PATH
wget -P $basepath/download $webserver/conf/java.sh
cp $basepath/download/java.sh /etc/profile.d/
source /etc/profile
```
Execute this script with `source install.sh`; otherwise the final line (`source /etc/profile`) only takes effect in a child shell, and you'd have to run it again manually.
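In other words:

```bash
source install.sh                     # runs in the current shell; the new PATH takes effect immediately
# -- or --
./install.sh && source /etc/profile   # runs in a child shell; reload the profile yourself afterwards
```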
The java.sh script likewise goes in the server's conf directory.
```bash
# /etc/profile.d/
JAVA_HOME=/opt/java/jdk1.8.0_77
HADOOP_HOME=/opt/hadoop/hadoop-2.6.2
MAHOUT_HOME=/opt/mahout/apache-mahout-distribution-0.11.1
MAHOUT_CONF_DIR=$MAHOUT_HOME/conf

CLASSPATH=$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH

export PATH JAVA_HOME CLASSPATH HADOOP_HOME MAHOUT_HOME MAHOUT_CONF_DIR
```
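Once the profile is loaded, a quick sanity check (the version strings should match the packages above):

```bash
source /etc/profile
java -version       # 1.8.0_77
hadoop version      # Hadoop 2.6.2
echo $MAHOUT_HOME   # /opt/mahout/apache-mahout-distribution-0.11.1
```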
Pushing the Master Node's SSH Public Key to the Slaves
This one runs on the master node. Besides pushing the public key, it also grabs the remote host's hostname and appends it to the master's hosts file.
```bash
#!/bin/bash
# push pub key to others && get hostname to write hosts (read -p needs bash)
read -p "Input remote IP: 192.168.1." ipd
ip="192.168.1.$ipd"
ssh-copy-id -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa.pub root@$ip
hostname=`ssh root@${ip} 'hostname'`
echo "add /etc/hosts..."
echo "$ip $hostname" >> /etc/hosts
echo "remote hostname is $hostname. copy successful."
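```

The script prompts for one last octet at a time. With many slaves, a batch variant saves some typing; a sketch, where the octet list 231 232 233 is made up for illustration:

```bash
#!/bin/bash
# hypothetical batch version: push the key and collect hostnames in one pass
for ipd in 231 232 233; do
    ip="192.168.1.$ipd"
    ssh-copy-id -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa.pub root@$ip
    echo "$ip `ssh root@$ip 'hostname'`" >> /etc/hosts
done
```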
Distributing the Master's hosts File to the Slaves
It reads the local hosts file and distributes it to every node listed there.
```bash
#!/bin/sh
# push hosts to others
cat /etc/hosts | while read LINE
do
    ip=`echo $LINE | awk '{print $1}' | grep -v "::" | grep -v "127.0.0.1"`
    # the localhost and IPv6 lines yield an empty $ip, so skip them
    if [ -n "$ip" ]; then
        echo "Copying /etc/hosts to ${ip}"
        scp -o StrictHostKeyChecking=no /etc/hosts root@${ip}:/etc/
    fi
done
```
After this, all that's left is to configure the Hadoop XML files on the master node and write a distribution script much like the ones above; a sketch follows.
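A minimal sketch of such a script, reusing the hosts-file loop above and assuming the HADOOP_HOME path from java.sh (only the xml files under etc/hadoop are pushed; adjust to taste):

```bash
#!/bin/sh
# sketch: push the master's Hadoop xml config to every node listed in /etc/hosts
HADOOP_CONF=/opt/hadoop/hadoop-2.6.2/etc/hadoop
cat /etc/hosts | while read LINE
do
    ip=`echo $LINE | awk '{print $1}' | grep -v "::" | grep -v "127.0.0.1"`
    if [ -n "$ip" ]; then
        echo "Copying Hadoop config to ${ip}"
        scp -o StrictHostKeyChecking=no $HADOOP_CONF/*.xml root@${ip}:$HADOOP_CONF/
    fi
done
```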