[TOC]
## What is OpenHPC

OpenHPC is a framework that aggregates the common ingredients needed to deploy and manage a cluster. In short: build the software stack you want once, and every node in the cluster boots the same image, completely uniform, which makes it well suited to standing up a high performance computing cluster. In the project's own words:
> Welcome to the OpenHPC site. OpenHPC is a collaborative, community effort that initiated from a desire to aggregate a number of common ingredients required to deploy and manage High Performance Computing (HPC) Linux clusters including provisioning tools, resource management, I/O clients, development tools, and a variety of scientific libraries. Packages provided by OpenHPC have been pre-built with HPC integration in mind with a goal to provide re-usable building blocks for the HPC community.
I built mine on CentOS 7. The project publishes an installation guide, but it covers a lot of ground without explaining why each step is done, and following it verbatim still leaves plenty of pitfalls. Below is a record of my own build process.
## 1. Master configuration

### a. Configure a local yum repository on the master

First create the directories that will hold the node files, and set up the yum repositories. I built a local mirror inside the cluster so packages download and install quickly; a sketch of how such a mirror can be put together follows the repo listing.
```bash
mkdir -pv /atlas/os_images/compute_node_v0.2.0
mkdir /atlas/os_images/tftpboot_v0.2.0

cat /etc/yum.repos.d/base.repo
[development]
name=development
baseurl=ftp://172.16.10.10/centos7.2
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
[epel]
name=epel
baseurl=ftp://172.16.10.10/epel
gpgcheck=0
[local-slurm]
name=slurm
baseurl=ftp://172.16.10.10/slurm
gpgcheck=0

yum clean all
yum repolist
```
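The baseurl entries above point at an FTP mirror on the master (172.16.10.10). For reference, such a mirror can be assembled with createrepo and served by vsftpd; the directory layout below is an assumption for illustration and must match whatever the baseurl paths say.

```bash
# Hypothetical sketch of the local mirror: copy the RPMs you need, build repodata, serve over FTP.
yum -y install vsftpd createrepo
mkdir -p /var/ftp/centos7.2 /var/ftp/epel /var/ftp/slurm   # anonymous vsftpd serves /var/ftp by default
# rsync or copy the CentOS, EPEL and Slurm RPMs into the directories above, then:
createrepo /var/ftp/centos7.2
createrepo /var/ftp/epel
createrepo /var/ftp/slurm
systemctl enable vsftpd
systemctl start vsftpd
```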
### b. Configure the OpenHPC services on the master

Install the required packages on the master:
```bash
yum -y groupinstall ohpc-base
yum -y groupinstall ohpc-warewulf
yum -y groupinstall "InfiniBand Support"
yum -y install infinipath-psm

systemctl enable ntpd.service   # nodes and the master talk to each other constantly, so accurate time matters
systemctl restart ntpd
systemctl start rdma
systemctl status rdma
```
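Because the compute nodes will later sync their clocks from the master (see the ntp.conf edit in the node section), the master's ntpd has to accept them as clients. A minimal sketch, assuming the cluster sits on 172.16.10.0/24 as the addresses in this post suggest:

```bash
# Allow the compute subnet to query this ntpd (172.16.10.0/24 is an assumption based on the IPs above)
echo "restrict 172.16.10.0 mask 255.255.255.0 nomodify notrap" >> /etc/ntp.conf
systemctl restart ntpd
ntpq -p    # confirm the master itself has reachable upstream servers
```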
Configure the PXE boot directory and the network interface to serve it on, then start the PXE-related services:
```bash
vim /etc/warewulf/provision.conf
    network device = enp129s0f0                   # name of the provisioning NIC on the master
    tftpdir = /atlas/os_images/tftpboot_v0.2.0
vim /etc/xinetd.d/tftp
    server_args = -s /atlas/os_images/tftpboot_v0.2.0
    disable = no

systemctl restart xinetd
systemctl start tftp.socket
systemctl start tftp.service
```
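Before any node is rebooted, it is worth confirming that TFTP is really being served out of the new directory. A quick check; the pxelinux path in the commented line is only a guess at Warewulf's layout under tftpdir:

```bash
ss -ulpn | grep ':69 '                    # TFTP should be listening on UDP port 69
ls /atlas/os_images/tftpboot_v0.2.0/      # Warewulf populates this later, after wwbootstrap
# optional fetch through the tftp client (the file path is illustrative):
# tftp 172.16.10.10 -c get warewulf/pxelinux.0 /tmp/pxelinux.0
```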
After a node boots, it completes some of its configuration with files it downloads from the master over HTTP; the downloaded files live under the /warewulf area served by httpd, which needs the following adjustments:
```bash
export MODFILE=/etc/httpd/conf.d/warewulf-httpd.conf
perl -pi -e "s/cgi-bin>\$/cgi-bin>\n    Require all granted/" $MODFILE
perl -pi -e "s/Allow from all/Require all granted/" $MODFILE
perl -ni -e "print unless /^\s+Order allow,deny/" $MODFILE

cat /etc/httpd/conf.d/warewulf-httpd.conf
LoadModule perl_module modules/mod_perl.so
PerlSwitches -w
PerlSwitches -T
PerlSwitches -I/var/www/stage/cgi-bin

# This is disabled as RHEL6 perl_mod seems to be missing this support
#PerlPreConnectionHandler Apache2::Reload

Alias /WW/static /usr/share/warewulf/www
ScriptAlias /WW/file /usr/libexec/warewulf/cgi-bin/file.pl
ScriptAlias /WW/script /usr/libexec/warewulf/cgi-bin/script.pl
ScriptAlias /WW/nodeconfig /usr/libexec/warewulf/cgi-bin/nodeconfig.pl
ScriptAlias /WW/vnfs /usr/libexec/warewulf/cgi-bin/vnfs.pl

<Directory /usr/libexec/warewulf/cgi-bin>
    Require all granted
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    PerlOptions +ParseHeaders
    Options +ExecCGI
</Directory>

<Directory /usr/share/warewulf/www>
    Options Indexes MultiViews
    AllowOverride None
    Require all granted
</Directory>
```
The cluster maps each node's MAC address to its IP, hostname and other settings, and this information is kept in a database, so MariaDB is needed as well:
```bash
systemctl enable mariadb.service
systemctl restart mariadb
systemctl enable httpd.service
systemctl restart httpd
```
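At this point both services should be up. Two quick checks; note that the Warewulf database (named "warewulf" by default in /etc/warewulf/database.conf, an assumption about your install) only appears once wwsh or wwinit has touched the data store:

```bash
systemctl status httpd mariadb
# the provisioning CGI from the httpd config above should answer (any HTTP status beats "connection refused")
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/WW/file
# node and VNFS metadata ends up in MariaDB; "warewulf" is the assumed default database name
mysql -e 'SHOW DATABASES;' | grep -i warewulf
```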
## 2. Build the node image and configuration

First build the node's operating system tree, then install the applications it needs on top.
### Install the base system

```bash
vim /usr/libexec/warewulf/wwmkchroot/centos-7.tmpl
    YUM_MIRROR="ftp://172.16.10.10/centos7.2"
# This is the package source used to install the minimal bootable system.
# Left unchanged, packages come from the official mirrors; because I rebuild and retest repeatedly,
# I point it at the local mirror for speed.
# If you do pull from the official mirrors, note that CentOS has moved on to 7.3, so the original path needs updating.

export CHROOT=/atlas/os_images/compute_node_v0.2.0
wwmkchroot centos-7 $CHROOT        # installs the system files into $CHROOT
cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf

cd /atlas/os_images/compute_node_v0.2.0/etc/yum.repos.d/
rm -f CentOS-*
cat base.repo
cat OpenHPC.repo.bak
# Here too the node's package sources point at the local mirror (the OpenHPC packages have also been downloaded locally).
```
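A quick look at the freshly built chroot confirms it is a sane minimal tree before anything else is layered on top:

```bash
chroot $CHROOT cat /etc/centos-release   # should report the release pulled from YUM_MIRROR
chroot $CHROOT rpm -qa | wc -l           # rough package count of the minimal image
du -sh $CHROOT                           # size of the tree that will become the VNFS image
```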
Next, install the applications the nodes need.
### Install common base packages

```bash
yum -y --installroot=$CHROOT install ntp kernel gcc make grub2-tools environment-modules
yum -y --installroot=$CHROOT groupinstall "InfiniBand Support"
yum -y --installroot=$CHROOT install infinipath-psm
chroot $CHROOT systemctl enable rdma
chroot $CHROOT systemctl enable ntpd
echo "server 172.16.10.10" >> $CHROOT/etc/ntp.conf   # the master's IP; nodes sync their clocks from it
```
### Install munge

Configure and install munge as the foundation for the Slurm installation that follows. One point to watch: the munge and slurm users must have identical UIDs and GIDs on the master and on the nodes.
```bash
export MUNGEUSER=1050
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SlurmUSER=1051
groupadd -g $SlurmUSER slurm
useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm
cat /etc/passwd | grep munge
cat /etc/passwd | grep slurm
cat /etc/group | grep slurm
cat /etc/group | grep munge

yum install munge munge-libs munge-devel -y
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
systemctl enable munge
systemctl start munge

chroot $CHROOT groupadd -g 1050 munge
chroot $CHROOT useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u 1050 -g munge -s /sbin/nologin munge
chroot $CHROOT groupadd -g 1051 slurm
chroot $CHROOT useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u 1051 -g slurm -s /bin/bash slurm
yum install --installroot=$CHROOT munge munge-libs munge-devel -y
cp -a /etc/munge/munge.key $CHROOT/etc/munge/munge.key
chroot $CHROOT systemctl enable munge
chroot $CHROOT chown -R munge: /etc/munge/ /var/log/munge/
chroot $CHROOT chmod 0700 /etc/munge/ /var/log/munge/

vim /etc/warewulf/vnfs.conf
# Note: many services write their logs under /var/log.
#exclude += /var/log/*          # "exclude" lists directories left out of the node image; we need this one, so comment it out, otherwise munge fails to start
#exclude += /usr/src            # needed later by dkms when installing the NVIDIA driver
#hybridize += /usr/lib/locale   # the locale / character-set data lives here
```
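Munge can be verified locally right away, and across the wire once a node has booted (node5 here is just the node used as an example later in this post):

```bash
munge -n | unmunge               # encode and decode a credential locally; STATUS should read Success
# after a compute node is provisioned and up, the shared key must validate remotely too:
munge -n | ssh node5 unmunge
remunge                          # optional: quick stress test of the munge daemon
```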
### Install Slurm

```bash
tar xf slurm.tar
cd rpmbuild/RPMS/x86_64/
yum install ./slurm-*
yum --installroot=$CHROOT install slurm-devel slurm-contribs slurm-munge slurm-openlava slurm-pam_slurm slurm-perlapi slurm-plugins slurm-slurmdbd slurm-sql slurm-torque -y

vim /etc/slurm/slurm.conf        # the Slurm configuration file
cp -a /etc/slurm/slurm.conf /atlas/os_images/compute_node_v0.2.0/etc/slurm/slurm.conf

chown slurm:slurm /var/spool
mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
touch /var/log/slurmctld.log
chown slurm: /var/log/slurmctld.log

chroot /atlas/os_images/compute_node_v0.2.0/
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
touch /var/log/slurmd.log
chown slurm: /var/log/slurmd.log
chroot $CHROOT systemctl enable slurmd.service

[yhu@master etc]$ sudo chmod +x rc.local
[yhu@master etc]$ cat rc.local
...
nvidia-smi >> /dev/null
systemctl restart slurmd
# When a job asks for a GPU there is no /dev/nvidia0 device file, and a system booted through OpenHPC
# really does not create one. So I add the two lines above to /etc/rc.local: running nvidia-smi creates
# the device files, and then slurmd is restarted.
```
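The post does not show slurm.conf itself, so below is a minimal sketch consistent with the spool and log paths created above and with the "test" partition that appears in sinfo later. The cluster name, CPU count and the commented GPU gres line are assumptions; generate your own with Slurm's configurator and adjust:

```bash
# Hypothetical minimal /etc/slurm/slurm.conf (the same file is then copied into $CHROOT/etc/slurm/)
cat > /etc/slurm/slurm.conf <<'EOF'
ClusterName=atlas                              # assumption
ControlMachine=master
SlurmUser=slurm
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
#GresTypes=gpu                                 # uncomment if GPUs are scheduled through gres
NodeName=node[1-7] CPUs=16 State=UNKNOWN       # CPU count is an assumption
PartitionName=test Nodes=node[1-7] Default=YES MaxTime=INFINITE State=UP
EOF
cp -a /etc/slurm/slurm.conf $CHROOT/etc/slurm/slurm.conf
```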
### Singularity

```bash
yum -y --installroot=$CHROOT install gcc make
cp singularity-2.2.1.tar.gz /atlas/os_images/compute_node_v0.2.0/tmp/
chroot /atlas/os_images/compute_node_v0.2.0
cd /tmp
tar xf singularity-2.2.1.tar.gz
cd singularity-2.2.1
./configure
make
make install
cd ..
rm -rf singularity-2.2.1*
```
### environment-modules

```bash
yum -y --installroot=$CHROOT install environment-modules
chroot $CHROOT
vi /usr/share/Modules/init/.modulespath
    /atlas/gensoft/public_modules    # keep only this entry and comment out all the others
```
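With .modulespath reduced to /atlas/gensoft/public_modules, everything `module avail` shows comes from modulefiles placed in that directory. A hypothetical modulefile for a CUDA install, purely for illustration (the tool name and install paths are made up):

```bash
# Hypothetical: expose /atlas/gensoft/cuda-8.0 as "cuda/8.0" through the custom module path
mkdir -p /atlas/gensoft/public_modules/cuda
cat > /atlas/gensoft/public_modules/cuda/8.0 <<'EOF'
#%Module1.0
proc ModulesHelp { } { puts stderr "CUDA 8.0 (illustrative modulefile)" }
prepend-path PATH            /atlas/gensoft/cuda-8.0/bin
prepend-path LD_LIBRARY_PATH /atlas/gensoft/cuda-8.0/lib64
EOF
module avail    # cuda/8.0 should now be listed on any host using this module path
```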
### NVIDIA driver, CUDA, cuDNN

```bash
chroot /atlas/os_images/compute_node_v0.2.0/
# Installing nvidia-kmod does not immediately replace the loaded .so libraries,
# so remove the old ones and put the new ones in place; that way no reboot is needed.
yum remove nvidia-kmod
yum install nvidia-kmod xorg-x11-drv-nvidia*
rm -rf /var/lib/dkms/nvidia/375.39/
rm -rf /usr/src/nvidia-375.26
rm /usr/lib64/nvidia/*375.26
cp /var/lib/dkms/nvidia/387.26/3.10.0-327.el7.x86_64/x86_64/module/nvidia* /lib/modules/3.10.0-327.el7.x86_64/extra/
yum reinstall nvidia-kmod xorg-x11-drv-nvidia*
yum clean all
```
### Save the configuration into the data store

```bash
cat ~/.ssh/cluster.pub >> $CHROOT/root/.ssh/authorized_keys   # install the public key on the node for passwordless SSH
wwinit ssh_keys
wwsh file import /etc/passwd
wwsh file import /etc/group
wwsh file import /etc/shadow

export WW_CONF=/etc/warewulf/bootstrap.conf
echo "drivers += updates/kernel/" >> $WW_CONF
wwbootstrap `uname -r`                   # build the bootstrap image

echo "GATEWAYDEV=eth0" > /tmp/network.$$
wwsh -y file import /tmp/network.$$ --name network
wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0

wwvnfs -y --chroot $CHROOT               # pack the finished node tree into a VNFS image

# Set each node's IP and related details
wwsh -y node new node5 --ipaddr=172.16.10.15 --hwaddr=0c:c4:7a:85:18:da --network=255.255.255.0 --gateway=172.16.10.1 -D eth0
wwsh -y node new node7 --ipaddr=172.16.10.17 --hwaddr=0c:c4:7a:82:c5:d8 --network=255.255.255.0 --gateway=172.16.10.1 -D eth0
# wwsh -y node delete node7              # made a mistake? delete the node and add it again
# Warewulf turns these settings into a dhcpd.conf that hands out IPs during PXE boot.
# The lines below choose which VNFS, bootstrap and files each node receives after it boots.
wwsh -y provision set "node5" --vnfs=compute_node_v0.2.0 --bootstrap=`uname -r` --files=dynamic_hosts,passwd,group,shadow,network
wwsh -y provision set "node7" --vnfs=compute_node_v0.2.0 --bootstrap=`uname -r` --files=dynamic_hosts,passwd,group,shadow,network
```
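With more than a handful of nodes, typing the node and provision lines by hand gets tedious. A small loop over a node list works just as well; the names, IPs and MACs below are placeholders (only node5 and node7 above are real):

```bash
# Hypothetical helper: register several nodes in one go. Replace the list with your own hardware.
while read -r name ip mac; do
    wwsh -y node new "$name" --ipaddr="$ip" --hwaddr="$mac" \
         --network=255.255.255.0 --gateway=172.16.10.1 -D eth0
    wwsh -y provision set "$name" --vnfs=compute_node_v0.2.0 --bootstrap=`uname -r` \
         --files=dynamic_hosts,passwd,group,shadow,network
done <<'EOF'
node1 172.16.10.11 0c:c4:7a:00:00:01
node2 172.16.10.12 0c:c4:7a:00:00:02
EOF
```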
## 3. Start the services

```bash
systemctl restart dhcpd
wwsh pxe update
systemctl enable slurmctld.service
systemctl start slurmctld.service
systemctl status slurmctld.service
```
## 4. Reboot the nodes

Here I simply reboot them through IPMI:
```bash
ipmitool -I lanplus -H 172.16.10.107 -U ADMIN -P ADMIN chassis bootdev pxe options=persistent
ipmitool -I lanplus -H 172.16.10.107 -U ADMIN -P ADMIN chassis power reset
ipmitool -I lanplus -H 172.16.10.105 -U ADMIN -P ADMIN chassis bootdev pxe options=persistent
ipmitool -I lanplus -H 172.16.10.105 -U ADMIN -P ADMIN chassis power reset
```
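The same two ipmitool commands repeat for every node, so looping over the BMC addresses keeps it short (only .105 and .107 appear above; the rest of the list is whatever your BMCs actually use):

```bash
# BMC addresses are site-specific; extend the list as needed
for bmc in 172.16.10.105 172.16.10.107; do
    ipmitool -I lanplus -H "$bmc" -U ADMIN -P ADMIN chassis bootdev pxe options=persistent
    ipmitool -I lanplus -H "$bmc" -U ADMIN -P ADMIN chassis power reset
done
```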
### A quick check after boot

Here I check directly with the Slurm commands:
```bash
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test*        up   infinite      1  down* node5

scontrol
scontrol: update nodename=node5 state=resume
scontrol: exit

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test*        up   infinite      1   idle node5
```
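sinfo only shows that the node registered; submitting a trivial job through the test partition exercises munge, slurmctld and slurmd end to end. The --gres line assumes GPU gres has been configured in slurm.conf and gres.conf, which this post does not cover:

```bash
srun -p test -N1 hostname            # should print node5
sbatch -p test --wrap="sleep 30"     # queue a short batch job
squeue                               # watch it run
# only if GPU gres is configured (not shown in this post):
# srun -p test --gres=gpu:1 nvidia-smi -L
```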
## 5. Assigning different images to different nodes

```bash
wwvnfs -y --chroot /atlas/os_images/compute_node_v0.2.2 -o /atlas/os_images/vnfs/compute_node_v0.2.2.vnfs
wwsh vnfs import /atlas/os_images/vnfs/compute_node_v0.2.2.vnfs --name=compute_node_v0.2.2 --chroot=/atlas/os_images/compute_node_v0.2.2 -y
wwsh provision set node1 -V compute_node_v0.2.2 -y
wwsh provision set node2 -V compute_node_v0.2.2 -y
wwsh provision set node3 -V compute_node_v0.2.2 -y
wwsh provision set node4 -V compute_node_v0.2.2 -y
wwsh provision set node5 -V compute_node_v0.2.2 -y
wwsh provision set node6 -V compute_node_v0.2.2 -y
wwsh provision set node7 -V compute_node_v0.2.2 -y
```
wwsh is Warewulf's command-line tool; running wwsh with no arguments drops you into an interactive shell. Its main jobs are configuring how each node PXE-boots, its network settings and hostname, which configuration files each node receives, and so on.
```bash
# wwsh
Warewulf> help
Warewulf command line shell interface

Welcome to the Warewulf shell interface. This application allows you
to interact with the Warewulf backend database and modules via a
single interface.

  bootstrap      Manage your bootstrap images
  dhcp           Manage DHCP service and configuration
  events         Control how events are handled
  exit           Exit/leave the Warewulf shell
  file           Manage files within the Warewulf data store
  node           Node manipulation commands
  object         Generically manipulate all Warewulf data store entries
  output         Set the output verbosity level
  provision      Node provision manipulation commands
  pxe            Manage PXE configuration
  quit           Exit/leave the Warewulf shell
  ssh            Spawn parallel ssh connections to nodes.
  vnfs           Manage your VNFS images

Warewulf>
```
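Everything available in the interactive shell can also be run non-interactively, which is handy in scripts. A few read-only examples for inspecting the current state (output formats vary a little between Warewulf 3 releases):

```bash
wwsh node list           # nodes with their hardware addresses and IPs
wwsh provision print     # which VNFS, bootstrap and files each node is assigned
wwsh vnfs list           # imported VNFS images
wwsh file list           # files kept in the data store (passwd, group, shadow, network, ...)
```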