openhpc

[TOC]

What is OpenHPC

OpenHPC is a framework that bundles many of the common components needed to deploy and manage a cluster. In short: build whatever you want to deploy once, and every node in the cluster then runs the same image, completely uniform, which makes it a good fit for building high-performance computing clusters. In the project's own words:

Welcome to the OpenHPC site. OpenHPC is a collaborative, community effort that initiated from a desire to aggregate a number of common ingredients required to deploy and manage High Performance Computing (HPC) Linux clusters including provisioning tools, resource management, I/O clients, development tools, and a variety of scientific libraries. Packages provided by OpenHPC have been pre-built with HPC integration in mind with a goal to provide re-usable building blocks for the HPC community. 

I built this on CentOS 7. There is an official install guide, but it covers a lot of material, rarely explains why a step is done, and following it verbatim still runs into plenty of problems. Below is a record of the process from my own deployment:

1. Master configuration

a. Configure a local yum repository on the master node

First create the directories that will hold the files for the nodes, and set up the yum repositories. I built a local yum mirror for the cluster so downloads and installs are fast.

mkdir -pv /atlas/os_images/compute_node_v0.2.0
mkdir /atlas/os_images/tftpboot_v0.2.0

cat /etc/yum.repos.d/base.repo
[development]
name=development
baseurl=ftp://172.16.10.10/centos7.2
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

[epel]
name=epel
baseurl=ftp://172.16.10.10/epel
gpgcheck=0

[local-slurm]
name=slurm
baseurl=ftp://172.16.10.10/slurm
gpgcheck=0

yum clean all
yum repolist
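For reference, the local mirror at 172.16.10.10 was prepared ahead of time and is not shown in these notes. A minimal sketch of how such an FTP-served repo can be built (the /var/ftp layout and the use of vsftpd are my assumptions, not part of the original setup):

# assumed layout: RPMs synced into /var/ftp/centos7.2 and served by anonymous FTP
yum -y install vsftpd createrepo
mkdir -p /var/ftp/centos7.2
# copy the CentOS 7.2 RPMs (from the ISO or a mirror sync) into /var/ftp/centos7.2 ...
createrepo /var/ftp/centos7.2   # generate the repodata/ metadata yum needs
systemctl enable vsftpd
systemctl start vsftpd          # vsftpd serves /var/ftp to anonymous clients by default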

b. Configure the OpenHPC services on the master

Install the required software on the master

yum -y groupinstall ohpc-base
yum -y groupinstall ohpc-warewulf
yum -y groupinstall "InfiniBand Support"
yum -y install infinipath-psm

systemctl enable ntpd.service #nodes and master talk to each other, so their clocks need to agree closely
systemctl restart ntpd
systemctl start rdma
systemctl status rdma

Configure the PXE boot directory and the network interface to use, then start the PXE-related services.

vim /etc/warewulf/provision.conf
network device = enp129s0f0 #the name of the provisioning NIC on the master
tftpdir = /atlas/os_images/tftpboot_v0.2.0
vim /etc/xinetd.d/tftp
server_args = -s /atlas/os_images/tftpboot_v0.2.0
disable = no
systemctl restart xinetd
systemctl start tftp.socket
systemctl start tftp.service
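A quick sanity check that the TFTP side is actually up (my own suggested check, not part of the original recipe; the bootstrap directory fills in later once wwbootstrap has run):

systemctl status xinetd
ss -ulnp | grep ':69 '                 # tftp listens on UDP port 69
ls /atlas/os_images/tftpboot_v0.2.0    # will be populated once the bootstrap is built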

After a node boots it finishes off some of its configuration; those settings are downloaded over HTTP from the master, and the downloaded files are placed under the /warewulf directory.

export MODFILE=/etc/httpd/conf.d/warewulf-httpd.conf
perl -pi -e "s/cgi-bin>\$/cgi-bin>\n Require all granted/" $MODFILE
perl -pi -e "s/Allow from all/Require all granted/" $MODFILE
perl -ni -e "print unless /^\s+Order allow,deny/" $MODFILE


cat /etc/httpd/conf.d/warewulf-httpd.conf
LoadModule perl_module modules/mod_perl.so
PerlSwitches -w
PerlSwitches -T
PerlSwitches -I/var/www/stage/cgi-bin

# This is disabled as RHEL6 perl_mod seems to be missing this support
#PerlPreConnectionHandler Apache2::Reload

Alias /WW/static /usr/share/warewulf/www

ScriptAlias /WW/file /usr/libexec/warewulf/cgi-bin/file.pl
ScriptAlias /WW/script /usr/libexec/warewulf/cgi-bin/script.pl
ScriptAlias /WW/nodeconfig /usr/libexec/warewulf/cgi-bin/nodeconfig.pl
ScriptAlias /WW/vnfs /usr/libexec/warewulf/cgi-bin/vnfs.pl

<Directory /usr/libexec/warewulf/cgi-bin>
Require all granted
SetHandler perl-script
PerlResponseHandler ModPerl::Registry
PerlOptions +ParseHeaders
Options +ExecCGI
</Directory>

<Directory /usr/share/warewulf/www>
Options Indexes MultiViews
AllowOverride None
Require all granted
</Directory>

The cluster assigns each node its IP, hostname and other settings based on the node's MAC address, and all of this is stored in a database, which is why MariaDB is needed.

systemctl enable mariadb.service
systemctl restart mariadb
systemctl enable httpd.service
systemctl restart httpd
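These notes do not show the Warewulf datastore initialization step (it may already have been done on this system). For completeness, the standard OpenHPC/Warewulf recipe does it roughly like this; treat the exact module name as something to verify against your Warewulf version:

wwinit database   # initialize the Warewulf datastore in MariaDB (standard recipe step, not shown above)
wwsh node list    # should run without errors (and be empty) once the database is reachable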

2. Build the node image and its configuration

The first thing to build is the node's base system files; after that, install the applications it needs.

Install the base system files

vim /usr/libexec/warewulf/wwmkchroot/centos-7.tmpl
YUM_MIRROR="ftp://172.16.10.10/centos7.2" #the repo holding the packages needed for the minimal bootable system
#without this change the packages are pulled from the official mirrors; I reinstall and retest repeatedly, so pointing at the local mirror keeps it fast
#if you do download online, note that CentOS has since moved on to 7.3, so the default path in the template needs updating

export CHROOT=/atlas/os_images/compute_node_v0.2.0
wwmkchroot centos-7 $CHROOT #this installs the base system files into $CHROOT

cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf

cd /atlas/os_images/compute_node_v0.2.0/etc/yum.repos.d/
rm -f CentOS-*
cat base.repo
cat OpenHPC.repo.bak
#here too I point the node's repos at the local mirror for speed (the OpenHPC packages have also been mirrored locally)

Next, install the applications the nodes need.

Install common base packages

yum -y --installroot=$CHROOT install ntp kernel gcc make grub2-tools environment-modules
yum -y --installroot=$CHROOT groupinstall "InfiniBand Support"
yum -y --installroot=$CHROOT install infinipath-psm
chroot $CHROOT systemctl enable rdma
chroot $CHROOT systemctl enable ntpd
echo "server 172.16.10.10" >> $CHROOT/etc/ntp.conf #这个IP是对应master的IP,同步时间用

Install munge

Install and configure munge as the groundwork for the slurm installation that follows. One thing to watch: the munge and slurm users must have the same UID/GID on the master and on the nodes.

export MUNGEUSER=1050
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SlurmUSER=1051
groupadd -g $SlurmUSER slurm
useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm
cat /etc/passwd |grep munge
cat /etc/passwd |grep slurm
cat /etc/group |grep slurm
cat /etc/group |grep munge
yum install munge munge-libs munge-devel -y
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
systemctl enable munge
systemctl start munge
chroot $CHROOT groupadd -g 1050 munge
chroot $CHROOT useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u 1050 -g munge -s /sbin/nologin munge
chroot $CHROOT groupadd -g 1051 slurm
chroot $CHROOT useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u 1051 -g slurm -s /bin/bash slurm
yum install --installroot=$CHROOT munge munge-libs munge-devel -y
cp -a /etc/munge/munge.key $CHROOT/etc/munge/munge.key
chroot $CHROOT systemctl enable munge
chroot $CHROOT chown -R munge: /etc/munge/ /var/log/munge/
chroot $CHROOT chmod 0700 /etc/munge/ /var/log/munge/
vim /etc/warewulf/vnfs.conf #note: many services write their logs under /var/log
#exclude += /var/log/* #an exclude line lists a directory that is left out when the node image is built; we clearly need this one, so comment the exclude out, otherwise munge fails to start
#exclude += /usr/src #keep this directory too; dkms needs it later when installing the NVIDIA driver
#hybridize += /usr/lib/locale #the locale / character-encoding data lives in this directory
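Before moving on to slurm it is worth confirming that the IDs really do match between the master and the image, and that munge itself works on the master. These checks are my own addition, not part of the original recipe:

grep -E 'munge|slurm' /etc/passwd $CHROOT/etc/passwd   # UID/GID must be identical in both files
munge -n | unmunge                                     # a working munge prints STATUS: Success (0)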

Install slurm

tar xf slurm.tar
cd rpmbuild/RPMS/x86_64/
yum install ./slurm-*
yum --installroot=$CHROOT install slurm-devel slurm-contribs slurm-munge slurm-openlava slurm-pam_slurm slurm-perlapi slurm-plugins slurm-slurmdbd slurm-sql slurm-torque -y

vim /etc/slurm/slurm.conf #slurm's configuration file

cp -a /etc/slurm/slurm.conf /atlas/os_images/compute_node_v0.2.0/etc/slurm/slurm.conf
chown slurm:slurm /var/spool
mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
touch /var/log/slurmctld.log
chown slurm: /var/log/slurmctld.log
chroot /atlas/os_images/compute_node_v0.2.0/
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
touch /var/log/slurmd.log
chown slurm: /var/log/slurmd.log
chroot $CHROOT systemctl enable slurmd.service


[yhu@master etc]$sudo chmod +x rc.local
[yhu@master etc]$cat rc.local
...
nvidia-smi >> /dev/null
systemctl restart slurmd
# when a job tried to use the GPU there was no /dev/nvidia0 device file; a system booted through OpenHPC really does come up without it, so I added the two lines above to /etc/rc.local (running nvidia-smi creates the device files) and restarted slurmd
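The slurm.conf edited above is not reproduced in these notes. A minimal sketch consistent with the paths and names used here might look like the following; ClusterName is a placeholder, the real file needs more detail (node CPU/memory lines, and e.g. GresTypes=gpu for the GPUs), and this is not the author's actual file:

# /etc/slurm/slurm.conf -- minimal sketch, placeholders only
ClusterName=atlas
ControlMachine=master
AuthType=auth/munge
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
NodeName=node[5,7] State=UNKNOWN
PartitionName=test Nodes=node[5,7] Default=YES MaxTime=INFINITE State=UP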

singularity

yum -y --installroot=$CHROOT install gcc make
cp singularity-2.2.1.tar.gz /atlas/os_images/compute_node_v0.2.0/tmp/
chroot /atlas/os_images/compute_node_v0.2.0
cd /tmp
tar xf singularity-2.2.1.tar.gz
cd singularity-2.2.1
./configure
make
make install
cd ..
rm -rf singularity-2.2.1*

environment-modules

yum -y --installroot=$CHROOT install environment-modules
chroot $CHROOT vi /usr/share/Modules/init/.modulespath
/atlas/gensoft/public_modules #keep only this entry and comment out all the others
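
For context, /atlas/gensoft/public_modules is where the shared modulefiles live. A hypothetical modulefile (the path and package are purely illustrative, not something from this cluster) would look like:

#%Module1.0
## hypothetical example: /atlas/gensoft/public_modules/cuda/8.0
proc ModulesHelp { } {
    puts stderr "Adds CUDA 8.0 from the shared /atlas filesystem to the environment"
}
set prefix /atlas/gensoft/cuda-8.0
prepend-path PATH            $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib64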


NVIDIA driver, CUDA, cuDNN

chroot /atlas/os_images/compute_node_v0.2.0/
# installing nvidia-kmod does not immediately replace the loaded .so libraries, so delete the old ones and load the new ones; that way no reboot is needed
yum remove nvidia-kmod
yum install nvidia-kmod xorg-x11-drv-nvidia*
rm -rf /var/lib/dkms/nvidia/375.39/
rm -rf /usr/src/nvidia-375.26
rm /usr/lib64/nvidia/*375.26
cp /var/lib/dkms/nvidia/387.26/3.10.0-327.el7.x86_64/x86_64/module/nvidia* /lib/modules/3.10.0-327.el7.x86_64/extra/
yum reinstall nvidia-kmod xorg-x11-drv-nvidia*
yum clean all
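One step that is easy to forget after copying kernel modules around by hand is refreshing the module dependency index inside the image so modprobe can find the new files. This is my own addition (the yum reinstall above may already take care of it, so treat it as a belt-and-braces step), run against the image's kernel version:

chroot $CHROOT depmod -a 3.10.0-327.el7.x86_64   # rebuild modules.dep inside the node image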

Store the configuration in the database

cat ~/.ssh/cluster.pub >> $CHROOT/root/.ssh/authorized_keys #install the key in the node image so we can ssh straight in
wwinit ssh_keys
wwsh file import /etc/passwd #import the files that will be pushed to the nodes
wwsh file import /etc/group  #(they are referenced again below by provision set --files)
wwsh file import /etc/shadow

export WW_CONF=/etc/warewulf/bootstrap.conf
echo "drivers += updates/kernel/" >> $WW_CONF
wwbootstrap `uname -r` #build the bootstrap image
echo "GATEWAYDEV=eth0" > /tmp/network.$$
wwsh -y file import /tmp/network.$$ --name network
wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
wwvnfs -y --chroot $CHROOT #pack the prepared node chroot into a VNFS image
#configure each node's IP address and related settings
wwsh -y node new node5 --ipaddr=172.16.10.15 --hwaddr=0c:c4:7a:85:18:da --network=255.255.255.0 --gateway=172.16.10.1 -D eth0
wwsh -y node new node7 --ipaddr=172.16.10.17 --hwaddr=0c:c4:7a:82:c5:d8 --network=255.255.255.0 --gateway=172.16.10.1 -D eth0
#wwsh -y node delete node7  <- if an entry is wrong you can delete it and add it again
#from these settings Warewulf generates a dhcpd configuration that hands out IPs and related info during PXE boot

#the next commands define which files each node receives/updates after it boots
wwsh -y provision set "node5" --vnfs=compute_node_v0.2.0 --bootstrap=`uname -r` --files=dynamic_hosts,passwd,group,shadow,network
wwsh -y provision set "node7" --vnfs=compute_node_v0.2.0 --bootstrap=`uname -r` --files=dynamic_hosts,passwd,group,shadow,network

3. Start the services

systemctl restart dhcpd
wwsh pxe update
systemctl enable slurmctld.service
systemctl start slurmctld.service
systemctl status slurmctld.service

4. Reboot the nodes

Here I reboot them directly via IPMI.

ipmitool -I lanplus -H 172.16.10.107 -U ADMIN -P ADMIN chassis bootdev pxe options=persistent
ipmitool -I lanplus -H 172.16.10.107 -U ADMIN -P ADMIN chassis power reset
ipmitool -I lanplus -H 172.16.10.105 -U ADMIN -P ADMIN chassis bootdev pxe options=persistent
ipmitool -I lanplus -H 172.16.10.105 -U ADMIN -P ADMIN chassis power reset
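If a node fails to come up, watching its console over Serial-over-LAN is the easiest way to see where PXE or the Warewulf init stops (same ipmitool credentials as above; my suggestion, not part of the original notes):

ipmitool -I lanplus -H 172.16.10.105 -U ADMIN -P ADMIN sol activate   # detach with the escape sequence ~.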

A quick check after boot

Here I check directly with the slurm commands.

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
test*     up     infinite   1      down*  node5

scontrol
scontrol: update nodename=node5 state=resume
scontrol: exit

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
test*     up     infinite   1      idle   node5

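Beyond sinfo, running a trivial job confirms that slurmd, munge, and the shared user IDs all line up (the partition and node names are the ones defined above):

srun -p test -w node5 -N1 hostname   # should print: node5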

5. Per-node configuration

wwvnfs -y --chroot /atlas/os_images/compute_node_v0.2.2 -o /atlas/os_images/vnfs/compute_node_v0.2.2.vnfs

wwsh vnfs import /atlas/os_images/vnfs/compute_node_v0.2.2.vnfs --name=compute_node_v0.2.2 --chroot=/atlas/os_images/compute_node_v0.2.2 -y

wwsh provision set node1 -V compute_node_v0.2.2 -y
wwsh provision set node2 -V compute_node_v0.2.2 -y
wwsh provision set node3 -V compute_node_v0.2.2 -y
wwsh provision set node4 -V compute_node_v0.2.2 -y
wwsh provision set node5 -V compute_node_v0.2.2 -y
wwsh provision set node6 -V compute_node_v0.2.2 -y
wwsh provision set node7 -V compute_node_v0.2.2 -y

wwsh is Warewulf's command-line tool; run wwsh with no arguments to drop into an interactive shell. Its main job is to define what a node gets at boot time: the PXE boot settings, network and hostname, which configuration files each node should receive, and so on.

#wwsh 
Warewulf> help
Warewulf command line shell interface

Welcome to the Warewulf shell interface. This application allows you
to interact with the Warewulf backend database and modules via a
single interface.

bootstrap      Manage your bootstrap images
dhcp           Manage DHCP service and configuration
events         Control how events are handled
exit           Exit/leave the Warewulf shell
file           Manage files within the Warewulf data store
node           Node manipulation commands
object         Generically manipulate all Warewulf data store entries
output         Set the output verbosity level
provision      Node provision manipulation commands
pxe            Manage PXE configuration
quit           Exit/leave the Warewulf shell
ssh            Spawn parallel ssh connections to nodes.
vnfs           Manage your VNFS images

Warewulf>
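A few examples of the subcommands I reach for most, usable inside the interactive shell or directly from the command line (exact option syntax can vary between Warewulf releases, so fall back to the built-in help if something does not match):

wwsh node list           # list the nodes in the datastore
wwsh provision list      # show which VNFS/bootstrap/files each node gets
wwsh vnfs list           # list imported VNFS images
wwsh ssh node5 uptime    # run a command on a node over ssh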