Horovod

[TOC]

Official introduction

Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.

Official benchmarks

Training

Running Horovod

The example commands below show how to run distributed training. See the Running Horovod page for more instructions, including RoCE/InfiniBand tweaks and tips for dealing with hangs.

1. Single machine, 4 GPUs:

```bash
# docker
nvidia-docker run -it 172.16.10.10:5000/horovod:0.12.1-tf1.8.0-py3.5
mpirun -np 4 -H localhost:4 python keras_mnist_advanced.py

# singularity
singularity shell --nv /scratch/containers/ubuntu.simg
mpirun -np 4 -H localhost:4 python keras_mnist_advanced.py
```
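
For reference, the sketch below shows the pattern a Horovod-enabled Keras script such as keras_mnist_advanced.py follows: initialize Horovod, pin each process to one GPU by its local rank, scale the learning rate, wrap the optimizer, and broadcast rank 0's initial weights. The tiny model and random data are placeholders of mine, not the actual example code.

```python
# minimal_hvd_keras.py -- a sketch of the pattern keras_mnist_advanced.py follows;
# the model and data below are placeholders, not the real example.
import numpy as np
import tensorflow as tf
import keras
import horovod.keras as hvd

hvd.init()  # one process per GPU, launched via mpirun

# pin each process to a single GPU based on its local rank
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
keras.backend.set_session(tf.Session(config=config))

model = keras.models.Sequential([
    keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])

# scale the learning rate by the number of workers, then wrap the optimizer
opt = keras.optimizers.SGD(lr=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])

# broadcast rank 0's initial weights so all workers start identically
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

x = np.random.rand(256, 784).astype('float32')  # placeholder data
y = np.random.randint(0, 10, 256)
model.fit(x, y, batch_size=32, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

It launches the same way as the examples above: `mpirun -np 4 -H localhost:4 python minimal_hvd_keras.py`.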

2. Multiple machines, multiple GPUs:

```bash
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    ... \
    python train.py
```
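
Here `-np 16` asks for 16 processes in total and `-H` places four on each of the four hosts, one per GPU. A tiny helper like the one below (rank_check.py is a name I made up, not a Horovod file) can confirm the placement before a long training run; it uses only the core Horovod API:

```python
# rank_check.py -- hypothetical helper to confirm process placement.
import socket
import horovod.tensorflow as hvd

hvd.init()
# each of the 16 processes prints its host, global rank, and local rank
print("host %s: global rank %d of %d, local rank %d"
      % (socket.gethostname(), hvd.rank(), hvd.size(), hvd.local_rank()))
```

Run it with the same command shape: `mpirun -np 16 -H server1:4,server2:4,server3:4,server4:4 python rank_check.py`.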

3. Running Horovod entirely in Docker

4. Installing Horovod with full GPU support

1. Install NCCL 2.

Understanding NCCL

```
# software requirements:
glibc 2.19 or higher
CUDA 8.0 or higher
CUDA devices with a compute capability of 3.0 and higher
```
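
As a quick sanity check of the glibc requirement (the script name and approach are mine, not NVIDIA's), Python's standard library can report the C library the interpreter is linked against:

```python
# glibc_check.py -- hypothetical helper; NCCL 2 requires glibc 2.19 or higher.
import platform

lib, version = platform.libc_ver()
print(lib, version)  # e.g. "glibc 2.23" on Ubuntu 16.04
```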

Installing NCCL 2 on Ubuntu:

```bash
dpkg -i nccl-repo-ubuntu1604-2.1.15-ga-cuda9.1_1-1_amd64.deb  # downloading the .deb requires an NVIDIA developer account
apt update
apt install libnccl2 libnccl-dev
```
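
A minimal way to confirm the install worked, assuming the library landed on the default loader path (this check is my own, not an NVIDIA tool): ctypes raises OSError if libnccl.so.2 cannot be found.

```python
# nccl_check.py -- hypothetical sanity check: raises OSError if NCCL 2 is missing.
import ctypes

ctypes.CDLL("libnccl.so.2")
print("NCCL 2 shared library loaded successfully")
```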

2. Install Open MPI or another MPI implementation.

```bash
# build Open MPI 3.1.1 from source with CUDA support
tar xf openmpi-3.1.1.tar.bz2
cd openmpi-3.1.1/
./configure --with-cuda
make -j 12
make install
ldconfig  # refresh the shared-library cache so mpirun finds the new libraries
mpirun --version

# alternatively, install the (older) distro package instead of building from source:
# apt install libopenmpi1.10
```

3. Install the horovod pip package.

```bash
$ HOROVOD_GPU_ALLREDUCE=NCCL pip install --no-cache-dir horovod  # assumes NCCL 2 is already installed (see above)
$ HOROVOD_GPU_ALLREDUCE=MPI pip install --no-cache-dir horovod   # use MPI instead of NCCL 2 for allreduce
$ HOROVOD_GPU_ALLREDUCE=MPI HOROVOD_GPU_ALLGATHER=MPI HOROVOD_GPU_BROADCAST=MPI pip install --no-cache-dir horovod  # use MPI instead of NCCL 2 for all GPU collectives
```
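
To verify the resulting build actually performs a collective operation, a small smoke test like the one below (my sketch, written against the TF 1.x API matching the 0.12.x-era image above) can be run under mpirun. hvd.allreduce averages across ranks by default, so with 4 processes every rank should print 1.5, i.e. (0+1+2+3)/4:

```python
# allreduce_test.py -- hypothetical smoke test for a freshly installed Horovod.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# pin each process to one GPU by local rank
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# each rank contributes its own rank number; allreduce averages them
value = tf.constant(float(hvd.rank()))
averaged = hvd.allreduce(value)

with tf.Session(config=config) as sess:
    print("rank %d sees average %.2f" % (hvd.rank(), sess.run(averaged)))
```

Run it as `mpirun -np 4 python allreduce_test.py`; identical output on every rank means the allreduce path (NCCL or MPI, depending on the build flags above) is working.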