Building and Installing PBS 19 and OpenMPI: Tutorial
Notes
- Building the PTL RPM (for testing) is optional
- You may want to use the 18.1.3 release rather than the 19.1.1 beta. An issue was found with the beta where the database does not start; it will be fixed prior to the official release.
Tutorial
Overview
========
This tutorial will accomplish three things:
1. Build PBS Professional (OSS release 19.1.1 beta 1 for this example)
2. Build OpenMPI with support for PBS Professional task manager interface
3. Build and run some sample MPI applications
OpenMPI will be installed under /opt/openmpi
PBS Pro will be installed under /opt/pbs
PTL will be installed under /opt/ptl
Prerequisites
=============
- Two VMs with two virtual CPUs each (pbs-server and mom-2 for this example)
- Root access on both VMs (needed for installing PBS Pro and OpenMPI)
- The filesystem holding /opt does not strip SUID permissions or squash UIDs (PBS Pro installs SUID binaries)
- VMs configured to communicate with each other
- Same OS on both VMs (to prevent building everything twice)
- Internet access to download source code
- Build dependencies for PBS Pro and OpenMPI are installed on primary VM
- Installation dependencies for PBS Pro and OpenMPI are installed on both VMs
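For reference, on CentOS 7 the build dependencies might be installed roughly as
follows. The package names below are drawn from the PBS Pro build documentation
and common OpenMPI needs, so treat them as a starting point and consult the
INSTALL file in each source tree for the authoritative list:
$ sudo yum install -y gcc gcc-c++ make rpm-build libtool autoconf automake \
      hwloc-devel libX11-devel libXt-devel libedit-devel libical-devel \
      ncurses-devel perl postgresql-devel postgresql-contrib python-devel \
      tcl-devel tk-devel swig expat-devel openssl-devel libXext libXft
Installing the PBS Pro and OpenMPI RPMs with yum later in the tutorial pulls in
the runtime dependencies automatically.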
Setup
=====
Any existing PBS Pro or OpenMPI packages should be uninstalled.
$ rpm -qa | grep pbs
$ rpm -qa | grep openmpi
Use yum, zypper, apt-get, etc. to uninstall these packages. Check the
contents of /opt to ensure the pbs and openmpi directories are not present.
Also remove /etc/pbs.conf and the PBS_HOME (/var/spool/pbs) directory.
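On a CentOS system that cleanup might look like the following; the package
names are only examples, so remove whatever the rpm queries above actually
report:
$ sudo yum remove pbspro-server pbspro-execution pbspro-client openmpi
$ sudo rm -f /etc/pbs.conf
$ sudo rm -rf /var/spool/pbs /opt/pbs /opt/openmpi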
PBS Pro and OpenMPI distribution packages will be built under ~/work on the
primary VM. The RPMs will be built in the standard rpmbuild location. Note
that these directories may already exist.
$ mkdir -p ~/work
$ mkdir -p ~/rpmbuild ~/rpmbuild/BUILD ~/rpmbuild/BUILDROOT ~/rpmbuild/RPMS \
    ~/rpmbuild/SOURCES ~/rpmbuild/SPECS ~/rpmbuild/SRPMS
Build PBS Professional
======================
$ cd ~/work
$ curl -so - https://codeload.github.com/PBSPro/pbspro/tar.gz/v19.1.1beta1 | \
gzip -cd | tar -xf -
$ cd pbspro-19.1.1beta1
$ ./autogen.sh
[output omitted]
[ Note: You will see several "wildcard" warnings in the output because wildcard directives are used in some of the Makefile.am files. These messages may be ignored. ]
$ ./configure PBS_VERSION='19.1.0' --prefix=/opt/pbs
[output omitted]
$ make dist
[output omitted]
$ cp pbspro-19.1.0.tar.gz ~/rpmbuild/SOURCES
$ cp pbspro.spec ~/rpmbuild/SPECS
$ cd ~/rpmbuild/SPECS
$ rpmbuild -ba --with ptl pbspro.spec
Install PBS Professional
========================
This example is run on CentOS using yum. Adjust accordingly for the OS.
$ cd ~/rpmbuild/RPMS/x86_64
$ sudo yum install pbspro-server-19.1.0-0.x86_64.rpm
[output omitted]
Optionally, install PTL:
$ sudo yum install pbspro-ptl-19.1.0-0.x86_64.rpm
- Set PBS_START_MOM=1 on the primary VM
- Start PBS Pro on the primary VM
- Copy the pbspro-execution RPM to the secondary VM and install it
- Start PBS Pro on the secondary VM
- Use qmgr to add the secondary VM to the complex
- Confirm that the secondary VM is available (pbsnodes -av)
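On CentOS, those steps might look roughly like the following. The init script
path, RPM file name, and qmgr syntax assume a default install with passwordless
ssh and sudo; make sure PBS_SERVER in /etc/pbs.conf on mom-2 names the server
host before starting PBS there:
$ sudo sed -i 's/^PBS_START_MOM=0/PBS_START_MOM=1/' /etc/pbs.conf
$ sudo /etc/init.d/pbs start
$ scp ~/rpmbuild/RPMS/x86_64/pbspro-execution-19.1.0-0.x86_64.rpm mom-2:
$ ssh mom-2 "sudo yum install -y pbspro-execution-19.1.0-0.x86_64.rpm"
$ ssh mom-2 "sudo /etc/init.d/pbs start"
$ sudo /opt/pbs/bin/qmgr -c 'create node mom-2'
$ /opt/pbs/bin/pbsnodes -av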
Build OpenMPI
=============
The current release as of January 4, 2019 is OpenMPI 4.0.0
$ cd ~/work
$ curl -sO https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz
$ tar --wildcards -xOf openmpi-4.0.0.tar.gz '*/contrib/dist/linux/openmpi.spec' | \
    sed 's/$VERSION/4.0.0/g' | sed 's/$EXTENSION/gz/g' >openmpi.spec
$ cp openmpi.spec ~/rpmbuild/SPECS
$ cp openmpi-4.0.0.tar.gz ~/rpmbuild/SOURCES
$ cd ~/rpmbuild/SPECS
$ rpmbuild -D 'configure_options --without-slurm --with-tm=/opt/pbs' \
-D 'install_in_opt 1' -ba openmpi.spec
[ Note: Versions of PBS Pro prior to 18.x require the LIBS environment variable to be set to -ldl before building OpenMPI; see the example below. ]
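If you do build against one of those older PBS Pro releases, one way to satisfy
that requirement (an untested sketch here, since this tutorial uses 19.x) is to
export the variable in the shell that runs rpmbuild; appending LIBS=-ldl to
configure_options should also work:
$ export LIBS=-ldl
$ rpmbuild -D 'configure_options --without-slurm --with-tm=/opt/pbs' \
    -D 'install_in_opt 1' -ba openmpi.spec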
Install OpenMPI
===============
This example is run on CentOS using yum. Adjust accordingly for the OS.
$ cd ~/rpmbuild/RPMS/x86_64
$ sudo yum install openmpi-4.0.0-*.rpm
[output omitted]
Add profile scripts for OpenMPI:
$ cd ~/work
$ cat <<'EOF' >openmpi.sh
PATH=${PATH}:/opt/openmpi/4.0.0/bin
MANPATH=${MANPATH}:/opt/openmpi/4.0.0/man
EOF
$ sudo cp openmpi.sh /etc/profile.d/openmpi.sh
$ cat <<'EOF' >openmpi.csh
setenv PATH ${PATH}:/opt/openmpi/4.0.0/bin
setenv MANPATH ${MANPATH}:/opt/openmpi/4.0.0/man
EOF
$ sudo cp openmpi.csh /etc/profile.d/openmpi.csh
Copy the RPM to the secondary VM and install it there as well.
Copy the /etc/profile.d/openmpi.* scripts to the secondary VM.
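For example, assuming passwordless ssh and sudo on mom-2:
$ scp ~/rpmbuild/RPMS/x86_64/openmpi-4.0.0-*.rpm mom-2:
$ ssh mom-2 "sudo yum install -y openmpi-4.0.0-*.rpm"
$ scp /etc/profile.d/openmpi.sh /etc/profile.d/openmpi.csh mom-2:
$ ssh mom-2 "sudo cp openmpi.sh openmpi.csh /etc/profile.d/"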
====================================================================
STOP! STOP! STOP! STOP! STOP! STOP! STOP! STOP! STOP! STOP!
====================================================================
Before you proceed, log out and log back in. This will cause your login shell
to process the new files in /etc/profile.d and set up your PATH and MANPATH
correctly. Once you have logged back in, ensure your PATH and MANPATH contain
references to the appropriate directories. This may include PTL if it was
installed.
As an alternative, you may source the files directly from your login shell
without logging out.
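For example, from a bash login shell:
$ source /etc/profile.d/openmpi.sh
$ which mpirun
which mpirun should then find the binary under /opt/openmpi/4.0.0/bin before
you continue.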
====================================================================
Compile and Run a Job with OpenMPI
==================================
$ cd ~/work
$ cat <<'EOF' >hello_mpi.c
/* Print rank, size, hostname, and universe-size info from each MPI rank. */
#define _GNU_SOURCE     /* for asprintf() */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <limits.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    char hostname[HOST_NAME_MAX];
    void *appnum = NULL;     /* initialized to NULL: MPI_Comm_get_attr() leaves */
    void *univ_size = NULL;  /* the value untouched when the attribute is unset */
    char *appstr, *unistr;
    int flag;
    char *envar;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
    if (NULL == appnum) {
        asprintf(&appstr, "UNDEFINED");
    } else {
        asprintf(&appstr, "%d", *(int *)appnum);
    }
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &univ_size, &flag);
    if (NULL == univ_size) {
        asprintf(&unistr, "UNDEFINED");
    } else {
        asprintf(&unistr, "%d", *(int *)univ_size);
    }

    gethostname(hostname, sizeof(hostname));
    envar = getenv("OMPI_UNIVERSE_SIZE");
    printf("Rank:%d/%d Host:%s App#:%s MPI_UNIVERSE_SIZE:%s OMPI_UNIVERSE_SIZE:%s\n",
           rank, size, hostname, appstr, unistr, (NULL == envar) ? "NULL" : envar);

    MPI_Finalize();
    return 0;
}
EOF
$ mpicc -o hello_mpi hello_mpi.c
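Before submitting through PBS, you can optionally sanity-check the binary with
a plain local run; outside of a PBS job the host and universe-size values will
simply reflect the local invocation:
$ mpirun -np 2 ./hello_mpi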
$ ssh mom-2 mkdir work
$ scp hello_mpi mom-2:work
$ cat <<EOF >mpijob
#PBS -l select=4:ncpus=1:mem=64m
#PBS -j oe
mpirun ~/work/hello_mpi
EOF
$ qsub mpijob
8.pbs-server
$ cat mpijob.o8
Rank:0/4 Host:pbs-server App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:2/4 Host:mom-2 App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:3/4 Host:mom-2 App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:1/4 Host:pbs-server App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Mom logs from pbs-server (where ranks 0 and 1 were run):
=======================================================
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;nprocs: 315, cantstat: 0, nomem: 0, skipped: 0, cached: 0
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;Started, pid = 120710
01/07/2019 14:21:07;0080;pbs_mom;Job;8.pbs-server;task 00000001 terminated
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;Terminated
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;task 00000001 cput= 0:00:00
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;pbs-server cput= 0:00:00 mem=424kb
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;mom-2 cput= 0:00:00 mem=0kb
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;no active tasks
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;Obit sent
Mom logs from mom-2 (where ranks 2 and 3 were run):
===================================================
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;JOIN_JOB as node 1
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;task 20000001 started, orted
01/07/2019 14:21:07;0080;pbs_mom;Job;8.pbs-server;task 20000001 terminated
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;KILL_JOB received
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;task 20000001 cput= 0:00:00
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;DELETE_JOB received
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
Notes:
======
- If the user's home directory were shared across both VMs (e.g. via NFS)
there would have been no need to create the work directory or copy the
hello_mpi binary to mom-2.
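For reference, a minimal NFS setup for that alternative (hostnames and paths
taken from this example, with nfs-utils installed and the NFS server service
running) could look like:
# /etc/exports on pbs-server
/home mom-2(rw)
$ sudo exportfs -ra                        # on pbs-server
$ sudo mount pbs-server:/home /home        # on mom-2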
Testing
=======
A simple smoke test that launches /bin/hostname across both nodes:
test.sh:
#!/bin/bash
#PBS -N pbs-openmpi-sh
#PBS -l select=2:ncpus=2:mpiprocs=2
#PBS -l place=scatter
/opt/openmpi/4.0.0/bin/mpirun -np $(wc -l <$PBS_NODEFILE) /bin/hostname
$ qsub test.sh