2.3. Running Linpack

2.3.1. Interactive Mode

This section describes ways to scale up an HPL job on a Rocks cluster.

To get started, follow the instructions on how to run a two-processor HPL job in Using Mpirun on Ethernet. Then, to scale up the number of processors, add more entries to your machines file. For example, to run a 4-processor job over compute nodes compute-0-0 and compute-0-1, put the following in your machines file:

compute-0-0
compute-0-0
compute-0-1
compute-0-1
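Typing the machines file by hand gets tedious as jobs grow. The following is a minimal sketch that writes one line per processor slot. The node names come from the example above; the value of SLOTS_PER_NODE (2) is an assumption, so set it to the number of processors in your compute nodes.

```shell
#!/bin/sh
# Sketch: write an MPICH machines file with one line per processor slot.
# NODES and SLOTS_PER_NODE are assumptions; set them for your cluster.
NODES="compute-0-0 compute-0-1"
SLOTS_PER_NODE=2

: > machines                          # create or empty the machines file
for node in $NODES; do
    i=0
    while [ "$i" -lt "$SLOTS_PER_NODE" ]; do
        echo "$node" >> machines      # one entry per processor on this node
        i=$((i + 1))
    done
done
```

With the values shown, this produces exactly the four lines listed above.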

Then you'll need to adjust the number of processors in HPL.dat:

change:

1 Ps
2 Qs

to:

2 Ps
2 Qs
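Editing HPL.dat by hand works fine, but if you resize jobs often, sed can make the change for you. The sketch below assumes a single process grid, with the Ps and Qs lines each starting with one number followed by the label (as in "1 Ps" above); set_grid is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite the Ps and Qs lines of HPL.dat for a single process grid.
# Assumes each line starts with one number followed by its label, e.g. "1   Ps".
set_grid() {
    p=$1 q=$2 dat=${3:-HPL.dat}
    # Replace only the leading number; keep the spacing and label as they are.
    sed -e "s/^[0-9][0-9]*\([[:space:]][[:space:]]*Ps\)/$p\1/" \
        -e "s/^[0-9][0-9]*\([[:space:]][[:space:]]*Qs\)/$q\1/" \
        "$dat" > "$dat.tmp" && mv "$dat.tmp" "$dat"
}
```

For the 4-processor example, run `set_grid 2 2 HPL.dat`. Writing to a temporary file and renaming it avoids relying on GNU sed's `-i` option.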

Note

The total number of processors HPL uses is P times Q. For example, for a 16-processor job, you could specify:
4 Ps
4 Qs
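For other processor counts you need a factorization of the count into P and Q. A common rule of thumb is to keep the grid close to square with P no larger than Q; treat that as a starting point, not a requirement. The hypothetical helper below picks the largest divisor that does not exceed the square root of the processor count:

```shell
#!/bin/sh
# Sketch: pick an HPL process grid P x Q for NP processors.
# Rule of thumb (an assumption, not something HPL enforces): keep the grid
# as close to square as possible, with P <= Q.
choose_grid() {
    np=$1
    p=1
    i=1
    while [ $((i * i)) -le "$np" ]; do
        # Remember the largest divisor of np that does not exceed sqrt(np).
        if [ $((np % i)) -eq 0 ]; then
            p=$i
        fi
        i=$((i + 1))
    done
    echo "$p $((np / p))"   # prints "P Q"
}
```

`choose_grid 16` prints `4 4`, `choose_grid 8` prints `2 4`, and `choose_grid 2` prints `1 2`, which matches the original two-processor grid.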

Finally, adjust the np argument on the mpirun command line to match:

$ /opt/mpich/gnu/bin/mpirun -nolocal -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl
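When scaling up, it is easy to let the machines file, HPL.dat and -np drift apart. The sketch below counts the non-empty lines in machines, reads P and Q from HPL.dat (assuming a single grid and the "value label" line layout), and launches xhpl with the paths shown above only when the two counts agree. hpl_np and launch_hpl are hypothetical names.

```shell
#!/bin/sh
# Sketch: check that machines, HPL.dat and -np agree before launching.
# Assumes a single process grid and the file names used in this section.
hpl_np() {
    dat=${1:-HPL.dat}
    p=$(awk '$2 == "Ps" { print $1 }' "$dat")
    q=$(awk '$2 == "Qs" { print $1 }' "$dat")
    echo $((p * q))
}

launch_hpl() {
    np=$(grep -c . machines) || return 1   # one non-empty line per slot
    grid=$(hpl_np HPL.dat)
    if [ "$np" -ne "$grid" ]; then
        echo "machines has $np entries but P x Q in HPL.dat is $grid" >&2
        return 1
    fi
    /opt/mpich/gnu/bin/mpirun -nolocal -np "$np" -machinefile machines \
        /opt/hpl/gnu/bin/xhpl
}
```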

To make the job run longer, increase the problem size by raising the Ns parameter. For example, to quadruple the size of the matrix (and therefore the memory) each node holds:

change:

1000 Ns

to:

2000 Ns

Note

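Keep in mind that doubling the Ns parameter quadruples the memory needed for the matrix and increases the floating-point work roughly eightfold, because HPL's operation count grows with the cube of Ns.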
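The scaling in the note follows from the leading terms of HPL's cost: the matrix holds about N² eight-byte doubles, and the LU factorization takes about (2/3)·N³ floating-point operations. The hypothetical helper below evaluates both with shell integer arithmetic, which truncates the division:

```shell
#!/bin/sh
# Sketch: estimate HPL memory and work for a given Ns.
# Uses only the leading terms: memory ~ 8 * N^2 bytes (N x N matrix of
# 8-byte doubles, spread over all processes) and work ~ (2/3) * N^3 flops.
hpl_cost() {
    n=$1
    mem=$((8 * n * n))
    flops=$((2 * n * n * n / 3))   # integer division truncates
    echo "N=$n memory_bytes=$mem flops=$flops"
}
```

`hpl_cost 1000` reports 8000000 bytes and 666666666 flops; `hpl_cost 2000` reports 32000000 bytes and 5333333333 flops, that is, four times the memory and about eight times the work. These are totals for the whole job; each processor holds roughly 1/(P×Q) of the matrix.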

Tip

For more information on the parameters in HPL.dat, see HPL Tuning.

2.3.2. Batch Mode

This section describes ways to scale up an HPL job on a Rocks cluster when submitting jobs through Grid Engine.

To get started, follow the instructions on how to scale up an HPL job in Interactive Mode. To increase the number of processors that the job uses, adjust HPL.dat (as described in Interactive Mode). Then get the file sge-qsub-test.sh (as described in launching batch jobs) and adjust the following parameter:

#$ -pe mpi 2

For example, for a 4-processor job (P times Q in HPL.dat equals 4), change the above line to:

#$ -pe mpi 4
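If you prefer to script this change, the sketch below rewrites the slot count on that line. set_pe_slots is a hypothetical helper, and it assumes the line has exactly the `#$ -pe mpi N` form shown above:

```shell
#!/bin/sh
# Sketch: set the slot count on the "#$ -pe mpi N" line of a Grid Engine script.
# Assumes the parallel environment is named "mpi", as in sge-qsub-test.sh.
set_pe_slots() {
    slots=$1 script=${2:-sge-qsub-test.sh}
    # \$ matches the literal "$" of the "#$" directive prefix.
    sed 's/^#\$ -pe mpi [0-9][0-9]*/#$ -pe mpi '"$slots"'/' "$script" \
        > "$script.tmp" && mv "$script.tmp" "$script"
}
```

For the 4-processor example, run `set_pe_slots 4 sge-qsub-test.sh`, then submit the script as before, for example with `qsub sge-qsub-test.sh`.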

Then submit your (bigger!) job to Grid Engine.

Warning

If you see an error message in your output file that looks like:

p2_25612: p4_error: interrupt SIGSEGV: 11
p4_22913: p4_error: interrupt SIGSEGV: 11
Broken pipe
Broken pipe

or:

p2_25887: (6.780981) xx_shmalloc: returning NULL; requested 13914960 bytes
p2_25887: (6.781052) p4_shmalloc returning NULL; request = 13914960 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes);

Then you need to increase the value of P4_GLOBMEMSIZE. To do that, edit sge-qsub-test.sh and increase the value in the line:

#$ -v P4_GLOBMEMSIZE=10000000 

Then resubmit the job to SGE.
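As the second error message suggests, the fix is a larger P4_GLOBMEMSIZE. The sketch below doubles the current value in sge-qsub-test.sh; bump_globmemsize is a hypothetical name, and doubling is an arbitrary step, not a recommended size:

```shell
#!/bin/sh
# Sketch: double the P4_GLOBMEMSIZE value passed by the SGE script.
# Doubling is an arbitrary step (an assumption); repeat if the
# p4_shmalloc error persists.
bump_globmemsize() {
    script=${1:-sge-qsub-test.sh}
    old=$(sed -n 's/^#\$ -v P4_GLOBMEMSIZE=\([0-9][0-9]*\).*/\1/p' "$script")
    [ -n "$old" ] || return 1          # no P4_GLOBMEMSIZE line found
    new=$((old * 2))
    sed 's/^\(#\$ -v P4_GLOBMEMSIZE=\)[0-9][0-9]*/\1'"$new"'/' "$script" \
        > "$script.tmp" || return 1
    mv "$script.tmp" "$script"
    echo "P4_GLOBMEMSIZE: $old -> $new"
}
```

In the error above, a single request of 13914960 bytes exceeded the 10000000 bytes set in the script, so one doubling (to 20000000) covers that request; whether it is enough for the whole run depends on the job.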