Rocks Cluster Distribution: Users Guide:

2.3. Running Linpack

2.3.1. Interactive Mode

This section describes ways to scale up an HPL job on a Rocks cluster.

To get started, follow the instructions for running a two-processor HPL job in Using Mpirun on Ethernet. Then, to scale up the number of processors, add more entries to your machines file. For example, to run a 4-processor job across compute nodes compute-0-0 and compute-0-1 (two processes on each node), put the following in your machines file:

compute-0-0
compute-0-0
compute-0-1
compute-0-1
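
For larger jobs, typing each hostname repeatedly gets tedious. The following is a minimal sketch that writes a machines file in the current directory, assuming two processes per node; the node list and the slots_per_node variable are illustrative and should be adjusted to your cluster.

```shell
# Sketch: list each compute node once per process it should run.
# slots_per_node and the node names are assumptions for this example.
slots_per_node=2
: > machines                      # start with an empty machines file
for node in compute-0-0 compute-0-1; do
  i=0
  while [ "$i" -lt "$slots_per_node" ]; do
    echo "$node" >> machines
    i=$((i + 1))
  done
done
cat machines
```

The result is the same four-line file shown above.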

Then you'll need to adjust the number of processors in HPL.dat:

change:

1 Ps
2 Qs

to:

2 Ps
2 Qs

The total number of processors HPL uses is P multiplied by Q. For example, for a 16-processor job, you could specify:

4 Ps
4 Qs
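
Since several P and Q combinations give the same processor count, it can help to list the candidates before editing HPL.dat. The sketch below uses a hypothetical helper, grid_pairs, which is not part of HPL or Rocks; it prints every pair with P no larger than Q whose product equals the target count.

```shell
# Sketch: print each "P Q" pair (P <= Q) with P * Q equal to the
# requested processor count. grid_pairs is a hypothetical helper.
grid_pairs() {
  np=$1
  p=1
  while [ $((p * p)) -le "$np" ]; do
    if [ $((np % p)) -eq 0 ]; then
      echo "$p $((np / p))"
    fi
    p=$((p + 1))
  done
}

grid_pairs 16
# prints:
# 1 16
# 2 8
# 4 4
```

Which pair performs best depends on the cluster and interconnect; see HPL Tuning for guidance.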

Finally, you need to adjust the -np argument on the mpirun command line:

$ /opt/mpich/gnu/bin/mpirun -nolocal -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl
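
Because the processor count appears in three places (the machines file, HPL.dat, and the mpirun command line), a quick pre-flight check can catch a mismatch before the job starts. The sketch below works in a scratch directory with stand-in files: the HPL.dat here contains only the two grid lines rather than a complete input file, and the parsing assumes a single process grid with the value first on each line.

```shell
# Sketch: compare P * Q from HPL.dat with the number of machines entries.
# The files created here are stand-ins for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' compute-0-0 compute-0-0 compute-0-1 compute-0-1 > machines
printf '%s\n' '2 Ps' '2 Qs' > HPL.dat   # only the grid lines, not a full HPL.dat

# First value on the Ps and Qs lines (assumes one process grid).
P=$(awk '$NF == "Ps" { print $1 }' HPL.dat)
Q=$(awk '$NF == "Qs" { print $1 }' HPL.dat)
np=$((P * Q))
slots=$(grep -c . machines)             # non-empty lines in the machines file

if [ "$np" -eq "$slots" ]; then
  cmd="/opt/mpich/gnu/bin/mpirun -nolocal -np $np -machinefile machines /opt/hpl/gnu/bin/xhpl"
  echo "$cmd"                           # print the command rather than run it
else
  echo "mismatch: P*Q=$np but machines lists $slots entries" >&2
fi
```

The script only prints the command; run it yourself once the numbers agree.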

To make the job run longer, increase the problem size by raising the Ns parameter. For example, to double the problem size:

change:

1000 Ns

to:

2000 Ns

Keep in mind that doubling Ns quadruples the size of the matrix (and therefore the memory it needs), while the amount of floating-point work grows roughly eightfold, because HPL's operation count scales with the cube of Ns.
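
To get a feel for how quickly memory use grows with Ns, the sketch below estimates the storage for the Ns x Ns matrix, assuming 8 bytes per double-precision element. matrix_bytes is a hypothetical helper, and the figure is approximate because HPL also allocates some workspace beyond the matrix itself.

```shell
# Sketch: approximate bytes needed for an N x N double-precision matrix.
# matrix_bytes is a hypothetical helper, not part of HPL.
matrix_bytes() {
  echo $(( 8 * $1 * $1 ))
}

for n in 1000 2000 4000; do
  echo "Ns=$n matrix_bytes=$(matrix_bytes "$n")"
done
# prints:
# Ns=1000 matrix_bytes=8000000
# Ns=2000 matrix_bytes=32000000
# Ns=4000 matrix_bytes=128000000
```

This total is spread across all the processes in the job, so compare it against the combined memory of the nodes you are using.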

For more information on the parameters in HPL.dat, see HPL Tuning.

2.3.2. Batch Mode

This section describes ways to scale up an HPL job on a Rocks cluster when submitting jobs through Grid Engine.

To get started, follow the instructions for scaling up an HPL job in Interactive Mode. To increase the number of processors that the job uses, adjust HPL.dat (as described in Interactive Mode). Then get the file sge-qsub-test.sh (as described in Launching Batch Jobs Using Grid Engine), and adjust the following parameter:

#$ -pe mpi 2

For example, if you want a 4-processor job, change the above line to:

#$ -pe mpi 4
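
If you change the processor count often, the edit can be scripted. The following sketch rewrites the -pe line with sed; it operates on a stand-in copy of the script in a scratch directory (only a fragment, not the real sge-qsub-test.sh), and the slots variable is illustrative.

```shell
# Sketch: set the number of slots requested from the mpi parallel environment.
# The script created here is a stand-in fragment for illustration.
workdir=$(mktemp -d)
script="$workdir/sge-qsub-test.sh"
printf '%s\n' '#!/bin/bash' '#$ -pe mpi 2' > "$script"

slots=4
# Write to a new file and move it into place (portable alternative to sed -i).
sed 's/^#\$ -pe mpi .*/#$ -pe mpi '"$slots"'/' "$script" > "$script.new" \
  && mv "$script.new" "$script"

grep '^#\$ -pe' "$script"
```

Remember that the value must still match P multiplied by Q in HPL.dat.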

Then submit your (bigger!) job to Grid Engine.

If you see an error message in your output file that looks like:

p2_25612: p4_error: interrupt SIGSEGV: 11
p4_22913: p4_error: interrupt SIGSEGV: 11
Broken pipe
Broken pipe

or:

p2_25887: (6.780981) xx_shmalloc: returning NULL; requested 13914960 bytes
p2_25887: (6.781052) p4_shmalloc returning NULL; request = 13914960 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes);

then you'll need to increase P4_GLOBMEMSIZE. To do that, edit sge-qsub-test.sh and increase the value in the line:

#$ -v P4_GLOBMEMSIZE=10000000

Then resubmit the job to SGE.
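
As one possible way to apply this change, the sketch below doubles the current P4_GLOBMEMSIZE value in the script. Doubling is an arbitrary choice made for illustration, not a recommendation from this guide, and the script used here is a stand-in fragment in a scratch directory.

```shell
# Sketch: double the P4_GLOBMEMSIZE value passed to the job.
# The script created here is a stand-in fragment for illustration.
workdir=$(mktemp -d)
script="$workdir/sge-qsub-test.sh"
printf '%s\n' '#$ -v P4_GLOBMEMSIZE=10000000' > "$script"

# Extract the current value (in bytes) and compute the new one.
current=$(sed -n 's/^#\$ -v P4_GLOBMEMSIZE=\([0-9][0-9]*\)$/\1/p' "$script")
new=$((current * 2))

sed 's/^#\$ -v P4_GLOBMEMSIZE=.*/#$ -v P4_GLOBMEMSIZE='"$new"'/' "$script" > "$script.new" \
  && mv "$script.new" "$script"

cat "$script"
```

If the same error appears again after resubmitting, repeat the step with a larger value.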
