It's quite common for groups to have PBS facilities for running their tightly coupled MPI jobs, and since v6.8 Condor has been able to talk directly to PBS. There are a number of caveats, the main one being that the PBS queue has to be on the same machine as the Condor schedd submitting the job to it. However, Condor also provides the Condor-C mechanism of delegated job submission, so the road is open for a schedd on machine A to ask a schedd on machine B to submit a job to the PBS queue on B. Note that in what follows I describe how to submit jobs from a Condor queue to a dedicated PBS cluster; the processors running under PBS are not directly part of a Condor pool. If you're interested in having nodes belong simultaneously to a Condor pool and a PBS cluster then you may consider the scavenging model described here.
PBS without MPI
We'll start by looking at getting PBS to work without MPI, i.e. just simple "fork"-type jobs like one would run on an LCG farm. First we'll need to get Condor configured on the PBS head node. This needs to have the right security settings set up between the two schedds; a simple insecure method for testing would set:
SEC_DEFAULT_NEGOTIATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_NEGOTIATION_METHODS = CLAIMTOBE
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
Corresponding (and also insecure) settings on the delegating (i.e. client) node would be:
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
We'll also need to configure $CONDOR_HOME/lib/glite/etc/batch_gahp.config to point at the PBS installation. Now suppose I want to submit an executable called "Test.sh" to a PBS headnode called iguana.my.domain, which is in a Condor pool managed by the machine donkey.my.domain (or at least that's where that pool's Collector resides). Then the Condor job file that I submit will look like:
universe = grid
executable = Test.sh
output = myoutput
error = myerror
log = mylog
grid_resource = condor iguana.my.domain donkey.my.domain
+remote_universe = grid
+remote_grid_resource = pbs
queue
We launch this with the usual condor_submit command.
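Incidentally, the batch_gahp.config change mentioned above typically amounts to telling the batch GAHP where the PBS binaries live. As a sketch only (the key names and paths below are assumptions and may vary between Condor versions, so check the comments in your own copy of the file):

```
# Assumed batch_gahp.config entries -- verify the key names against
# the comments shipped in your $CONDOR_HOME/lib/glite/etc/batch_gahp.config
supported_lrms=pbs
pbs_binpath=/usr/local/pbs/bin
```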
PBS with MPI
We now want to run a similar job to the above, but this time I'm submitting an MPI-enabled executable. The problem is that the Condor->PBS interface does not naturally allow for this, unlike the Globus or NorduGrid interfaces: there is no RSL option to specify PBS-specific settings (e.g. number of processors). To circumvent this limitation I have to wrap the actual command I want to run in a wrapper script and pass that to the appropriate MPI execution command, i.e. mpirun for MPICH. In this example I will run the parallel executable "cpi", which is actually built as part of the MPICH suite in the <MPICH install dir>/examples directory. I'm going to send the job to iguana.my.domain, which runs a Condor schedd and is in the pool managed by donkey.my.domain. So iguana also runs the PBS queue, whereas donkey is oblivious of all things PBS. First the submit script:
universe = grid
executable = wrapper
transfer_input_files = cpi
WhenToTransferOutput = ON_EXIT
output = myoutput
error = myerror
log = mylog
grid_resource = condor iguana.my.domain donkey.my.domain
+remote_universe = grid
+remote_grid_resource = pbs
+remote_requirements = True
+remote_ShouldTransferFiles = "YES"
+remote_WhenToTransferOutput = "ON_EXIT"
queue
Now for the wrapper script, which is what actually gets run on the PBS cluster. I'm going to ask for two processors, so:
#!/bin/sh
chmod +x cpi
/path/to/mpich/installation/bin/mpirun -np 2 cpi
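Rather than hard-coding -np 2 in the wrapper, one could derive the processor count from the node file that PBS supplies via the PBS_NODEFILE environment variable (one line per allocated slot). A minimal sketch, with the node file simulated here for illustration and the mpirun command merely echoed rather than executed:

```shell
#!/bin/sh
# Count the slots listed in a PBS-style node file (one line per slot).
np_from_nodefile() {
    wc -l < "$1" | tr -d ' '
}

# Simulate the file PBS would hand us in $PBS_NODEFILE: two slots on
# each of two nodes.
nodefile=$(mktemp)
printf 'node1\nnode1\nnode2\nnode2\n' > "$nodefile"

NP=$(np_from_nodefile "$nodefile")
# In the real wrapper this line would run the command instead of echoing it.
echo "would run: mpirun -np $NP -machinefile $nodefile cpi"
rm -f "$nodefile"
```

This keeps the wrapper in step with whatever resource request PBS actually grants, instead of silently disagreeing with it.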
Job completion times
The time to job completion can actually be quite long, even if the PBS job itself executes quickly. This is mainly due to Condor's polling mechanism for determining the progress of each step, which by default runs every five minutes. If this is too slow for you then modify the value of the setting CONDOR_JOB_POLL_INTERVAL.
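For example, to poll once a minute instead, one could add the following to the Condor configuration on the submitting side (the interval is given in seconds; 60 here is just an illustrative value, and as with any config change it needs a condor_reconfig to take effect):

```
# Poll the remote schedd every 60 seconds instead of the default 300
CONDOR_JOB_POLL_INTERVAL = 60
```

Bear in mind that a shorter interval means more frequent queries against the remote schedd, so don't set it aggressively low on a busy submit host.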