
Implementation and configuration

The implementation

Note: This page will be of interest mostly to system administrators.

In CamGrid, each participating machine is given an IP address that is routable only across the Cambridge University Data Network (CUDN); these addresses are assigned to CamGrid members by the UCS. A machine may have other IP addresses as well, or just the CUDN-only routable address, but either way Condor is made to use the CamGrid address during operation by setting the appropriate value in the condor_config file, namely NETWORK_INTERFACE. Each host is also given a new entry in the domain grid.private.cam.ac.uk, which the university DNS servers recognise. Host names in this domain take the form <conventional hostname>--<department>.grid.private.cam.ac.uk. Currently the grid.private.cam.ac.uk domain consists of the following subnets. Individual pools then join the grid by getting their submit machines to flock to all other central managers, while at the same time getting their central manager to allow submit nodes from other pools to flock back. If you're interested in getting your department/group to join CamGrid, then get in touch with me via email (see the bottom of the page).
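
As a sketch of what this looks like in a pool's Condor configuration (the hostnames below are placeholders rather than real CamGrid central managers; the actual settings are covered further down this page):

# On the pool's submit hosts: flock out to the other pools' central managers
FLOCK_TO   = cm--deptA.grid.private.cam.ac.uk, cm--deptB.grid.private.cam.ac.uk
# In the global condor_config: allow submit nodes from other pools to flock back in
FLOCK_FROM = *.grid.private.cam.ac.uk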

Machine configuration for CamGrid

Consider a machine (the author's desktop, in fact), woolly.csi.cam.ac.uk. This has a globally facing IP address of 131.111.10.94, but in order to work on CamGrid it needs to be assigned an address in the relevant subnet of the grid.private.cam.ac.uk domain; in this case the address chosen is 172.24.89.129. This is a Linux box running Debian stable with only one physical NIC, so we first give it a virtual interface in the designated IP range by adding the following to the file /etc/network/interfaces:

auto eth0:1
iface eth0:1 inet static
	address 172.24.89.129
	netmask 255.255.255.192
	network 172.24.89.128
	broadcast 172.24.89.191
	gateway 172.24.89.190
	dns-nameservers 131.111.8.42 131.111.12.20
	dns-search grid.private.cam.ac.uk

This interface can then be started by issuing "ifup eth0:1". The relevant CamGrid-specific additions to its condor_config.local file are the following (including the definition of an optional port range):

FULL_HOSTNAME      = woolly--csi.grid.private.cam.ac.uk
NETWORK_INTERFACE  = 172.24.89.129
FLOCK_FROM         = *.grid.private.cam.ac.uk # goes in the global condor_config file
FLOCK_TO           = < comma separated list of other central managers (see a list here); goes on your submit nodes >
ALLOW_WRITE        = 172.24.89.128/26
ALLOW_READ         = *.grid.private.cam.ac.uk
HIGHPORT           = 9700
LOWPORT            = 9600

Note that if you're running an iptables based packet filter on any of the nodes, such as pfilter, then you'll have to add extra rules by hand since iptables (and netfilter in general) does not support virtual interfaces. Furthermore, if you do choose to define a port range using LOWPORT and HIGHPORT then keep in mind that Condor uses ~3 ports per running vanilla universe job and up to 5 for a standard universe job, so a dedicated submit host will need a correspondingly large range to keep communicating with all its running jobs.
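
As a rough worked example (the figures are illustrative): the range defined above (LOWPORT = 9600, HIGHPORT = 9700) spans about 100 ports, which at roughly 3 ports per running vanilla universe job supports somewhere in the region of 30 concurrent jobs from that submit host. A dedicated submit host expected to sustain several hundred running jobs would therefore want a correspondingly larger range, e.g.:

LOWPORT  = 9000
HIGHPORT = 9999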

Condor adopted an annoying change in default behaviour between versions 7.0 and 7.2: the daemons now bind to all network interfaces, which will break flocking in CamGrid. If your machines have more than one interface then please add the following setting to the condor_config file:

BIND_ALL_INTERFACES = FALSE

The various Condor daemons are pretty sensitive to the times that they report to each other, so please ensure that NTP is set up on all machines, but especially on a central manager. You can sync to the university servers [ntp1|ntp2|ntp3].cam.ac.uk.
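
For example, on a machine running a standard ntpd, the university servers could be listed in /etc/ntp.conf along the following lines (a minimal sketch; your distribution's NTP setup may differ):

server ntp1.cam.ac.uk
server ntp2.cam.ac.uk
server ntp3.cam.ac.uk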

Log files and scratch spaces

It is strongly recommended that you ensure that the directories that Condor uses to store log files, scratch spaces and spooled files (for submit hosts) are located on a machine's local disk, i.e. not remotely via NFS or similar. These directories are identified in the condor_config file via the entries LOG, EXECUTE and SPOOL. Furthermore, put a busy submit host's SPOOL directory on a fast disk with little else using it, and if you're running a lot of big standard universe jobs then consider setting up multiple checkpoint servers, rather than doing all checkpointing onto the submit node.
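
For instance, the following condor_config entries place all three directories on local filesystems (the paths are purely illustrative):

LOG     = /var/local/condor/log
SPOOL   = /var/local/condor/spool
EXECUTE = /scratch/condor/execute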

Allowing ssh access to running jobs

A bugbear of running outside of the standard universe in any environment without a shared filesystem (like CamGrid) is the lack of visibility of the files being created in the scratch space by a job on an execute host. We have our own, web based, solution on CamGrid (described here), but Condor also allows users to ssh directly to the scratch space of a running job. This is described here, but basically means that if a job is running with ID job_id then the user can perform the following from the submit host to ssh to it:

condor_ssh_to_job <job_id>

For this to work, the execute hosts need to allow such access. It's up to each pool to decide whether they want to allow this, but doing so requires the following on the execute hosts:

  • A proper shell for the account running the jobs, e.g. /bin/bash and not /bin/false.
  • The configuration value: ALLOW_DAEMON = */*.grid.private.cam.ac.uk (or specific subnet additions to ALLOW_WRITE; not recommended).
  • A Bash-compatible shell at /bin/sh. If your machines point /bin/sh at /bin/dash (the default these days for Debian and Ubuntu) then change the she-bang lines of the scripts $(LIBEXEC)/condor_ssh* to #!/bin/bash (one way of doing this is sketched after this list).
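
One way of applying that she-bang fix is sketched below, assuming GNU sed and that the LIBEXEC directory can be queried with condor_config_val; check the result afterwards, and bear in mind that a Condor upgrade may reinstate the original scripts:

LIBEXEC=$(condor_config_val LIBEXEC)
sed -i '1s|^#!/bin/sh|#!/bin/bash|' "$LIBEXEC"/condor_ssh*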

Required configuration modifications

Please ensure that your pools conform to the standard expected of CamGrid, details of which can be found here.

Recommended configuration modifications

You may also want to add the following lines to the condor_config on your central manager. These disable the periodic attempt to send UDP packets to the Condor developers informing them of the state of your pool. Adding these lines also has the side effect of improving response times for certain tools, e.g. condor_status.

CONDOR_DEVELOPERS           = NONE
CONDOR_DEVELOPERS_COLLECTOR = NONE

Under *nix, Condor will, by default, run jobs as the user "nobody". This can be a security concern for vanilla jobs since any forked processes may linger after the parent Condor job has terminated. If this is not acceptable from the point of view of the local security model, then one can make jobs run as dedicated users instead. This has the benefit that Condor should ensure that no forked processes survive the parent process when it exits. For example, suppose we've created the user accounts condor_user[0..N] on an execute node. Then, in order to make jobs run as one of these users, we'd add the following lines to the condor_config(.local) file:

SLOT1_USER                         = condor_user0
SLOT2_USER                         = condor_user1
...
SLOTN_USER                         = condor_userN
STARTER_ALLOW_RUNAS_OWNER          = False
DEDICATED_EXECUTE_ACCOUNT_REGEXP   = condor_user[0-9]+

Hence, there'll be one such dedicated user per slot on your host. The case for a host with a single slot could simply be:

SLOT1_USER                         = condor_user
STARTER_ALLOW_RUNAS_OWNER          = False
DEDICATED_EXECUTE_ACCOUNT_REGEXP   = condor_user
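
For completeness, the dedicated accounts themselves might be created with something along the following lines (a sketch only; adapt the user names, shells and home directories to local policy):

# Create condor_user0 .. condor_user7 for an eight-slot execute node
for i in $(seq 0 7); do
    useradd -m -s /bin/bash condor_user$i
done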

These SLOT<n>_USER settings will force non-Standard Universe jobs to run under the dedicated accounts, but they have no effect on Standard Universe jobs. The best we can do for those is to force them to run as the *nix user "nobody" by setting:

TRUST_UID_DOMAIN = FALSE
SOFT_UID_DOMAIN  = FALSE

One can decide what nice value a Condor job should run with by setting the appropriate value in the condor_config file, and this follows the *nix "nice" regime. Hence, in order for jobs to run at the lowest scheduling priority, one would set:

JOB_RENICE_INCREMENT       = 19

Condor advertises a standard set of resource properties through its classad mechanism, which is useful when attempting to match submitted jobs with execute nodes. However, machines in CamGrid also use this mechanism to advertise properties which are relevant locally. Find these properties here and please incorporate them into your pool(s).
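
By way of illustration (the attribute name here is made up and is not one of the official CamGrid properties), a locally relevant property is advertised by defining it and appending it to STARTD_ATTRS in the condor_config file:

CAMGRID_EXAMPLE_PROPERTY = True
STARTD_ATTRS = $(STARTD_ATTRS), CAMGRID_EXAMPLE_PROPERTY

Jobs can then select on such a property via their submit files, e.g. with Requirements = (CAMGRID_EXAMPLE_PROPERTY =?= True).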

The flocking mechanism used within CamGrid can take a while to match jobs across pools. In order to speed up this process, you may choose to reduce the interval between negotiation cycles from the default of five minutes by putting a suitable entry in the condor_config file; e.g. the following sets it to sixty seconds:

NEGOTIATOR_INTERVAL     = 60

In order for us to acquire usage statistics within CamGrid, add the following line to the config file on the machine where your pool's collector is running, which is probably the central manager.

CONDOR_VIEW_HOST        = winnie--csx.grid.private.cam.ac.uk

This used to be an issue with older kernels, but not so much these days. Nevertheless, in order to stop some firewalls from dropping idle connections (especially for vanilla jobs), it would be helpful if the TCP keep-alive time were set to less than an hour. For example, to set this to 12 minutes on a Linux box, perform the following as root:

echo 720 > /proc/sys/net/ipv4/tcp_keepalive_time
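
To make the setting persist across reboots, the equivalent sysctl entry can be added to /etc/sysctl.conf (or a file under /etc/sysctl.d/ on distributions that use it):

net.ipv4.tcp_keepalive_time = 720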

Finally, a note on screensavers: please disable screensavers on execute nodes. At the very least, set them to blank so that they don't use up precious CPU cycles.

Limiting Resource Usage

It often comes as a rude surprise to administrators that Condor does not enforce the quantities advertised in a machine's classad. So, for example, you may hard-code a memory value of X bytes to be advertised for a particular slot, but if the job allocated to it grows to grab more memory than X, then Condor won't stop it from doing so. This is also true for other important resources, e.g. disk space, which can have serious repercussions for the health of a machine. One way to mitigate this is to run Condor in a virtual machine, but this can be deemed unnecessary overkill. Instead, Condor has two ways of trying to address the issue.

The older way is to provide a wrapper script that is executed prior to the main executable, and whose job it is to set specific limits for that job via the "ulimit" command. This script is nominated via the USER_JOB_WRAPPER directive, and an example of such a script is given here, though this comes with a warning: the example script in the Condor documentation has a she-bang line of #!/bin/sh, but it's really a Bash and not a Bourne shell script. This will cause you problems if your Linux distro silently points /bin/sh at something like Dash. Hence, replace the she-bang line with #!/bin/bash to be on the safe side. Since ulimit is known not to be foolproof, I also recommend adding the following line just before the wrapper script's exec command:

echo 15 > /proc/$$/oom_adj

This will nominate the Condor job to be preferentially killed by the out-of-memory (OOM) killer, which is preferable to the machine freezing up.
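
Putting the pieces together, a minimal wrapper might look like the sketch below (the ulimit values and the path are illustrative, and this is not the example script from the Condor documentation). It would be nominated in the condor_config file with something like USER_JOB_WRAPPER = /usr/local/condor/libexec/job_wrapper.sh:

#!/bin/bash
# Illustrative USER_JOB_WRAPPER: apply per-job limits, then hand over to the job.
# Cap virtual memory at ~2 GB and disable core dumps (example values only).
ulimit -v 2097152
ulimit -c 0
# Nominate this job as the OOM killer's preferred victim, as recommended above.
echo 15 > /proc/$$/oom_adj
# Condor passes the job's executable and its arguments to the wrapper; exec them.
exec "$@"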

The newer way is for Linux kernels that support Control Groups (a.k.a. cgroups). I describe that method here.

Condor's default static slot configuration (as opposed to dynamic slots) assigns one processing core per logical job slot. This is fine for serial jobs, but multi-threaded jobs could end up abusing this set-up and spilling over to use the cores of other slots. In order to force jobs to adhere to their assigned resources, sysadmins using static slots can set the following configuration value (see the manual entry here):

ENFORCE_CPU_AFFINITY       = TRUE

Such affinity can also be forced on a per-slot basis, as described in a post to the global Condor mailing list here. Note that this functionality was broken for HTCondor versions prior to 8.0.
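
For instance, on a hypothetical four-core execute node carved into two dual-core static slots, the per-slot pinning might look like the following (the slot-to-core assignments are purely illustrative):

ENFORCE_CPU_AFFINITY = TRUE
SLOT1_CPU_AFFINITY   = 0,1
SLOT2_CPU_AFFINITY   = 2,3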

Example configuration entries

Condor's default configuration files come with some intricate settings with respect to how jobs start and are preempted. They are also very aggressive about preempting jobs in favour of those with better priority, which can result in wasted CPU cycles for jobs that do not save state. If all you want is for jobs to always run, then consider setting the JOB_RENICE_INCREMENT value mentioned above together with the following values, which should work well as long as no swapping takes place:

START = True
SUSPEND = False
PREEMPT = False
KILL = False
RANK =
CONTINUE = True

If we really want to disable all preemption, then we could go for the following:

# Disable preemption by machine activity, as above.
PREEMPT = False
# Disable preemption by user priority.
PREEMPTION_REQUIREMENTS = False
# Disable preemption by machine RANK by ranking all jobs equally.
RANK = 0
# Since we are disabling claim preemption, we may as well optimize negotiation for this case:
NEGOTIATOR_CONSIDER_PREEMPTION = False
# After 20 minutes, schedd must renegotiate to run additional jobs on the machine, 
# otherwise this user could hang on to this machine for ever.
CLAIM_WORKLIFE = 1200

 

Giving priority to local users

As an administrator, you may want to give priority on your machines to users from your group, which is only fair. For example, the following will give priority on vacant machines to users of the "Astrology" group, but won't kick off any existing jobs (whoever they belong to).

# In the submit host's configuration:
Group = "Astrology"
SUBMIT_ATTRS = $(SUBMIT_ATTRS), Group 

# In the central manager's configuration (note that here TARGET refers to job classad, 
# whereas MY would refer to the machine's classad):
NEGOTIATOR_PRE_JOB_RANK = 10 * (TARGET.Group =?= "Astrology") + 1 * (RemoteOwner =?= UNDEFINED) 

Note that this approach isn't foolproof, and it can be spoofed by non-Astrology users. However, such transgressions can easily be spotted from those jobs' classads and the offenders banned. For example, from the 7.4 series onwards Condor creates .job.ad and .machine.ad files in each job's scratch directory, where the former holds all the job classad entries and the latter all the machine classad entries. Spoofing can be detected by looking in the former for the "Group" entry (if it exists). In fact, the condor_starter daemon puts the locations of these classad files in the environment variables _CONDOR_JOB_AD and _CONDOR_MACHINE_AD, but strangely these files are not present for Standard universe jobs. In that case, one needs to interrogate the Collector for the required information (see my example for setting CPU affinity for dynamic slots here to see how to do this).
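
As an example of the sort of check involved (the execute directory shown is typical of a Linux install rather than guaranteed), an administrator on an execute node could inspect the job classads of the currently running jobs for the Group attribute:

grep -H '^Group' /var/lib/condor/execute/dir_*/.job.ad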