Notes for new sysadmins
Condor is a reasonably complex beast to set up, but once it's up and running it tends to tick over just fine. A sensible set of steps for new sysadmins is:
- Read the Administrator's chapter from the Condor manual in order to familiarise yourself with the middleware. An excellent how-to list of common administrative tasks can be found here.
- Look at the CamGrid specific pages, especially those describing the architecture and example configuration of machines.
- Request a set of CUDN-only routeable IP addresses from me (mc321 at cam.ac.uk) and collaborate with the UCS to get the routing functioning correctly.
- Get yourself added to the ucam-camgrid-admins mailing list (go here).
- Set up your own isolated Condor pool using the CUDN-only routeable addresses and get experience in managing it.
- Note that there is no shared file system that spans CamGrid, so jobs usually have to rely on HTCondor to perform file transfer. However, HTCondor can use a shared file system for local jobs, i.e. ones that are submitted from your pool and stay in it rather than flocking out. These jobs will run with the UID of the user who submitted them, whereas any external jobs flocking in will run under whatever dedicated user accounts you have set up for the purpose, or as the Linux user "nobody" if you have not set any up.
- Gain some experience as a user on your pool. This will help you deal with many of the problems your users, especially new ones, will bring to you. Sysadmins are expected to become proficient with Condor and to be able to solve the majority of problems seen in their pool.
- Once all the above steps have been completed, mail the ucam-camgrid-admins list telling the other sysadmins the name of your central manager so they can flock to it. Similarly, allow your submit node(s) to flock to theirs.
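
As a rough illustration of the shared file system and flocking points above, a local configuration fragment might look something like the sketch below. All host names, the domain and the account names (mydept.example, otherdept.example, cndrusr1 and so on) are made up for the example and are not CamGrid values. It also assumes the dedicated accounts already exist on the execute nodes. Depending on your Condor version and security settings, you may in addition need to list the flocking hosts in the relevant ALLOW_* entries, so check the manual for the release you run.

```
# Hypothetical names throughout: substitute your own hosts, domain and accounts.

# Machines that share these two values are treated as having common UIDs
# and a common file system, so jobs that stay in the pool can run as the
# submitting user and use the shared file system instead of file transfer.
UID_DOMAIN        = mydept.example
FILESYSTEM_DOMAIN = mydept.example

# Jobs flocking in come from a different UID_DOMAIN. Give each slot a
# dedicated account to run them under, rather than falling back to "nobody".
SLOT1_USER = cndrusr1
SLOT2_USER = cndrusr2
DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+

# On your submit node(s): central managers of the other pools to flock to.
FLOCK_TO = cm.otherdept.example

# On your central manager and execute nodes: remote submit nodes that are
# allowed to flock in.
FLOCK_FROM = submit.otherdept.example
```

Running condor_reconfig after editing the file, and then checking the values with condor_config_val (for example, condor_config_val FLOCK_TO), is a quick way to confirm that the daemons see what you intended.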