
CamGrid's flocked architecture

The picture below gives a simplified depiction of the federated architecture used in CamGrid via Condor's flocking mechanism. Each participating group/department runs its own pool with its own central manager; a list of participating groups/departments can be found here. A job submitted from a submit host will always be serviced within that host's own pool if possible, as in the intra-pool case below. If no match is possible within the local pool, and the submit host has been configured to contact other central managers, then the job will attempt to flock to another pool. Note that flocking does not work for parallel universe (MPI) jobs, so if you intend to use MPI facilities in another pool you will have to make specific arrangements with that pool's administrator to schedule your jobs onto it.
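In Condor's configuration, this behaviour is driven by a pair of macros: FLOCK_TO on the submit host names the remote central managers to try, in order, when the local pool cannot service a job, and FLOCK_FROM on a central manager names the submit hosts it will accept flocked jobs from. The snippet below is a minimal sketch with hypothetical host names, not CamGrid's actual configuration; real pools also need matching security authorisations on both sides.

    # condor_config on a submit host in pool A (host names are
    # illustrative): if the local pool cannot match the job, try
    # pool B's central manager next.
    FLOCK_TO = cm.poolB.cam.ac.uk

    # condor_config on pool B's central manager: accept flocked
    # jobs from pool A's submit host.
    FLOCK_FROM = submit.poolA.cam.ac.uk

    # Both sides must also authorise each other's hosts, e.g.
    # (the exact security macros depend on the pool's setup):
    ALLOW_WRITE = $(ALLOW_WRITE), submit.poolA.cam.ac.uk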

[Figure: simplified depiction of CamGrid's flocked architecture]

Note that the central managers themselves do not need to communicate during flocking: only the submit host and the remote central manager take part in the negotiation. Once a match is found, even the remote central manager drops out of the picture, and the submit and execute hosts communicate directly for the duration of the job. This implies that every submit node may potentially need to communicate with every execute node, so firewalls need to be adjusted accordingly.
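One common way to make those firewall rules tractable is to confine Condor's daemons to a fixed port range, so that a single range can be opened between pools. A minimal sketch, with an arbitrary illustrative range rather than any range actually used on CamGrid:

    # condor_config on submit and execute hosts: restrict the
    # Condor daemons' ephemeral ports to a known range, so the
    # firewall need only allow TCP/UDP 9600-9700 between pools
    # (the range shown is illustrative).
    LOWPORT  = 9600
    HIGHPORT = 9700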

Finally, machines on CamGrid run Condor over CUDN-only routeable IP addresses. So although some machines may also be configured with globally routeable addresses, none of the Condor daemons bind to these, which enhances local security.
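Pinning the daemons to the CUDN-only address can be done with Condor's NETWORK_INTERFACE macro, which tells every daemon which local IP address to bind to. A minimal sketch, using a placeholder address rather than any real CamGrid allocation:

    # condor_config: bind all Condor daemons to this host's
    # CUDN-only routeable address (placeholder shown), so they
    # never listen on any globally routeable interface.
    NETWORK_INTERFACE = 172.20.5.17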