One drawback of using the vanilla universe in Condor (relative to the standard universe) is that unless one uses a shared file system, the former precludes any visibility of files being generated by a running job until that job's completion, whence they are shipped back. Another disadvantage is that one must ship over to the execute node all files that the job might use, which can be a pain if these files are part of a nested directory structure that one would simply like to export a la NFS. However, NFS requires superuser intervention to set up, as does sshfs. As a user operating in a grid that spans various administrative domains, what one would really like is to achieve this functionality via a completely unprivileged route.
Chirp is a remote I/O protocol used by a variety of research projects related to Condor. Condor uses Chirp to allow batch jobs to access their home storage while executing on a remote machine. On the face of it this would appear to be all that's needed for our purposes, and Chirp on its own may suffice for many such situations, in which case look here for further details. However, this requires rather low-level calls for accessing files, whereas ideally we'd like to have a transparent way of doing this so that applications do not need to be modified to issue Chirp-specific calls. Enter...
Parrot has its genesis in the Condor project, but is now maintained and developed by Doug Thain's group at Notre Dame as part of the CCtools project. The particular elegance of this approach is that it allows us to start a shell under Parrot such that we can mount any number of remote file systems locally as an unprivileged user. Parrot speaks a number of protocols, including http, httpfs, gsiftp, anonftp, ftp and chirp. It also supports a number of authentication methods, including GSI, Kerberos, hostname and IP-address. This implies a large number of possible protocol/authentication combinations, but here we'll just describe one example, namely Chirp/hostname (the latter uses reverse DNS checking). Note: you can find another Parrot tutorial for use on CamGrid on the Biological Sciences' web server, as well as a way of using Parrot to checkpoint vanilla universe jobs here.
Suppose I want to export the directory /home/mc321/data on woolly--csi.grid.private.cam.ac.uk (172.24.89.129) and want to give read-only access to all machines in the domains grid.private.cam.ac.uk and hep.phy.cam.ac.uk. I start by creating the file .__acl in that directory and adding the contents:
    hostname:*.grid.private.cam.ac.uk rl
    hostname:*.hep.phy.cam.ac.uk rl
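The same file can be written from the shell when several directories need the same rules. This is a minimal sketch: `write_acl` is a hypothetical helper name, and the two entries are simply the ones from the example above.

```shell
#!/bin/bash
# Hypothetical helper: write the Chirp ACL file into an exported directory.
# The two rules are the read/list entries from the example in the text.
write_acl() {
    local dir="$1"
    cat > "$dir/.__acl" <<'EOF'
hostname:*.grid.private.cam.ac.uk rl
hostname:*.hep.phy.cam.ac.uk rl
EOF
}

# Example: write_acl /home/mc321/data
```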
You are urged to read the user manual for details of the syntax. Next I'll start a chirp server that exports this directory:
woolly% chirp_server -u - -r /home/mc321/data -I 172.24.89.129 -p 9096 &
A couple of points: note that I've specifically exported woolly's RFC 1918, or "CamGrid", address: the default behaviour of chirp_server is to bind to all interfaces, which is not what I want. Though it may be tempting to use the FQDN rather than the IP address here, the "-I" option only observes the latter. Also, I've asked it to bind to port 9096 (the default is 9094). I next want to submit a Condor job that simply echoes the contents of all files in this exported directory, hence proving that the files can be accessed remotely. The actual executable will be called list.sh, and I'll mount the remote directory on the execute node as /Data. So here's list.sh:
    #!/bin/bash
    mount=/Data
    for i in $(ls "$mount"); do
        file="$mount/$i"
        if [ -f "$file" ]; then
            echo " Contents of $file :"
            /bin/cat "$file"
        fi
    done
    exit 0
I next need to wrap list.sh in a wrapper which I then pass to Condor to run for me, which will invoke all the Parrot functionality. Here's the wrapper, which I'll call wrapper.sh:
    #!/bin/bash
    export PATH=.:/bin:/usr/bin
    export LD_LIBRARY_PATH=.

    # Use current scratch space for Parrot. The default is to
    # use /tmp, which may be too small or full.
    export PARROT_TEMP_DIR=`pwd`/.parrot_tmp
    mkdir $PARROT_TEMP_DIR

    # Read in file with mount points
    mountfile=Mountfile

    # What's the "real" executable called?
    my_executable=list.sh

    chmod +x $my_executable parrot_run

    # Run the executable under parrot. NOTE: parrot_run used to be called simply parrot.
    parrot_run -k -Q -H -t $PARROT_TEMP_DIR -m $mountfile $my_executable

    # Clean up
    rm -r $PARROT_TEMP_DIR
    exit
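One refinement worth considering, which is not part of the wrapper above: the final `exit` returns the status of the preceding `rm`, so the exit code Condor records is not that of the program run under Parrot. A minimal sketch of capturing and returning the job's own status follows. `run_then_cleanup` is a hypothetical helper, and the sketch assumes that `parrot_run` passes on the exit status of the program it runs.

```shell
#!/bin/bash
# Hypothetical helper: run a command, remove a scratch directory
# afterwards, and return the command's exit status rather than rm's.
run_then_cleanup() {
    local scratch="$1"
    shift
    "$@"
    local status=$?
    rm -rf "$scratch"
    return "$status"
}

# In wrapper.sh this might replace the last three commands (an assumption,
# not the original author's code):
#   run_then_cleanup "$PARROT_TEMP_DIR" \
#       parrot_run -k -Q -H -t "$PARROT_TEMP_DIR" -m "$mountfile" "$my_executable"
#   exit $?
```

With this in place a failing job shows a non-zero exit code in the Condor log, which makes failed runs easier to spot.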
Some comments on wrapper.sh: the file Mountfile contains a list of mappings of remote file systems to local mountpoints in the format "<mount point> /<protocol>/<remote server>[:<port>]". In this case, given the server, port and mount point used above, it has the single entry:
    /Data /chirp/woolly--csi.grid.private.cam.ac.uk:9096
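Entries in this format can also be generated by a script, which keeps the server name and port in one place. This is a minimal sketch: `mount_entry` is a hypothetical helper, not part of CCtools, and the values passed to it are those used in the example (mount point /Data, protocol chirp, woolly's CamGrid hostname, port 9096).

```shell
#!/bin/bash
# Hypothetical helper: print one Mountfile line in the format
# "<mount point> /<protocol>/<remote server>[:<port>]".
mount_entry() {
    local mountpoint="$1"
    local protocol="$2"
    local server="$3"
    local port="${4:-}"
    if [ -n "$port" ]; then
        server="$server:$port"
    fi
    printf '%s /%s/%s\n' "$mountpoint" "$protocol" "$server"
}

# Values taken from the example in the text.
mount_entry /Data chirp woolly--csi.grid.private.cam.ac.uk 9096
# prints: /Data /chirp/woolly--csi.grid.private.cam.ac.uk:9096
```

Redirecting the output to a file named Mountfile gives the file that wrapper.sh reads.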
The "chmod" directive lists the files that I need to ship over to make this work. You can get the source for CCtools here. Of course, you'll need to send over the correct binaries for the execute machine, e.g. if the execute node is running an x86_64 kernel then make sure you've compiled and sent over suitable 64-bit binaries. Finally, I can list the submit script I pass to Condor via condor_submit:
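To keep the shipped binaries and the job's requirements consistent, it can help to derive the Condor `Arch` value from the machine on which `parrot_run` was built. A minimal sketch, assuming the build machine is x86: `condor_arch` is a hypothetical helper, and it covers only the two x86 values (`X86_64` and `INTEL`), reporting anything else as `UNKNOWN`.

```shell
#!/bin/bash
# Hypothetical helper: translate `uname -m` output into the value Condor
# advertises in the Arch attribute, for use in a Requirements expression.
# Only the x86 cases are handled; anything else is reported as UNKNOWN.
condor_arch() {
    case "$1" in
        x86_64)              echo X86_64 ;;
        i386|i486|i586|i686) echo INTEL ;;
        *)                   echo UNKNOWN ;;
    esac
}

# On the machine where parrot_run was built:
echo "Built for Arch == \"$(condor_arch "$(uname -m)")\""
```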
    Universe = vanilla
    Executable = wrapper.sh
    Transfer_input_files = list.sh, Mountfile, parrot_run
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT_OR_EVICT
    Requirements = OpSys == "LINUX" && Arch == "X86_64"
    log = my.log
    output = my.output
    error = my.error
    Queue
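Since the job fails if any of the shipped files is missing, a quick check before submission can save a round trip. The sketch below is a hypothetical pre-flight helper, not a Condor tool: `check_submit_files` reads the `Executable` and `Transfer_input_files` lines of a submit file and reports any named file that does not exist in the current directory. It assumes file names without spaces.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that the executable and every file
# named in Transfer_input_files exist before running condor_submit.
check_submit_files() {
    local submit="$1"
    local missing=0
    local names
    local f
    # Take the right-hand side of the Executable and Transfer_input_files
    # lines (keys matched case-insensitively), split the values on commas
    # and trim surrounding whitespace.
    names=$(grep -iE '^[[:space:]]*(executable|transfer_input_files)[[:space:]]*=' "$submit" \
            | sed 's/^[^=]*=//' \
            | tr ',' '\n' \
            | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    for f in $names; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

If the submit script were saved as, say, job.sub (a hypothetical name), it could be used as `check_submit_files job.sub && condor_submit job.sub`.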
That's it. This may seem a bit complicated at first glance, but it means that we don't depend on Parrot being installed on the execute node, so we can access the data on our local server(s) from anywhere within the grid.
If you need to access a large number of files "simultaneously", e.g. you may have many Condor jobs on the go trying to read and write files from/to your file store, then a single server may not be able to handle the load. In that case, consider distributing your files over a number of chirp servers and clustering them together as described in this HowTo.