LoadShare - MRC CBU Imaging Wiki

Revision 1 as of 2007-10-22 20:33:30

Load sharing tool for SPM

On our site, we have a number of central Linux servers that are accessed by VNC.

This is a tool I set up for launching SPM or our MEG software on whichever machine is currently least loaded, thereby balancing the load across them. SPM jobs are started either manually or by the parallel scheduling part of AutomaticAnalysis.

In the past, we had a script that scanned every machine each time an SPM job was launched. However, we now have 36 machines, and scanning them all takes an unpleasantly long time. The scan would also hang if any of the machines did. The new version runs a cron job every 5 minutes that checks the load on all of the machines and stores it in a text file. For robustness, the cron job runs on every machine, but the script simply exits if less than 5 minutes have passed since the last scan. Even more robustly, whenever the script runs, it installs the cron job on each of the machines, so provided at least one machine keeps running, the system will revive itself.
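The "exit if a recent scan exists" logic can be sketched roughly as follows. This is a minimal illustration, not the code in scanmachines.py: the function names, the file format (one "host load" line per machine) and the get_load callback are all assumptions made for the example. A real get_load would need its own timeout so that a hung machine cannot stall the scan.

```python
import os
import time

SCAN_INTERVAL = 5 * 60  # seconds; assumed to match the 5-minute cron period


def scan_is_due(load_file, now=None, interval=SCAN_INTERVAL):
    """Return True if load_file is missing or at least `interval` seconds old."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(load_file)
    except OSError:
        return True  # no previous scan has been recorded
    return age >= interval


def write_loads(load_file, loads):
    """Write one 'host load' line per machine, then swap the file into place."""
    tmp = load_file + ".tmp"
    with open(tmp, "w") as f:
        for host, load in sorted(loads.items()):
            f.write("%s %.2f\n" % (host, load))
    os.replace(tmp, load_file)


def run_scan(load_file, hosts, get_load, now=None):
    """Scan all hosts unless a recent scan exists. Return True if a scan ran."""
    if not scan_is_due(load_file, now):
        return False  # another machine scanned recently, so just exit
    loads = {}
    for host in hosts:
        try:
            loads[host] = get_load(host)  # hypothetical per-machine load check
        except Exception:
            continue  # skip machines that fail rather than abort the scan
    write_loads(load_file, loads)
    return True
```

Because every machine runs the same cron job against the same file, only the first one to fire in each 5-minute window does any work; the rest return immediately.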

It also has a feature for controlling Matlab license usage. When multiple Matlab jobs are launched by the same user on the same machine, only one license is used. The load sharing tool allows each user a fixed number of Matlab jobs (stored in loadsharesettings.py). Until a user reaches this number, each new job is allocated to the least loaded machine in the whole pool. Once they reach the limit, new jobs are launched on the least loaded of the machines on which they already have jobs.
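The allocation rule described above might look something like this in Python. It is only a sketch under stated assumptions: choose_machine and its arguments are hypothetical names, loads is assumed to come from the text file written by the cron job, and the limit is counted in jobs, as the text describes.

```python
def choose_machine(loads, user_machines, max_jobs):
    """Pick a host for a user's next Matlab job.

    loads         -- dict mapping host name to its current load
    user_machines -- hosts the user already has jobs on, one entry per job
    max_jobs      -- the user's allowance of jobs across the whole pool
    """
    if len(user_machines) < max_jobs:
        # Under the limit: any machine in the pool is a candidate.
        candidates = dict(loads)
    else:
        # At the limit: restrict to machines the user already occupies,
        # so that no additional license is consumed.
        candidates = {h: loads[h] for h in set(user_machines) if h in loads}
    if not candidates:
        raise ValueError("no machine available")
    # Least loaded wins; ties are broken by host name for determinism.
    return min(candidates, key=lambda h: (candidates[h], h))


loads = {"m1": 0.5, "m2": 2.0, "m3": 1.0}  # made-up host names and loads
print(choose_machine(loads, ["m2"], 2))        # under the limit: whole pool
print(choose_machine(loads, ["m2", "m3"], 2))  # at the limit: m2 or m3 only
```

In the second call the user is already at the limit of two jobs, so the much less loaded m1 is ignored and the job goes to whichever of m2 and m3 is quieter.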

All of the settings are in "loadsharesettings.py", including the list of all machines to be scanned for load and the lists of machines that are available for each kind of software.
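For orientation, a settings module of this kind could be laid out as below. Every variable name, host name and number here is invented for illustration; the real loadsharesettings.py may be organised quite differently.

```python
# Hypothetical layout for a settings module like loadsharesettings.py.
# All names and values are placeholders, not taken from the real file.

# Every machine whose load is scanned by the cron job
ALL_MACHINES = ["host%02d" % i for i in range(1, 37)]  # 36 placeholder names

# Machines available for each kind of software (illustrative split)
MACHINES_FOR = {
    "spm": ALL_MACHINES[:30],
    "meg": ALL_MACHINES[30:],
}

# Number of Matlab jobs a user may spread across the pool before new
# jobs are confined to machines they already occupy (placeholder value)
MAX_MATLAB_JOBS_PER_USER = 4
```

Keeping the settings in a plain Python module means the launcher and the scanner can both simply import it.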

checkmachineload.py

Checks the load on a single machine

crontab.txt

A sample crontab (insert with crontab crontab.txt)

known_hosts

A list of host keys

launchspm_inner

Inner script for SPM launcher

launchspm_inner_unlimit

Special version of inner script

launchspm.py

Python SPM launcher

Other files, listed without descriptions: launchspm.py.old, loadsharesettings.py, loadsharesettings.pyc, meg_runcommand.py, scanmachines.py, scanmachines.pyc, status, test.py, update_known_hosts.py, update_known_hosts.pyc