LoadShare - MRC CBU Imaging Wiki

Load sharing tool for SPM

Overview of commands

showload

Shows load on all machines

spm

See separate help on SpmVersions

bestmegmachine

Shows best machine for launching MEG software

all Elekta/MNE/Freesurfer commands

Launched with loadsharing using runmne wrapper

Technical details

On our site, we have a number of central Linux servers that are accessed by VNC. This is a tool I set up for launching SPM or our MEG software on whichever machine is currently least loaded, which balances the load across them. The SPM jobs are started either manually or by the parallel scheduling part of AutomaticAnalysis.

In the past, we had a script that scanned all of the machines each time an SPM job was launched. However, we now have 36 machines, and scanning them all takes an unpleasantly long time. The scanning job would also hang if any of the machines did. The new version runs a cronjob every 5 minutes, which launches a small job on each machine that checks its load and writes the result to a file. For robustness, the cronjob actually runs on every machine, but the script just exits if less than 30 seconds have passed since the last scan. For further robustness, whenever the script runs it installs the cronjob on each of the machines, so provided at least one machine keeps running, the system will revive itself.
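The "exit if the last scan was too recent" guard can be illustrated with a timestamp file on a shared filesystem. This is only a minimal sketch, not the code from the download: the function name should_scan, the stamp file, and the use of file modification time are all assumptions made for illustration.

```python
import os
import time


def should_scan(stamp_path, min_interval=30.0, now=None):
    """Return True (and refresh the stamp) if a new scan is due.

    stamp_path is a hypothetical file whose modification time records
    when the last scan started; min_interval is in seconds.
    """
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_path)
    except OSError:
        last = None  # no stamp yet, so no scan has been recorded
    if last is not None and now - last < min_interval:
        return False  # another machine scanned recently; do nothing
    # Create the stamp if needed, then record the time of this scan.
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, (now, now))
    return True
```

The check and the update are not atomic, so two machines whose cronjobs fire at the same moment could both decide to scan. For a job that only measures load this is harmless, which is presumably why a simple guard is enough.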

It also has a feature for controlling Matlab license usage. When multiple Matlab jobs are launched by the same user on the same machine, only one license is used. The load sharing tool allows each user a fixed number of Matlab jobs (stored in loadsharesettings.maxlicensesperuser). Until a user reaches this number, each new job is allocated to the least loaded machine in the whole pool. Once they reach the limit, new jobs are launched on the least loaded of the machines on which they already have jobs.
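The selection rule in this paragraph can be sketched in a few lines. This is an illustration under stated assumptions rather than the code in loadshare.py: the function name choose_machine and its arguments are hypothetical, and it assumes the limit is counted as the number of distinct machines on which the user already runs Matlab (one license each).

```python
def choose_machine(loads, user_machines, max_licenses_per_user):
    """Pick a machine for a new Matlab job.

    loads: dict mapping machine name to its recorded load.
    user_machines: set of machines where this user already runs Matlab.
    max_licenses_per_user: assumed to mirror maxlicensesperuser.
    """
    if len(user_machines) < max_licenses_per_user:
        # Below the limit: consider the whole pool.
        candidates = dict(loads)
    else:
        # At the limit: reuse a machine that already holds a license.
        candidates = {m: loads[m] for m in user_machines if m in loads}
        if not candidates:
            # Assumption: fall back to the pool if none of the user's
            # machines has a recorded load.
            candidates = dict(loads)
    return min(candidates, key=candidates.get)
```

For example, with loads of 0.5, 2.0 and 1.0 on machines a, b and c and a limit of 2, a user with a job on b only is sent to a, while a user with jobs on b and c is sent to c.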

The scripts are written in Python, with a couple of shell scripts that help launch SPM. All of the settings are in "loadsharesettings.py", including the list of all machines to be scanned for load and the lists of machines that are available for various kinds of software.
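As a rough idea of what such a settings module might look like, here is an illustrative layout. The names machines, maxlicensesperuser and minscaninterval appear on this page; the machine names, the license limit value, and the per-software list names spmmachines and megmachines are placeholders invented for this sketch.

```python
# loadsharesettings.py (illustrative layout only)

# All machines to be scanned for load (placeholder host names).
machines = ["machine01", "machine02", "machine03", "machine04"]

# Minimum time between scans, in seconds (30 s is the documented default).
minscaninterval = 30

# Per-user limit used for Matlab license control (placeholder value).
maxlicensesperuser = 2

# Hypothetical names for the per-software machine lists.
spmmachines = ["machine01", "machine02", "machine03"]
megmachines = ["machine03", "machine04"]
```

Keeping the per-software lists as subsets of the scanned list means every machine that can be selected also has a load reading.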

File listing

General settings

loadsharesettings.py

Various settings for the loadshare system

Checking machine load

crontab.txt

A sample crontab (insert with crontab crontab.txt)

scanmachines.py

Script that goes through all of the machines listed in loadsharesettings.machines and runs checkmachineload on each. Exits if the previous scan was less than loadsharesettings.minscaninterval ago (default 30 s)

checkmachineload.py

Checks the load on a single machine

status

Directory containing files that record the load of each machine and the last time scanmachines was launched from each of the machines
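The division of labour in this group (one small job per machine writes its load, and the launcher later reads the files back) might look roughly like the sketch below. The file format is an assumption: one file per machine, named after the host, holding the 1-minute load average. The real checkmachineload.py may measure and store load differently, and the function names are hypothetical.

```python
import os
import socket


def write_load(status_dir, hostname=None):
    """Record this machine's 1-minute load average in status_dir."""
    hostname = hostname or socket.gethostname()
    one_min, _five_min, _fifteen_min = os.getloadavg()  # Unix only
    path = os.path.join(status_dir, hostname)
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        handle.write("%.2f\n" % one_min)
    # Rename into place so readers never see a half-written file.
    os.replace(tmp, path)
    return path


def read_loads(status_dir):
    """Return a dict of machine name to load from the status files."""
    loads = {}
    for name in os.listdir(status_dir):
        if name.endswith(".tmp"):
            continue  # leftover from an interrupted write
        try:
            with open(os.path.join(status_dir, name)) as handle:
                loads[name] = float(handle.read().split()[0])
        except (OSError, ValueError, IndexError):
            continue  # skip unreadable or malformed entries
    return loads
```

Because the launcher only reads files, a machine that has hung cannot block a launch; at worst its entry is out of date.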

Launching software

loadshare.py

Main function: getbestmachine(availablemachinelist, usesmatlab). Parameters: availablemachinelist is a list of machine names; usesmatlab is 0 or 1, depending on whether Matlab is required for the task (used in machine selection to control license issuing). All of the scripts below now use this function

launchspm.py

The main Python launch script. Accepts various parameters (see SpmVersions). It also accepts some hidden parameters, including "workerdesktop", which launches without the Matlab Java desktop and with a funky yellow-on-black colour scheme

launchspm_inner

Inner script for SPM launcher

launchspm_inner_unlimit

Special version of inner script

meg_runcommand.py

Wrapper for Elekta Neuromag tools. All are load balanced. One (mce) requires Matlab

run_mne.py

Wrapper for MNE or freesurfer

Updating known hosts

update_known_hosts.py

Updates list of .ssh keys using list in file known_hosts

known_hosts

A list of host keys. Can be a copy of the file in your ~/.ssh folder

Download

Download from here http://www.mrc-cbu.cam.ac.uk/~rhodri/loadshare.tar

CbuImaging: LoadShare (last edited 2013-03-07 21:23:55 by localhost)