Diff for "AutomaticAnalysisParallel" - MRC CBU Imaging Wiki
Differences between revisions 1 and 38 (spanning 37 versions)
Revision 1 as of 2007-08-20 11:39:40
Size: 4196
Editor: RhodriCusack
Comment:
Revision 38 as of 2013-03-07 21:23:46
Size: 6235
Editor: localhost
Comment: converted to 1.6 markup

Parallel processing with aa

aa version 2.0 can run multiple parts of your analysis in parallel. It uses coarse-grained parallelism: different modules (*) execute simultaneously, but there is no attempt to subdivide single modules. A single-subject analysis should speed up by around a factor of 2, and multi-subject analyses by a factor of 5-10 [benchmarking in progress; the speed-up depends on the number of jobs you are allocated, which is determined by the memory, processor and Matlab license load on the Linux system].

(*) Strictly, different instances of modules: a single module such as slice timing will be run on different sessions simultaneously.

How to use it

Starting

Only one small change is required to your user script. Replace the (usually final) line

aa_doprocessing

with

aa_doprocessing_parallel

Stopping!

Running a parallel job can make you feel like you've become the sorcerer's apprentice. You start a job, which launches many workers, and then you realise something is wrong. Once you have interrupted the main Matlab job (as usual, by pressing CTRL-C), you can close all of the workers by typing

aa_closeallworkers

How it works

Master and workers

The Matlab/SPM/aa job running your user script acts as master coordinator, while a number of worker jobs actually do the processing.

The workers are launched without a Matlab desktop, in an iconified xterm (i.e., it just appears in the task bar at the bottom). If you maximize one of these, you will see the writing is yellow. The worker id and machine name are shown in the title bar (e.g., aaworker_0123456:l35). Any error the worker has encountered will be displayed here.

Workers that have executed without a problem will disappear if they are dormant for 3 minutes or more. Ones that have encountered an error will not disappear. In aa version 2.0, the master will hang if a worker crashes. Viva la revolucion.

Scheduling

Multiple modules are run simultaneously where possible. Within a module, there is no parallel execution. Part of the aa module definition specifies whether a module is run once per study, once per subject, or once per session. This affects parallel scheduling as shown in the table.

|| ''Domain'' || ''When run in parallel'' || ''Benefit'' ||
|| Session || Always || Any time there are multiple sessions ||
|| Subject || When multiple subjects are being processed || Any time there are multiple subjects ||
|| Study || If multiple study-level stages are marked as executing simultaneously || Not in standard recipes at present ||
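
As a rough illustration of the table, the sketch below counts how many instances of one module could run at once for each domain. It is a minimal model, not aa code: the variable names, the subject and session counts and the worker count are all hypothetical, and it assumes every subject has the same number of sessions.

```matlab
% Hypothetical study size and worker allocation (not read from aa).
nsubjects = 12;
nsessions = 4;      % assumed equal for every subject
nworkers  = 8;

% Number of independent instances of one module, by domain.
instances.session = nsubjects * nsessions;  % one per subject and session
instances.subject = nsubjects;              % one per subject
instances.study   = 1;                      % a single instance

domains = fieldnames(instances);
for d = 1:numel(domains)
    n = instances.(domains{d});
    % At most min(n, nworkers) instances can be in progress at any moment.
    fprintf('%-8s %3d instances, up to %d at once\n', domains{d}, n, min(n, nworkers));
end
```

With these made-up numbers a session-level module can keep all 8 workers busy, whereas a study-level module occupies only one.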

Dependencies

Most processing stages wait for the previous stage to complete before executing. However, some stages can start earlier than this. For example, realignment and tsdiffana can both execute together as soon as the DICOM-to-NIfTI conversion of the EPIs is complete.

To allow for this, the aas_addtask command, which is usually called from the tasklist (and occasionally from user scripts), may now include a fourth "tobecompleted" parameter. It specifies that the stage depends not on the previous stage, but on some other named stage.

For example, the aap_tasklist_general_ver02 now looks like this:

% EPI file prefix definitions now moved here as depends so strongly on
% tasklist - second parameter for functions that have EPI input, blank if not

aap.tasklist.epiprefix=[];      % reset prefix list
aap.tasklist.stages=[];         % reset list of stages
aap.tasklist.tobecompletedfirst=[];             % reset list of dependencies

aap=aas_addtask(aap,'aamod_study_init');
aap=aas_addtask(aap,'aamod_newsubj_init');
aap=aas_addtask(aap,'aamod_converttmaps');
aap=aas_addtask(aap,'aamod_copystructural',[],'aamod_newsubj_init');
aap=aas_addtask(aap,'aamod_convert_epis',[],'aamod_newsubj_init');
aap=aas_addtask(aap,'aamod_tsdiffana');
aap=aas_addtask(aap,'aamod_realign',[],'aamod_convert_epis');
aap=aas_addtask(aap,'aamod_slicetiming','r');
aap=aas_addtask(aap,'aamod_coreg_noss','r','aamod_realign');
aap=aas_addtask(aap,'aamod_norm_noss','ar');
aap=aas_addtask(aap,'aamod_norm_write','ar');
aap=aas_addtask(aap,'aamod_smooth','war');

Note the fourth parameter for aamod_copystructural and aamod_convert_epis (which may be run together, and at the same time as aamod_converttmaps), aamod_realign (run at the same time as aamod_tsdiffana) and aamod_coreg_noss (run at the same time as aamod_slicetiming).
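
The dependency rule can be sketched in a few lines of Matlab. This is an illustrative model, not aa's scheduler: it copies the stage names and fourth parameters from the tasklist above, applies the rule that a stage waits for the previous stage unless another is named, and groups the stages that could start together. The variable names (tasks, level) are made up for this example, and it ignores domains, so it only shows stage-level ordering.

```matlab
% Stage name and explicit dependency ('' means "wait for the previous stage"),
% copied from the tasklist above.
tasks = { ...
    'aamod_study_init',     ''; ...
    'aamod_newsubj_init',   ''; ...
    'aamod_converttmaps',   ''; ...
    'aamod_copystructural', 'aamod_newsubj_init'; ...
    'aamod_convert_epis',   'aamod_newsubj_init'; ...
    'aamod_tsdiffana',      ''; ...
    'aamod_realign',        'aamod_convert_epis'; ...
    'aamod_slicetiming',    ''; ...
    'aamod_coreg_noss',     'aamod_realign'; ...
    'aamod_norm_noss',      ''; ...
    'aamod_norm_write',     ''; ...
    'aamod_smooth',         ''};

nstages = size(tasks, 1);
level = zeros(nstages, 1);   % level(k): earliest group in which stage k can start
for k = 1:nstages
    if isempty(tasks{k, 2})
        dep = k - 1;                                    % default: previous stage
    else
        dep = find(strcmp(tasks(:, 1), tasks{k, 2}));   % explicitly named stage
    end
    if dep == 0
        level(k) = 1;                                   % first stage waits for nothing
    else
        level(k) = level(dep) + 1;                      % one step after its dependency
    end
end

% Stages sharing a level have no dependency on each other in this model.
for w = 1:max(level)
    fprintf('group %d: %s\n', w, strjoin(tasks(level == w, 1)', ', '));
end
```

In this model aamod_converttmaps, aamod_copystructural and aamod_convert_epis fall into the same group, as do aamod_tsdiffana with aamod_realign and aamod_slicetiming with aamod_coreg_noss, which matches the note above.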

The exact form of the dependency depends on the domain of each of the stages:

If a stage is executed once-per-study, it waits until the stage it depends on has completed for all subjects and sessions. If it is executed once-per-subject, each subject is processed as soon as the stage it depends on has completed all of that subject's sessions. If it is executed once-per-session, each session is processed as soon as the stage it depends on has completed that session.
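
The three cases can be written as tests on a completion matrix. This is a minimal sketch rather than aa's implementation: the matrix done and the variable names are hypothetical, and it assumes the stage being waited for runs once per session.

```matlab
% done(i, j) is true once the stage being waited for has finished
% subject i, session j (hypothetical snapshot partway through a run).
done = [true  true  true;     % subject 1: all sessions finished
        true  false true];    % subject 2: session 2 still running

study_ready   = all(done(:));   % once-per-study: needs every subject and session
subject_ready = all(done, 2);   % once-per-subject: one flag per subject
session_ready = done;           % once-per-session: one flag per subject and session

fprintf('study-level stage can start: %d\n', study_ready);
fprintf('subjects that can start: %s\n', mat2str(find(subject_ready)'));
fprintf('sessions that can start: %d of %d\n', nnz(session_ready), numel(session_ready));
```

In this snapshot the study-level stage must still wait and only subject 1 can start at the subject level, while five of the six sessions are already free to proceed at the session level.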

Good practice

Getting optimal performance

You will get the best performance if your worker jobs are distributed across machines, and if those machines have low load. Because we have a limited number of Matlab licenses, if you already have many SPM jobs open your workers will be restricted to that selection of machines. This makes it more likely that your workers will be allocated to the same machine, and that this machine will not be the least loaded one available. You will generally get better performance if you clear out your old jobs with

closeallmyspms

before starting SPM to run a parallel job.

Executing your own scripts in parallel

You may wrap up your code as an aa module, which has a low overhead (perhaps 10 additional lines). aa will then happily schedule it to run in parallel.

When writing modules, it is now good practice, where possible, to make them execute at the session rather than the subject level, as this allows greater parallelism. For this reason, I have modified aamod_smooth and aamod_norm_write to be once per session rather than once per subject.

CbuImaging: AutomaticAnalysisParallel (last edited 2013-03-07 21:23:46 by localhost)