Revision 4 as of 2013-03-07 21:24:08
SPM benchmarks
We have tested three high-specification machines by recording the time taken for some standard SPM99 processing steps. We tested the same Pentium 4 machine running Linux and Windows 2000.
Machines tested
|| Name || Manufacturer / Model || OS || CPU no x speed (MHz) || Memory (GB) ||
|| V480 || Sun V480 || Solaris 8 || 4 x 900 || 6 ||
|| ES45 || HP ES45 || Tru64 || 4 x 1000 || 8 ||
|| Linux P4 || Advantec Pentium 4 || Mandrake Linux 9.0 || 1 x 2530 || 1 ||
|| Windows P4 || Advantec Pentium 4 || Windows 2000 || 1 x 2530 || 1 ||
For reference, here are the integer and floating point benchmark results for each machine, from www.spec.org:
|| Machine || CFP2000 base || CFP2000 rate || CINT2000 base || CINT2000 rate ||
|| V480 || 637 || 7.16 || 469 || 5.39 ||
|| ES45 || 776 || 9.00 || 621 || 7.20 ||
|| Linux P4 || 992 || 11.0 || 944 || 11.5 ||
Results for the Linux P4 are taken from the most similar machine that had been tested (Dell Precision WorkStation 350 2.53 GHz P4). Rates are for 1 processor, as Matlab is a single-threaded application. The V480 has not been tested with one processor; its results are estimated as (V480 rate for 2 processors) * (rate for one processor on ES45 / rate for two processors on ES45).
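The scaling used for the V480 estimate can be sketched as below. This is only an illustration of the formula in the text: the function name is ours, and the input values in the example call are round placeholders, not published SPEC figures (the two-processor rates are not listed on this page).

```python
def estimate_single_cpu_rate(target_rate_2cpu, ref_rate_1cpu, ref_rate_2cpu):
    """Estimate a machine's 1-processor rate from its 2-processor rate,
    using the 1-processor / 2-processor ratio of a reference machine."""
    if ref_rate_2cpu <= 0:
        raise ValueError("reference 2-processor rate must be positive")
    return target_rate_2cpu * (ref_rate_1cpu / ref_rate_2cpu)


# Placeholder inputs for illustration only; NOT real SPEC results.
example = estimate_single_cpu_rate(
    target_rate_2cpu=10.0,  # hypothetical V480 rate for 2 processors
    ref_rate_1cpu=9.0,      # hypothetical ES45 rate for 1 processor
    ref_rate_2cpu=18.0,     # hypothetical ES45 rate for 2 processors
)
print(example)
```

The estimate assumes that the V480 scales from one to two processors in the same proportion as the ES45, which is the assumption the text itself makes.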
Tests
The tests were designed to assess speed for a typical SPM analysis on a single subject. The data consisted of four sessions of fMRI, with 235 images per session. Matrix size was 128x128x21. Analyses were timed by wrapping SPM99 batch mode scripts in the Matlab tic and toc timing functions. The tests were:
 1. Realignment and reslicing
    Calculation of realignment parameters and reslicing of all images 1..N, trilinear interpolation.
 2. Smoothing
    Smoothing of original images with 8mm FWHM.
 3. Model estimation
    Estimation of standard 4 session statistical model, applying low (hrf) and high (120 second) pass filters. The Linux P4 proved surprisingly slow on the model calculation, which was due to unusual slowness of multiplication of sparse by full matrices; see the SPM Intel tuning page. We found that model estimation was considerably faster in general if we avoided the sparse matrix multiplication. We therefore ran the following model estimation test:
 4. Model estimation: optimized
    Here we removed the use of sparse matrices from the model estimation.
We used Matlab 5.3 on the V480 and ES45, and Matlab 6.5 for the Linux P4. Our Matlab licensing meant that we could not use the same version on all three machines. We did compare the speed of the realignment process using Matlab 5.3 and Matlab 6.0 on the V480 and the ES45; differences were ~1%. Note that Matlab and SPM need to be optimized for the Pentium 4 machine because of a problem with the default P4 handling of not-a-number (NaN) values in floating point calculations. This is described in the SPM Intel tuning page.
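The timing approach used for the tests (a timer started before a batch step and read after it) can be sketched as follows. This is a Python analogue written for illustration, not the Matlab tic/toc code that was actually run; `time_step_minutes` and `dummy_step` are hypothetical names, and `dummy_step` merely stands in for a real processing step such as realignment.

```python
import time


def time_step_minutes(step, *args, **kwargs):
    """Run one processing step; return (result, elapsed wall-clock minutes)."""
    start = time.perf_counter()                # plays the role of Matlab's tic
    result = step(*args, **kwargs)
    elapsed_s = time.perf_counter() - start    # plays the role of Matlab's toc
    return result, elapsed_s / 60.0


def dummy_step(n):
    """Hypothetical stand-in for a batch step; just does some arithmetic."""
    return sum(i * i for i in range(n))


result, minutes = time_step_minutes(dummy_step, 100_000)
print(f"dummy step took {minutes:.4f} minutes")
```

Wall-clock time is measured rather than CPU time, so disk and network waits are included, which is what matters when comparing local and NFS storage.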
Results
We tested the machines in two situations: with the data stored on the local hard disk, and with the data stored on a disk mounted using NFS. The values reported are times in minutes.
Data on local disk
|| Machine || Realign || Smooth || Model: standard || Model: optimized ||
|| V480 || 55.0 || 16.4 || 20.9 || 15.1 ||
|| ES45 || 32.7 || Not tested || 18.7 || Not tested ||
|| Linux P4 || 16.2 || 5.4 || 24.2 || 5.7 ||
|| Windows P4 || 23.0 || 5.8 || 23.4 || 5.5 ||
Data on local disk vs data via NFS
|| Machine || Realign: local || Realign: NFS || NFS / Local ||
|| V480 || 55.0 || 60 || 1.09 ||
|| ES45 || 32.7 || 33.7 || 1.03 ||
|| Linux P4 || 16.2 || 18.3 || 1.13 ||
The V480 and ES45 were connected to a Sun/Solaris NFS SCSI server via a switch. The Linux P4 was connected via a hub to an NFS IDE server running Red Hat Linux 7.3.
We also timed the V480 and ES45 when running 6 simultaneous realignment jobs, comparing NFS and local storage. The slowdown attributable to NFS varied between 3 and 20%; the variation may have been due to unrelated NFS and CPU loads on the NFS server, which were sometimes heavy.
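The NFS / Local column of the realignment table is simply the NFS time divided by the local-disk time. A short sketch that recomputes it from the reported times (the dictionary layout and function name are ours):

```python
# Realignment times in minutes, (local disk, NFS), as reported in the table.
realign_minutes = {
    "V480": (55.0, 60.0),
    "ES45": (32.7, 33.7),
    "Linux P4": (16.2, 18.3),
}


def nfs_ratio(local, nfs):
    """Ratio of NFS time to local-disk time; values above 1 mean NFS is slower."""
    return nfs / local


ratios = {
    machine: round(nfs_ratio(local, nfs), 2)
    for machine, (local, nfs) in realign_minutes.items()
}

for machine, ratio in ratios.items():
    print(f"{machine}: NFS / local = {ratio:.2f}")
```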
In addition to the tests listed in the table, we ran the following tests: mutual information coregistration (Linux P4: 2.0 minutes); normalization of structural image only (Linux P4: 46 seconds); normalization of structural image and reslicing of 960 fMRI images (Linux P4: 12.5 minutes, Windows P4: 20.0 minutes).
The tests imply that most of a standard single-subject analysis (realignment, coregistration, normalization, smoothing, statistical analysis, writing contrasts) would take 16.2 + 2.0 + 12.5 + 5.4 + 5.7 + 3.2 = 45 minutes on the Linux P4.
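The total above can be checked with a few lines of code. The pairing of each figure with a step follows the order given in the sentence and is our reading of it; in particular we assume that 5.7 minutes is the optimized model estimation and that 3.2 minutes is the time for writing contrasts, which is not reported elsewhere on this page.

```python
# Linux P4 times in minutes, paired with steps in the order the text lists them.
# The step labels are an assumption based on that ordering.
linux_p4_minutes = {
    "realignment": 16.2,
    "coregistration": 2.0,
    "normalization and reslicing": 12.5,
    "smoothing": 5.4,
    "statistical analysis (optimized model)": 5.7,
    "writing contrasts": 3.2,
}

total_minutes = sum(linux_p4_minutes.values())
print(f"Total: {total_minutes:.1f} minutes")
```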
Conclusions
As expected from published integer and floating point benchmarks, the Intel solution was the best performer on these real-world tests of SPM performance. Keeping data on the local hard disk results in a speed gain of the order of 10%.
Linux or Windows?
The Pentium machine runs SPM quickly under either Linux or Windows. Realignment/reslicing is 42% slower on Windows (23.0 vs 16.2 minutes), and normalization/reslicing is 60% slower (20.0 vs 12.5 minutes). Both procedures involve a large amount of image writing and resampling; Windows may be slower because of slower disk access and/or less effective caching. Assuming coregistration takes the same time on Linux and Windows, the whole processing stream for Windows would take around 59 minutes, which is 32% slower than Linux. Of course the choice between Linux and Windows is likely to be dictated by other factors, among which are NFS speed, multitasking performance, and the other applications you want to run.

Matthew Brett
Rhodri Cusack
7th April 2003