SpmBenchmarks

We tested three high-specification machines by recording the time taken for some standard SPM99 processing steps. We tested the same Pentium 4 machine running Linux and Windows 2000.

Machines tested

Name	Manufacturer / Model	OS	CPU no x speed (MHz)	Memory (GB)
V480	Sun V480	Solaris 8	4 x 900	6
ES45	HP ES45	Tru64	4 x 1000	8
Linux P4	Advantec Pentium 4	Mandrake Linux 9.0	1 x 2530	1
Windows P4	Advantec Pentium 4	Windows 2000	1 x 2530	1

For reference, here are the integer and floating point benchmark results for each machine, from www.spec.org:

Machine	CFP2000 base	CFP2000 rate	CINT2000 base	CINT2000 rate
V480	637	7.16	469	5.39
ES45	776	9.00	621	7.20
Linux P4	992	11.0	944	11.5

Results for the Linux P4 are taken from the most similar machine that had been tested (Dell Precision WorkStation 350 2.53 GHz P4). Rates are for 1 processor, as Matlab is a single-threaded application. The V480 has not been tested with one processor; the results are estimated from (V480 rate for 2 processors) * (rate for one processor on ES45 / rate for two processors on ES45).
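
The single-processor estimate for the V480 can be written as a small function. This is only a sketch in Python: the function and argument names are hypothetical, and the inputs in the usage lines are placeholders, not SPEC figures, because the two-processor rates are not quoted on this page.

```python
def estimate_one_cpu_rate(target_rate_2cpu, ref_rate_1cpu, ref_rate_2cpu):
    """Scale a 2-processor SPEC rate down to 1 processor, using the
    1-CPU / 2-CPU ratio measured on a reference machine (here the ES45)."""
    if ref_rate_2cpu <= 0:
        raise ValueError("reference 2-processor rate must be positive")
    return target_rate_2cpu * (ref_rate_1cpu / ref_rate_2cpu)


# Placeholder inputs for illustration only (NOT real SPEC results).
example = estimate_one_cpu_rate(target_rate_2cpu=10.0,
                                ref_rate_1cpu=9.0,
                                ref_rate_2cpu=18.0)
print(f"estimated 1-CPU rate: {example:.2f}")
```

The estimate assumes that the V480 scales from one to two processors in the same proportion as the ES45, which may not hold exactly.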

Tests

The tests were designed to assess speed for a typical SPM analysis on a single subject. The data consisted of four sessions of fMRI, with 235 images per session. Matrix size was 128x128x21. Analyses used the Matlab tic and toc timing functions around SPM99 batch mode scripts. The tests were:

Realignment and reslicing
Calculation of realignment parameters and reslicing of all images 1..N, trilinear interpolation
Smoothing
Smoothing of original images with 8mm FWHM
Model estimation
Estimation of standard 4 session statistical model, applying low- (hrf) and high- (120 second) pass filters. The Linux P4 proved surprisingly slow on the model calculation, which was due to unusual slowness of multiplication of sparse by full matrices (see the SPM Intel tuning page). We found that model estimation was considerably faster in general if we avoided the sparse matrix multiplication. We therefore ran the following model estimation test:
Model estimation: optimized
Here we removed the use of sparse matrices from the model estimation.
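
The timing approach used for all four tests, tic and toc around a batch script, looks roughly like the following. The sketch is in Python rather than Matlab; `time_step` and `dummy_step` are hypothetical names, and the dummy workload merely stands in for an SPM99 batch script.

```python
import time


def time_step(step, *args, **kwargs):
    """Run one processing step and return (result, elapsed minutes).

    Plays the role of Matlab's tic ... toc around a batch script.
    """
    start = time.perf_counter()               # tic
    result = step(*args, **kwargs)
    elapsed_s = time.perf_counter() - start   # toc
    return result, elapsed_s / 60.0


def dummy_step(n):
    # Stand-in workload; a real run would call the batch script here.
    return sum(i * i for i in range(n))


result, minutes = time_step(dummy_step, 100_000)
print(f"dummy step took {minutes:.4f} minutes")
```

Wall-clock timing of this kind includes disk and network waits, which is what we want when comparing local and NFS storage.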

We used Matlab 5.3 on the V480 and ES45, and Matlab 6.5 for the Linux P4. Our Matlab licensing meant that we could not use the same version on all three machines. We did compare the speed of the realignment process using Matlab 5.3 and Matlab 6.0 on the V480 and the ES45; differences were ~1%. Note that Matlab and SPM need to be optimized for the Pentium 4 machine because of a problem with the default P4 handling of not-a-number values in floating point calculations. This is described in the SPM Intel tuning page.

Results

We tested the machines in two situations: with the data stored on the local hard disk, and with the data stored on a disk mounted using NFS. The values reported are times in minutes.

Data on local disk

Machine	Realign	Smooth	Model: standard	Model: optimized
V480	55.0	16.4	20.9	15.1
ES45	32.7	Not tested	18.7	Not tested
Linux P4	16.2	5.4	24.2	5.7
Windows P4	23.0	5.8	23.4	5.5
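
To put a number on the gain from removing sparse matrices, the two model columns can be divided. The sketch below uses only the times in the table; the variable names are ours. The ratio comes out at roughly 1.4 on the V480 and a little over 4 on the Pentium 4 under either operating system.

```python
# Model estimation times in minutes, copied from the local-disk table as
# (standard, optimized). The ES45 is omitted because the optimized model
# was not tested on it.
model_times = {
    "V480": (20.9, 15.1),
    "Linux P4": (24.2, 5.7),
    "Windows P4": (23.4, 5.5),
}

# Speed-up factor = standard time / optimized time.
speedups = {name: std / opt for name, (std, opt) in model_times.items()}

for name, factor in speedups.items():
    print(f"{name}: optimized model is {factor:.1f}x faster")
```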

Data on local disk vs data via NFS

Machine	Realign: local	Realign: NFS	NFS / Local
V480	55.0	60	1.09
ES45	32.7	33.7	1.03
Linux P4	16.2	18.3	1.13
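
The NFS / Local column is simply the NFS realignment time divided by the local time. A short Python sketch that reproduces it from the table values (variable names are ours):

```python
# Realignment times in minutes, copied from the table above as (local, NFS).
realign_minutes = {
    "V480": (55.0, 60.0),
    "ES45": (32.7, 33.7),
    "Linux P4": (16.2, 18.3),
}

# Ratio > 1 means the NFS run was slower than the local-disk run.
nfs_ratio = {
    name: round(nfs / local, 2)
    for name, (local, nfs) in realign_minutes.items()
}

for name, ratio in nfs_ratio.items():
    print(f"{name}: NFS / Local = {ratio:.2f}")
```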

The V480 and ES45 connected to a Sun/Solaris NFS SCSI server via a switch. The Linux P4 connected via a hub to an NFS IDE server running Red Hat Linux 7.3.

We also timed the V480 and ES45 when running 6 simultaneous realignment jobs, comparing NFS and local storage. The slowdown attributable to NFS varied between 3 and 20%; the variation may have been due to unrelated NFS and CPU loads on the NFS server, which were sometimes heavy.

In addition to the tests listed in the table, we ran the following tests: mutual information coregistration (Linux P4: 2.0 minutes); normalization of structural image only (Linux P4: 46 seconds); normalization of structural image and reslicing of 960 fMRI images (Linux P4: 12.5 minutes, Windows P4: 20.0 minutes).

The tests imply that most of a standard single-subject analysis (realignment, coregistration, normalization, smoothing, statistical analysis, writing contrasts) would take 16.2 + 2.0 + 12.5 + 5.4 + 5.7 + 3.2 = 45 minutes on the Linux P4.
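
The same sum, written out stage by stage as a Python sketch. The mapping of each figure to a stage follows the order in which the stages are listed above; the statistical analysis figure is the optimized model time, and the dictionary keys are our own labels.

```python
# Linux P4 times in minutes for each stage of a single-subject analysis.
stage_minutes = {
    "realignment": 16.2,
    "coregistration": 2.0,
    "normalization": 12.5,
    "smoothing": 5.4,
    "statistical analysis": 5.7,   # optimized model estimation
    "writing contrasts": 3.2,
}

total = sum(stage_minutes.values())
print(f"Linux P4 total: {total:.1f} minutes")
```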

Conclusions

As expected from published integer and floating point benchmarks, the Intel solution was the best performer on these real-world tests of SPM performance. Keeping data on the local hard disk results in a speed gain of the order of 10%.

Linux or Windows?

The Pentium machine is fast running SPM under Linux or Windows. Realignment/reslicing is 42% slower on Windows; normalization/reslicing is 60% slower. Both procedures involve a large amount of image writing and resampling; Windows may be slower because of slower disk access and/or less effective caching. Assuming coregistration takes the same time on Linux and Windows, the whole processing stream for Windows would take around 59 minutes, which is 32% slower than Linux. Of course the choice between Linux and Windows is likely to be dictated by other factors, among which are NFS speed, multitasking performance, and the other applications you want to run.

Matthew Brett

Rhodri Cusack

7th April 2003