SPM benchmarks

We tested three high-specification machines by recording the time taken for some standard SPM99 processing steps. We tested the same Pentium 4 machine running both Linux and Windows 2000.

Machines tested

||Name||Manufacturer / Model||OS||CPU no. x speed (MHz)||Memory (GB)||
||V480||Sun V480||Solaris 8||4 x 900||6||
||ES45||HP ES45||Tru64||4 x 1000||8||
||Linux P4||Advantec Pentium 4||Mandrake Linux 9.0||1 x 2530||1||
||Windows P4||Advantec Pentium 4||Windows 2000||1 x 2530||1||

For reference, here are the integer and floating point benchmark results for each machine, from www.spec.org:

||Machine||CFP2000 base||CFP2000 rate||CINT2000 base||CINT2000 rate||
||V480||637||7.16||469||5.39||
||ES45||776||9.00||621||7.20||
||Linux P4||992||11.0||944||11.5||

Results for the Linux P4 are taken from the most similar machine that had been tested (Dell Precision WorkStation 350 2.53 GHz P4). Rates are for 1 processor, as Matlab is a single-threaded application. The V480 has not been tested with one processor; the results are estimated from (V480 rate for 2 processors) * (rate for one processor on ES45 / rate for two processors on ES45).
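
The estimate described above scales the two-processor V480 rate by the ES45's ratio of one-processor to two-processor rates. A minimal Python sketch of that calculation follows; the function name and the input values in the usage line are illustrative assumptions, not published SPEC figures.

```python
def estimate_single_cpu_rate(target_rate_2cpu, ref_rate_1cpu, ref_rate_2cpu):
    """Estimate a one-processor rate for a machine only tested with two.

    Scales the target machine's two-processor rate by the reference
    machine's (one-processor rate / two-processor rate) ratio.
    """
    if ref_rate_2cpu <= 0:
        raise ValueError("reference two-processor rate must be positive")
    return target_rate_2cpu * (ref_rate_1cpu / ref_rate_2cpu)


# Hypothetical inputs for illustration only (not published SPEC results).
example = estimate_single_cpu_rate(target_rate_2cpu=14.0,
                                   ref_rate_1cpu=9.0,
                                   ref_rate_2cpu=18.0)
print(f"Estimated one-processor rate: {example:.2f}")
```

The estimate assumes that both machines scale from one to two processors in the same proportion, which may not hold exactly.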

Tests

The tests were designed to assess speed for a typical SPM analysis on a single subject. The data consisted of four sessions of fMRI, with 235 images per session. Matrix size was 128x128x21. We timed each analysis by wrapping SPM99 batch mode scripts in Matlab's tic and toc timing functions. The tests were:

  1. Realignment and reslicing
    Calculation of realignment parameters and reslicing of all images 1..N, trilinear interpolation

  2. Smoothing
    Smoothing of original images with 8mm FWHM

  3. Model estimation
    Estimation of a standard four-session statistical model, applying low- (hrf) and high- (120 second) pass filters. The Linux P4 proved surprisingly slow on the model calculation, which was due to unusually slow multiplication of sparse by full matrices; see the SPM Intel tuning page. We found that model estimation was considerably faster in general if we avoided the sparse matrix multiplication. We therefore ran the following model estimation test:

  4. Model estimation: optimized
    Here we removed the use of sparse matrices from the model estimation.
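
The timings were taken in Matlab, by starting a timer before each batch script and reading it afterwards. For readers who want to reproduce the pattern elsewhere, here is a rough Python analogue. It is a sketch only: `timed` and `placeholder_step` are hypothetical names, and the placeholder stands in for a real processing step such as realignment.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock time of the enclosed block, in minutes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = (time.perf_counter() - start) / 60.0


def placeholder_step():
    """Hypothetical stand-in for a processing step such as realignment."""
    return sum(i * i for i in range(100_000))


timings = {}
with timed("placeholder", timings):
    placeholder_step()

for label, minutes in timings.items():
    print(f"{label}: {minutes:.4f} min")
```

Recording the time in a `finally` clause means a duration is stored even if the step raises an error.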

We used Matlab 5.3 on the V480 and ES45, and Matlab 6.5 for the Linux P4. Our Matlab licensing meant that we could not use the same version on all three machines. We did compare the speed of the realignment process using Matlab 5.3 and Matlab 6.0 on the V480 and the ES45; the differences were about 1%. Note that Matlab and SPM need to be optimized for the Pentium 4 machine because of a problem with the default P4 handling of not-a-number values in floating point calculations. This is described in the SPM Intel tuning page.

Results

We tested the machines in two situations: with the data stored on the local hard disk, and with the data stored on a disk mounted using NFS. The values reported are times in minutes.

Data on local disk

||Machine||Realign||Smooth||Model: standard||Model: optimized||
||V480||55.0||16.4||20.9||15.1||
||ES45||32.7||Not tested||18.7||Not tested||
||Linux P4||16.2||5.4||24.2||5.7||
||Windows P4||23.0||5.8||23.4||5.5||
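
The two P4 rows in the table above can be compared step by step by dividing the Windows time by the Linux time. The short Python sketch below does this; the times are copied from the table, while the function and variable names are illustrative.

```python
# Times in minutes, copied from the local-disk results table.
linux_p4 = {
    "Realign": 16.2,
    "Smooth": 5.4,
    "Model: standard": 24.2,
    "Model: optimized": 5.7,
}
windows_p4 = {
    "Realign": 23.0,
    "Smooth": 5.8,
    "Model: standard": 23.4,
    "Model: optimized": 5.5,
}


def percent_slower(windows_minutes, linux_minutes):
    """Return how much slower Windows is than Linux, as a percentage.

    A negative value means Windows was faster for that step.
    """
    return (windows_minutes / linux_minutes - 1.0) * 100.0


for step in linux_p4:
    diff = percent_slower(windows_p4[step], linux_p4[step])
    print(f"{step}: {diff:+.0f}%")
```

Realignment comes out at about 42% slower on Windows, while the other three steps differ by less than 10% in either direction.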

Data on local disk vs data via NFS

||Machine||Realign: local||Realign: NFS||NFS / Local||
||V480||55.0||60||1.09||
||ES45||32.7||33.7||1.03||
||Linux P4||16.2||18.3||1.13||
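
The NFS / Local column is the NFS realignment time divided by the local-disk time. As a check, the Python sketch below recomputes it from the two timing columns; the variable and function names are illustrative.

```python
# Realignment times in minutes: (local disk, NFS), copied from the table.
realign_minutes = {
    "V480": (55.0, 60.0),
    "ES45": (32.7, 33.7),
    "Linux P4": (16.2, 18.3),
}


def nfs_ratio(local_minutes, nfs_minutes):
    """Return the NFS time as a multiple of the local-disk time."""
    return nfs_minutes / local_minutes


for machine, (local, nfs) in realign_minutes.items():
    ratio = nfs_ratio(local, nfs)
    print(f"{machine}: {ratio:.2f} ({(ratio - 1.0) * 100.0:.0f}% slower over NFS)")
```

The recomputed ratios agree with the table to two decimal places, so the NFS penalty for a single realignment job ranges from about 3% to about 13%.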

The V480 and ES45 were connected to a Sun/Solaris NFS SCSI server via a switch. The Linux P4 was connected via a hub to an NFS IDE server running Red Hat Linux 7.3.

We also timed the V480 and ES45 when running 6 simultaneous realignment jobs, comparing NFS and local storage. The slowdown attributable to NFS varied between 3 and 20%; the variation may have been due to unrelated NFS and CPU loads on the NFS server, which were sometimes heavy.

In addition to the tests listed in the table, we ran the following tests: mutual information coregistration (Linux P4: 2.0 minutes); normalization of structural image only (Linux P4: 46 seconds); normalization of structural image and reslicing of 960 fMRI images (Linux P4: 12.5 minutes, Windows P4: 20.0 minutes).

The tests imply that most of a standard single-subject analysis (realignment, coregistration, normalization, smoothing, statistical analysis, writing contrasts) would take 16.2 + 2.0 + 12.5 + 5.4 + 5.7 + 3.2 = 45 minutes on the Linux P4.
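
The total above is a plain sum of the per-step times. The Python sketch below repeats the arithmetic with each step labelled; the dictionary name is illustrative, and the pairing of steps to values follows the order given in the sentence above.

```python
# Linux P4 times in minutes, in the order listed in the text.
linux_p4_minutes = {
    "realignment": 16.2,
    "coregistration": 2.0,
    "normalization": 12.5,
    "smoothing": 5.4,
    "statistical analysis": 5.7,  # the optimized model estimation time
    "writing contrasts": 3.2,
}

total = sum(linux_p4_minutes.values())
print(f"Total: {total:.1f} minutes")
```

Keeping the steps in a dictionary makes it easy to substitute another machine's times and compare totals.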

Conclusions

As expected from published integer and floating point benchmarks, the Intel solution was the best performer on these real-world tests of SPM performance. Keeping data on the local hard disk results in a speed gain of the order of 10%.

Linux or Windows?

The Pentium machine is fast running SPM under either Linux or Windows. Realignment/reslicing is 42% slower on Windows, and normalization/reslicing is 60% slower. Both procedures involve a large amount of image writing and resampling; Windows may be slower because of slower disk access and/or less effective caching. Assuming coregistration takes the same time on Linux and Windows, the whole processing stream for Windows would take around 59 minutes, which is 32% slower than Linux. Of course, the choice between Linux and Windows is likely to be dictated by other factors, among which are NFS speed, multitasking performance, and the other applications you want to run.

Matthew Brett

Rhodri Cusack

7th April 2003