Random Effects Analysis

A summary for the Keck Imaging Lab

This web-page contains a summary of Random Effects Analysis within SPM. Many of the passages here are lifted (without proper credit!) from the SPM email burster archives. If you want to trace the source, read the Keck Lab's SPM How-tos page on Random Effects, and you should recognize most of the verbiage here as having come from there. I have tried to condense the discussion in the emails into a pithy and readable explanation of the basics of Random Effects Analysis. Once you finish reading here, you'll probably want to delve into some of the other more authoritative sources. Darren Gitelman's page is the next logical place to go. The SPM page on Random Effects for SPM96 has some good pointers as well.

Wil Irwin has written a nice description of Fixed and Random Effects from a more classical ANOVA point of view, with an eye toward applying this background knowledge to SPM.

Two reasons (a big one and a little one) to perform a Random Effects Analysis are:

• So you can generalize your results to the population from which your subjects were drawn.
• To minimize operator errors with SPM's menus, a problem that is compounded for large designs.

The following questions are addressed below:

What is "Random Effects"?

A "Random-Effects" analysis is also referred to as a "Mixed Effects" analysis, since it considers both within- and between-subject variance. In SPM this is realized through a "second level" analysis. First, we need to understand what a "first level" analysis is. A "first level" or "Fixed Effects" analysis is the standard way to set up an analysis design using your original data as the input. A first level design uses within-subject variance, thus providing for inferences that generalize only to the subjects studied. This is good if you only want to report your results qualitatively, or as a case study of the specific subjects you used. Frequently, however, we would like to make broader inferences or conclusions about the general population from which the subjects were drawn.

Let's assume we have an fMRI study on depression. There are 10 controls and 10 depressed subjects, and we want to see if there are any differences in the brain activation patterns due to various stimuli (happy, sad, etc. compared to a baseline neutral stimulus). The standard or first level approach would be to feed all of the images for all of the subjects into a large design matrix, and set up a series of contrasts for these data. Since this analysis does not model between-subject variance, we cannot know if we are adequately addressing the chance that our chosen subjects are in fact not very representative of the population from which they were drawn. If there are several apparent outliers, then there are two possibilities: (i) these subjects are in fact outliers, and our sample group is not very representative of its base population; or (ii) the base population is ill-defined and/or rather heterogeneous. In either case, the certainty of our conclusions should reflect this.

Since we have not adequately addressed between-subject variance in the Fixed Effects analysis, we are restricted to making inferences about the specific group of control and depressed subjects we scanned. This is usually only moderately exciting; what really gets a referee's attention is if you can generalize your results to a larger population. For instance, it would be nice to be able to say, "All depressed people tend to activate more in the _____ than all non-depressed people." (Fill in the blank with your favorite brain structure.) A Fixed Effects analysis does not let us make this broader (and more publishable) statement.

A Random Effects analysis incorporates both within-subject variance (derived from the first level analysis), as well as between-subject/session differences (derived from the second level analysis) whose estimator is the correct mix of within and between-subject error. This allows you to generalize to the population from which the subjects came. A Random Effects analysis is performed within SPM by the "second level" of analysis, which effects a random effects model where the error variance is solely the inter-subject (i.e. intra-population) variance.

The purpose of the Random Effects analysis is to find the areas that are activated in much the same way in all subjects, as opposed to the fixed effects model which gives you the areas that are activated "on the average" across the subjects. This is really a crucial difference since a fixed effects analysis may yield "significant" results when one or a couple of subjects activate "a lot" even though the other subjects do not activate at all.
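To make the difference concrete, here is a minimal single-voxel sketch. All numbers are made up for illustration, and the fixed effects standard error is deliberately simplified (it assumes every scan contributes independent noise with the same standard deviation); it is not how SPM computes its statistics, only an illustration of which variance each approach judges the mean effect against.

```python
import numpy as np

# Hypothetical per-subject effect sizes at one voxel: one subject responds
# strongly, the other seven show no consistent effect.
subject_effects = np.array([8.0, 0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.1])
n_subjects = subject_effects.size
scans_per_subject = 100   # assumed number of scans per subject
within_sd = 1.0           # assumed scan-to-scan noise SD, same for everyone

# Fixed effects: the mean effect is judged against within-subject noise only.
ffx_se = within_sd / np.sqrt(n_subjects * scans_per_subject)
ffx_t = subject_effects.mean() / ffx_se

# Random effects: the mean effect is judged against between-subject spread.
rfx_se = subject_effects.std(ddof=1) / np.sqrt(n_subjects)
rfx_t = subject_effects.mean() / rfx_se

print(f"FFX t = {ffx_t:.1f}, RFX t = {rfx_t:.2f}")
```

With these toy values the fixed effects statistic is very large, driven entirely by the one strong responder, while the random effects statistic stays close to 1 because the subjects disagree with one another.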

When do I need to use a Random Effects approach?

Strictly speaking, you should use a Random Effects approach whenever you want to be able to generalize your inferences to the general populations from which your subjects were drawn.

In PET (but not fMRI) the similarity of between- and within-subject variances, and of the number of scans per subject to the number of subjects, means that the difference between first and second-level analyses is much less severe. Traditionally PET studies are analyzed at the first level. Second-level analyses are usually employed when you want to make an inference about group differences given some within-subject replications.

If you only have one scan per condition per subject (e.g. FDG-PET), a second-level analysis is not appropriate. The point of a second-level analysis is to incorporate the appropriate mix of within-subject and between-subject variance estimates. With one scan/condition/subject, there is not usually a valid "within-subject" variance estimate, so you should stick to the first level of analysis.

How do I do a Random Effects analysis in SPM?

Ahh, the crux of the matter. The idea is to feed a series of contrast images (con*.img) resulting from the first-level analysis of a number of individual subjects into a second level of analysis within SPM. For the second level, the con*.img's are treated exactly as if they were original image data, or possibly some sort of average image data for a particular condition.

The contrast images represent spatially distributed images of the weighted sum of the parameter estimates for a particular contrast. In essence (and for our example above), it's like a difference image for (activation-rest) or (happy-neutral). You need one contrast image for each patient and each control. By doing that you are collapsing over intra-subject variability (to only one image per contrast per subject) and the image-to-image residual variability is now between subject variance alone.
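As a rough illustration of "weighted sum of the parameter estimates", the sketch below forms a (happy - neutral) contrast from made-up beta values at three voxels. The regressor names, their order, and the numbers are assumptions chosen for this example only.

```python
import numpy as np

# Hypothetical parameter estimates (betas) for one subject at three voxels.
# Rows: regressors; columns: voxels.
betas = np.array([
    [2.0, 0.5, 1.0],        # happy
    [1.0, 0.5, 3.0],        # sad
    [0.5, 0.5, 1.0],        # neutral
    [100.0, 98.0, 101.0],   # constant term
])

# Contrast weights for (happy - neutral); sad and the constant get zero weight.
contrast = np.array([1.0, 0.0, -1.0, 0.0])

# The contrast "image" is the weighted sum of the beta images at every voxel.
con_image = contrast @ betas
print(con_image)
```

Only the first voxel shows a happy-versus-neutral difference (1.5); the other two come out as zero even though the third voxel responds strongly to the sad condition.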

In the first level of analysis in a Random Effects approach, you should use "proportional scaling" because you want the contrast images entering into the second level of analysis to be on the same scale. That is to say, you want each of the con*.img's from the various subjects to be directly comparable, so they should be scaled to some common level. At the second level, you should not do any further scaling, threshold, masking, etc. (Although additional smoothing may be appropriate.)
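The idea behind proportional scaling can be sketched as follows. This is a simplified illustration, not SPM's routine: it uses a plain mean over all voxels as the "global" value, and the target level of 100 is an arbitrary assumption.

```python
import numpy as np

# Hypothetical scans from two subjects (flattened to four voxels each) whose
# overall signal levels differ for reasons unrelated to brain activity.
scans = np.array([
    [80.0, 100.0, 120.0, 100.0],    # subject A, global mean 100
    [160.0, 200.0, 240.0, 200.0],   # subject B, global mean 200
])

TARGET_GLOBAL = 100.0  # assumed common level; the value is illustrative only

# Proportional scaling: divide each scan by its own global mean, then
# multiply by the common target so all scans share the same overall level.
global_means = scans.mean(axis=1, keepdims=True)
scaled = scans / global_means * TARGET_GLOBAL
print(scaled)
```

After scaling, the two subjects' scans are on the same footing, so contrasts computed from them are directly comparable at the second level.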

For the example above, in the first-level analysis use a "multi-group conditions and covariates" design, and in the second level of analysis compare the two sets of subject-specific contrasts with a two-sample t-test under "basic models". You generate a contrast parameter estimate map for each subject at the first level in the Results section by looking at contrasts between e.g. happy and neutral. The contrasts will be set up like:
control subj1: [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
control subj2: [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
...
control subj10: [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
...
depres subj19: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
depres subj20: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]

The subsequent t (and F) tests performed in the second level of analysis are against the null hypothesis of zero mean difference, using the one (or two)-sample t-test option in SPM99. You should generally use the "Basic Stats" options for a second-level analysis, rather than one of the options from the "PET" or "fMRI" stats menus. A two-sample (independent-sample) t-test compares the average of your parameter estimates for each group.

To do a between-groups analysis, use the two-sample t-test option under basic models and then provide the 10 subject-specific con*.img files for each of your two groups. This will test for differences in that effect between the groups (in either direction depending upon the contrast you specify). Apply the opposite contrast to check the negative differences. In the Results section, you can enter contrasts like:

[1 -1] (control > depressed)
[-1 1] (control < depressed)

Using a one-tailed t-test, you can test for the direction of the difference if you have prior predictions that some areas will show increases while other areas may show decreases. A contrast of (1) means 'test for a significant mean (+ve) effect' and a contrast of (-1) means 'test for a significant mean (-ve) effect'. Doing an F-test at the second level would test for ANY mean effects significantly different to 0 - without any constraints on the direction of these effects.
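The two-sample test at a single voxel can be sketched as a small general linear model with one column per group. The contrast values below are invented, and the helper two_sample_t is a hypothetical name used only for this illustration; it is not an SPM function.

```python
import numpy as np

# Hypothetical (happy - neutral) contrast values at one voxel, one per subject.
controls = np.array([1.2, 0.8, 1.0, 1.4, 0.9, 1.1, 1.3, 0.7, 1.0, 1.1])
depressed = np.array([0.6, 0.9, 0.4, 0.7, 0.5, 0.8, 0.3, 0.6, 0.7, 0.5])

def two_sample_t(y, X, c):
    """t-statistic and degrees of freedom for contrast c in y = X b + e."""
    b = np.linalg.pinv(X) @ y                  # group means
    resid = y - X @ b
    df = len(y) - np.linalg.matrix_rank(X)
    res_ms = resid @ resid / df                # between-subject error variance
    se = np.sqrt(res_ms * (c @ np.linalg.pinv(X.T @ X) @ c))
    return (c @ b) / se, df

y = np.concatenate([controls, depressed])
X = np.zeros((20, 2))
X[:10, 0] = 1.0   # column 1: control group
X[10:, 1] = 1.0   # column 2: depressed group

t_pos, df = two_sample_t(y, X, np.array([1.0, -1.0]))   # control > depressed
t_neg, _ = two_sample_t(y, X, np.array([-1.0, 1.0]))    # control < depressed
print(f"t = {t_pos:.2f} (df = {df}), opposite contrast t = {t_neg:.2f}")
```

The opposite contrast simply flips the sign of the statistic, which is why each direction has to be examined separately when one-tailed tests are used.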

The con*.img files are independent of one another, even though they were produced from within the same design matrix, because SPM is designed so that the individual subjects are separable. In practice, we usually generate a con*.img for each subject for each condition in a separate analysis (rather than lumping all of the scans into a single design matrix and selecting the particular scan we want), so that we can keep the resulting con*.img's organized in a logical directory structure. Also, you will find that as your design matrix gets larger (i.e. as you get more subjects), mistakes made while navigating SPM's menus become very costly as you usually must start over. By doing one subject at a time, the overall "mistake rate" goes down, and if a mistake in file-selection is made you will only have to start over on one small analysis. However, if you want to do a Fixed-Effects analysis only, or in addition to a Random Effects analysis, then you will save time by creating one large design matrix at the outset.

What is a "con*.img"?

A contrast image is referred to as a "con*.img" because within SPM, the various contrast images are named according to their specific numbered order within the SPM contrast manager for a particular analysis. For instance, if 3 contrasts were defined, you would have the files named con001.img, con002.img, and con003.img. They are also referred to as "con???.img" or "conXXX.img", indicating a generic contrast image, and not any particular one.

A contrast image summarizes the activation effect for a particular subject. The contrast images represent spatially distributed images of the weighted sum of the parameter estimates for a particular contrast. Put another way, a con*.img is a parametric image of the parameter estimates for a particular contrast, weighted for the variance at each pixel. In essence (and for our example above), it's a kind of difference image for (activation-rest) or (happy-neutral). You need one contrast image for each patient and each control.

SPM99 writes out contrast images in addition to SPM{t/F} images by default in the results section. Further, these images are floating point images, with out-of-brain voxels set to NaN (not a number). Thus, these images are implicitly masked. Since SPM statistics ignores voxels where any image volume is NaN, these contrast images can be put back into the statistics section, without having to worry about using non-brain pixels in your analysis.
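The implicit masking described above amounts to the following rule, shown here on made-up data: a voxel enters the analysis only if no image has a NaN there. This is a sketch of the rule itself, not of SPM's code.

```python
import numpy as np

# Hypothetical contrast images for three subjects, flattened to five voxels.
# NaN marks voxels that fell outside the brain in that subject's analysis.
con_images = np.array([
    [np.nan, 1.0, 0.5, 2.0, np.nan],
    [np.nan, 1.2, 0.7, 1.8, 0.3],
    [np.nan, 0.9, np.nan, 2.2, 0.1],
])

# A voxel is analysed only if it has a finite value in every image.
in_mask = np.all(np.isfinite(con_images), axis=0)

# Mean effect across subjects, left as NaN wherever the voxel is masked out.
mean_effect = np.full(con_images.shape[1], np.nan)
mean_effect[in_mask] = con_images[:, in_mask].mean(axis=0)
print(in_mask)
print(mean_effect)
```

Only the second and fourth voxels survive here; a single NaN in any one subject's image is enough to drop a voxel from the group analysis.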

Do not confuse the con*.img with the activation image (SPM{t}.img) generated for a given analysis! The SPM{t}.img is composed of a con*.img divided by some estimate of the standard error. The statistic (SPM) images (SPMt_????.img & SPMF_????.img) should not be entered into a second level analysis if you want to effect a random effects analysis. This would basically be assessing the significance (across subjects) of the individual subjects' significance! (Rather than the significance (across subjects) of the response.)

What are all those other image files for?

SPM99 produces a series of intermediate images leading up to the final output. These images include:

1. Voxel-by-voxel maps of the parameter estimates (Beta images) of the fit to the model.
2. "Goodness-of-fit" estimates of each voxel time-course to the model.
3. Weighted parameter estimates (con***.img or ess***.img).

The various images created by SPM can be summarized as follows (thanks to Tom Johnstone!):

1) beta_***.img: these are created during the estimation stage - i.e. the fitting of the model. There is one beta image per column of the design matrix. These are the parameter estimates of course, with the first ones corresponding to the variables of interest, the last one corresponding to the constant in the model.

2) con_***.img: these are created when calculating t-contrasts and correspond to weighted sums of the beta weights. The numbering of the contrast images corresponds to the number of the contrast created in the contrast manager

3) SPM{t} is computed by dividing the contrast image (con_***.img) by its standard error (a multiple of the square root of ResMS.img). The SPM{t} is saved as spmT_***.img.

4) ess_***.img: these are images of the extra sum of squares for the corresponding F-contrasts, corresponding to the con_*** images from a t-test.

5) SPM{F} is essentially an extra-sum-of-squares test (See Draper & Smith "Applied Regression Analysis"), where the additional variance explained by a section of the model is compared with the error variance. In other words, the F-test is concerned with how much a given variable *uniquely* adds to explained variance. SPM{F} is spmF_***.img, computed by dividing the ess_***.img by the ResMS.img error variance estimate image, and scaling appropriately.

The t-contrast basically tells us about whether a specific linear contrast of beta estimates differs from zero. The F-contrast tells us about how much a given linear contrast of parameter estimates (as a subset of such contrasts) contributes uniquely to explaining variance in the data.
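The relationships among the beta, ResMS, con, ess, and statistic images listed above can be sketched for a single voxel with ordinary least squares. The data are simulated, the design is a made-up boxcar, and the sketch assumes independent errors (no serial correlation modelling), so it illustrates the arithmetic rather than SPM's actual estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-voxel time course: 40 scans in alternating blocks of 5.
n_scans = 40
task = np.tile(np.repeat([0.0, 1.0], 5), 4)       # boxcar regressor
X = np.column_stack([task, np.ones(n_scans)])     # design matrix: [task, constant]
y = 2.0 * task + 100.0 + rng.normal(0.0, 1.0, n_scans)

beta = np.linalg.pinv(X) @ y                      # beta_***: one estimate per column
resid = y - X @ beta
df = n_scans - np.linalg.matrix_rank(X)
res_ms = resid @ resid / df                       # ResMS: residual mean square

c = np.array([1.0, 0.0])                          # t-contrast: task effect
con = c @ beta                                    # con_***: weighted sum of betas
se = np.sqrt(res_ms * (c @ np.linalg.inv(X.T @ X) @ c))
t_value = con / se                                # spmT_***: contrast / standard error

# F-contrast for the same effect: extra sum of squares from adding the task column.
X_reduced = X[:, [1]]
resid_reduced = y - X_reduced @ (np.linalg.pinv(X_reduced) @ y)
ess = resid_reduced @ resid_reduced - resid @ resid   # ess_*** analogue
f_value = ess / res_ms                                # one numerator degree of freedom

print(f"t = {t_value:.2f}, F = {f_value:.2f}, t^2 = {t_value ** 2:.2f}")
```

For a contrast that tests a single column, the F value equals the square of the t value, which is one way to see that the F-test has no sense of direction.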

How does the variance get incorporated at each level?

SPM computes an estimate of the variance at each pixel for all of the images that go into a particular analysis. This variance is then used as a weighting factor for the computed t-values in the SPM{t}.img, so that t-values with a high underlying variance are seen as less reliable and hence are diminished in importance (and value!). A con*.img is a parametric image of the parameter estimates for a particular contrast.

At the second level, which compares con*.imgs from the first-level analysis of several subjects, the between-subject variance is estimated and incorporated into the SPM{t}.img(s) resulting from this analysis. In practice, a model is fit to the weighted parameter estimate images of the 1st level analysis (contrast images). The error variance of this 2nd level model is then over subjects, not over scans, because you have one image per subject.

Generally (or almost always) the error maps for the fixed and random effects models are different.

How does a Random Effects analysis influence the power?

Well, you can't expect to make amazing far-reaching statements without giving up something. In the case of Random Effects analysis, you give up many degrees of freedom and hence power. If you have fewer than approximately 10-12 subjects per group, you probably lack sufficient power for a satisfying analysis, although you can still proceed.

The degrees of freedom for the estimate of the error variance is something like:
'number of subjects' - 'rank of 2nd level design matrix'
so the degrees of freedom is much lower than in the 1st level analysis. In a first level analysis, the degrees of freedom are related to the number of scans, which can be a satisfyingly large number for most fMRI protocols. However, the Random Effects analysis leaves you with a number of degrees of freedom which is a function of the number of subjects (not scans!) and the design matrix you select at the 2nd level analysis. In our example from above, d.f. = (10+10)-2=18, which is a couple of orders of magnitude less than the total number of scans that may have gone into the 1st level analysis.

In a Fixed Effects analysis, you analyze the data only at the 1st level. The estimated error variance at a voxel is a function of the model and the actual fit to the data, i.e. you look at the variance over all of the scans. Here one usually has high degrees of freedom for a group study, because the degrees of freedom (for a PET study) is 'number of scans' - 'rank of design matrix'. If you have several scans/subject, the degrees of freedom will reflect this, and your analysis will have correspondingly more power.
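The two degrees-of-freedom formulas can be compared directly. The second-level numbers come from the running example; the first-level numbers (100 scans and 3 independent design-matrix columns per subject) are assumptions made up for illustration.

```python
import numpy as np

# Second level: one con image per subject, one group-mean column per group.
n_controls, n_depressed = 10, 10
n_subjects = n_controls + n_depressed
X_second = np.zeros((n_subjects, 2))
X_second[:n_controls, 0] = 1.0
X_second[n_controls:, 1] = 1.0
df_second = n_subjects - np.linalg.matrix_rank(X_second)

# First level (fixed effects) with hypothetical numbers:
# 'number of scans' - 'rank of design matrix'.
scans_per_subject = 100
columns_per_subject = 3
df_first = n_subjects * scans_per_subject - n_subjects * columns_per_subject

print(df_second, df_first)   # 18 1940
```

Under these assumptions the fixed effects analysis has roughly a hundred times as many degrees of freedom as the random effects analysis.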

There is a critical distinction between a Fixed Effects and Random Effects analysis of data with respect to the degrees of freedom and the inference. In terms of inference, the difference between a FFX and RFX analysis is that with a RFX analysis you generalize your inferences to the population from which the subjects/patients were selected. With a FFX analysis, you make inferences only about your measured data. However, the more general inference facilitated by a RFX analysis has its price in the lower degrees of freedom available (given that you have more than 1 scan/subject).

The random effect (or mixed models) analysis is generally performed by calculating a F value as the ratio:
(main effect linked variance) / (interaction variance).
Then, the interaction df is:
(number_of_subject-1)*(number_of_replication_per_subject - 1).
The contrasts of interest are then calculated in the usual way in SPM but the interaction variance is taken as residual variance. In the special case of two conditions (two levels of your main effect) and one replication per subject (or data averaged over balanced replications), the "conventional" F-test described above and the F-contrast on an SPM one-sample t-test are equivalent. Because the con*imgs already contain the effect parameters for each subject, the residual error in the SPM model is identical to the (subject x effect) interaction, the denominator of the conventional repeated measures ANOVA.
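The equivalence claimed for the two-condition case can be checked numerically. In this sketch the data are invented, and the interaction degrees of freedom are computed with the textbook repeated measures formula, (subjects - 1) x (levels - 1), which is an assumption about what the formula above intends when each subject contributes one value per condition.

```python
import numpy as np

# Hypothetical values at one voxel: rows are subjects, columns are the two
# conditions (one value per subject per condition).
data = np.array([
    [1.0, 2.1],
    [0.5, 1.2],
    [1.5, 1.9],
    [0.8, 2.0],
    [1.1, 1.4],
    [0.9, 1.8],
])
n_subj, n_cond = data.shape
grand = data.mean()
cond_means = data.mean(axis=0)
subj_means = data.mean(axis=1)

# Conventional repeated measures ANOVA: main effect over (subject x effect).
ss_cond = n_subj * np.sum((cond_means - grand) ** 2)
interaction = data - subj_means[:, None] - cond_means[None, :] + grand
ss_inter = np.sum(interaction ** 2)
df_cond = n_cond - 1
df_inter = (n_subj - 1) * (n_cond - 1)
f_anova = (ss_cond / df_cond) / (ss_inter / df_inter)

# Second-level view: one difference value per subject, one-sample t-test.
diff = data[:, 1] - data[:, 0]
t_one_sample = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_subj))

print(f"ANOVA F = {f_anova:.3f}, t^2 = {t_one_sample ** 2:.3f}")
```

The ANOVA F and the squared one-sample t agree, which is the sense in which the residual of the second-level model plays the role of the (subject x effect) interaction.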

When there are more than two levels of your factor and you want an omnibus F-test (rather than a specific planned comparison, i.e. t-contrast), you must use a PET design. However, the resulting analysis uses a pooled error term (even for factorial designs) and there is currently no correction for sphericity violations (so your p-values may be inflated). This is why it is advisable to keep second-level models to one/two-sample t-tests on specific t-contrast images.

You should expect differences not only in the size but in the pattern of activations, if you compare a Fixed Effects to a Random Effects analysis. This is because not only does the resulting SPM{t} have a different numerator (con*.img's, as opposed to "original" or summary data), but the SPM{t}'s denominator (the standard error or variance map) is also different. You can expect the Random Effects analysis to yield a less biased but also less sensitive SPM{t}.

How does a Conjunction analysis compare with a Random Effects analysis?

The RFX model will tell you if all subjects "activate" in the same location and with roughly the same magnitude. If you want to answer the slightly less stringent question "do they all activate in the same location?" you can use a conjunction across subjects instead.

What considerations are there for data acquisition with respect to Random Effects?

Generally, a second-level analysis in SPM assumes that the design is balanced, or the design matrices are identical for each subject. This implies that the data should be the same for each subject, particularly the number of scans/subject. The reason for this is that the contribution to the first-level estimate of the variance should be comparable across subjects. However, the actual order of scans can usually be randomized or changed across subjects (unless you have reason to believe that a particular order invokes a reaction not otherwise seen) without invalidating the "sameness" assumption.

The experts recommend a minimum of 12-ish subjects for a Random Effects analysis, or for a multi-group design, 9-12 subjects per group. After approximately 20 subjects/group there is little marginal return for adding one more subject.

Are there differences in a Random Effects analysis between PET and fMRI?

For PET one must explicitly model the effects in a subject-separable fashion (i.e. 'conditions x subj') in order to proceed to a second level of analysis. This is enforced in the fMRI setup because each session is specified separately.

In PET (but not fMRI) the similarity of between- and within-subject variances, and of the number of scans per subject to the number of subjects, means that the difference between first and second-level analyses is much less severe. Traditionally, PET studies are analyzed (only) at the first level, although the results are still assumed to be applicable to the population from which the subjects were drawn.

fMRI studies lend themselves very nicely to a Random Effects analysis, since the first step of analysis is typically to condense a large number of scans for two or more conditions down to a single con*.img for each condition pair (e.g. activation - neutral).
