Independent Component Analysis
The basic ICA problem (see Hyvarinen and Oja, Neural Networks 13 (2000) 411-430 for details)
Imagine you are in a room where two people are talking simultaneously. You record their conversations with two microphones located at different positions in the room. The time signals recorded at each microphone can be denoted by x1(t) and x2(t) with each representing the weighted sum of the signals from each person's conversation denoted by s1(t) and s2(t). This model can be expressed by the linear equations:
x1(t) = a11*s1(t) + a12*s2(t) (1)
x2(t) = a21*s1(t) + a22*s2(t)
which can be expressed in matrix notation as:
x = As (2)
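As a minimal sketch of the mixing model, the NumPy snippet below builds two synthetic sources and mixes them. The source distributions, the values in A and the sample count are arbitrary assumptions chosen for illustration; they do not come from a real recording.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two hypothetical, independent, non-Gaussian source signals s1(t), s2(t).
s = np.vstack([
    rng.uniform(-1.0, 1.0, n_samples),  # s1: uniform noise
    rng.laplace(0.0, 1.0, n_samples),   # s2: Laplacian noise
])

# Illustrative mixing matrix A (values chosen arbitrarily).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

# Equation (2): each row of x is one microphone recording.
x = A @ s

# Row 0 of x reproduces the first line of equation (1) term by term.
x1_by_hand = A[0, 0] * s[0] + A[0, 1] * s[1]
```

Here `x[0]` and `x1_by_hand` hold the same values, which is just equation (1) written out for the first microphone.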
Independent component analysis (ICA) attempts to estimate both
the mixing matrix A and the independent component input signals s
from the observed signals x alone.
The starting point for solving this problem is the assumption that the
conversation signals s1 and s2 are statistically independent. It turns out
that they must also have non-Gaussian distributions (at most one source may
be Gaussian), and for simplicity it is usually assumed that the matrix A is
square.
Two ambiguities hold for ICA:
1) The variances of the independent
components cannot be determined. Since both A and s are
unknown any scalar multiplier in one of the sources
could always be cancelled by dividing the corresponding column of A by
the same scalar. Thus, it is assumed that each component has a variance of
1. However, this still leaves an ambiguity of sign.
2) The order of the independent
components cannot be determined. They are often sorted by the amount of
variance of the original signal that they explain.
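Both ambiguities can be checked numerically. The sketch below uses made-up values for A, for the scaling matrix D and for the permutation matrix P; it only shows that rescaled or reordered sources, paired with a compensating mixing matrix, give exactly the same observations.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 500))  # hypothetical sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                 # illustrative mixing matrix
x = A @ s

# 1) Scale and sign ambiguity: multiply each source by a scalar and
#    divide the corresponding column of A by the same scalar.
D = np.diag([2.0, -0.5])
A_scaled = A @ np.linalg.inv(D)
s_scaled = D @ s

# 2) Order ambiguity: swap the sources and swap the columns of A.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A_perm = A @ P.T
s_perm = P @ s

same_after_scaling = np.allclose(A_scaled @ s_scaled, x)
same_after_permuting = np.allclose(A_perm @ s_perm, x)
```

Both flags come out true, so no algorithm that sees only x can tell these decompositions apart.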
Solving the problem
Equation 2 can be written in the form
s = A^(-1) x (3)
To estimate one of the components (let's call it y) we could write
y = w^T x
where w is a vector to be determined.
The Central Limit Theorem guides the choice of w so that it approximates a row of the inverse of A. For this example it is assumed that all the independent components s_i are identically distributed. A change of variables is made by setting z = A^T w. Then we have y = w^T x = w^T A s = z^T s. Thus y is a linear combination of the s_i with weights given by z_i. Since the Central Limit Theorem tells us that a sum of even two independent random variables is more Gaussian than the original variables, z^T s is more Gaussian than any of the s_i and becomes least Gaussian when it is equal to one of the s_i. In that case only one of the elements of z is non-zero. Thus maximizing the non-Gaussianity of w^T x gives us one of the independent components.
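The argument can be illustrated with a small numerical experiment. The sketch below assumes two unit-variance uniform sources and uses excess kurtosis as a stand-in measure of non-Gaussianity; the helper name `excess_kurtosis` and the grid of angles are choices made for this illustration only. The weight vector z is kept at unit length so that y always has unit variance.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(2)
n = 200_000

# Two independent, identically distributed, unit-variance uniform sources.
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))

# Sweep z = (cos(theta), sin(theta)) from "only s1" to "only s2".
angles = np.linspace(0.0, np.pi / 2.0, 7)  # 0, 15, ..., 90 degrees
abs_kurt = []
for theta in angles:
    z = np.array([np.cos(theta), np.sin(theta)])
    y = z @ s                               # y = z^T s
    abs_kurt.append(abs(excess_kurtosis(y)))
abs_kurt = np.array(abs_kurt)

most_gaussian_angle_deg = np.degrees(angles[np.argmin(abs_kurt)])
```

With these assumptions the absolute kurtosis is largest at the two ends of the sweep, where y equals a single source, and smallest at the equal-weight mixture in the middle.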
The first step in solving for the independent components
is to prewhiten the observed data. This step simplifies the solution because
the mixing matrix of the whitened data is orthogonal, which for large
matrices roughly halves the number of free parameters that need to be
estimated. The following graphs illustrate how this works.
Two random signal components s
Mixed signals x=As
Prewhitened mixed signals

The prewhitened mixed signals approximate a simple rotation of the original
component signals. In two dimensions a rotation is fixed by a single angle,
whereas the general mixing matrix has four independent elements.
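Prewhitening can be sketched in a few lines. The example below assumes whitening by eigen-decomposition of the sample covariance (one common choice, not necessarily the one used to make the graphs above), with synthetic unit-variance sources and an arbitrary A.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical unit-variance uniform sources and illustrative mixing matrix.
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s

# Center the mixed signals.
x_c = x - x.mean(axis=1, keepdims=True)

# Eigen-decomposition of the sample covariance: cov = E diag(d) E^T.
eigvals, E = np.linalg.eigh(np.cov(x_c))

# Whitening matrix V = E diag(d^(-1/2)) E^T.
V = E @ np.diag(eigvals ** -0.5) @ E.T
x_white = V @ x_c

# The whitened data have identity covariance.
cov_white = np.cov(x_white)

# The effective mixing matrix after whitening is close to orthogonal,
# i.e. close to a rotation (possibly combined with a reflection).
A_white = V @ A
orthogonality_error = np.abs(A_white @ A_white.T - np.eye(2)).max()
```

The orthogonality error is small but not exactly zero, because the sample covariance of the sources only approximates the identity.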
Several measures of non-Gaussianity are used to solve the ICA problem, including kurtosis and negentropy. Also used are minimization of mutual information and maximum likelihood estimation (Infomax), which are not directly measures of non-Gaussianity but lead to the same end. Several of these methods introduce a nonlinear function into the solution process in order to better approximate entropy. On page 282 of their book on ICA, Hyvarinen et al. (2001) compare the statistical performance of several methods for 10 supergaussian independent components and find that the methods that used a tanh nonlinearity showed the best results.
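To make the role of the nonlinearity concrete, here is a minimal sketch of a one-unit fixed-point iteration with a tanh nonlinearity, in the style of FastICA with deflation. It is an illustration under stated assumptions, not the code of any toolbox: the function names `whiten` and `fastica_deflation`, the synthetic sources, the matrix A and the iteration limits are all hypothetical.

```python
import numpy as np

def whiten(x):
    """Center x and whiten it via eigen-decomposition of its covariance."""
    x_c = x - x.mean(axis=1, keepdims=True)
    eigvals, E = np.linalg.eigh(np.cov(x_c))
    return E @ np.diag(eigvals ** -0.5) @ E.T @ x_c

def fastica_deflation(z, n_components, n_iter=200, tol=1e-8, seed=0):
    """Estimate unmixing rows one at a time from whitened data z."""
    rng = np.random.default_rng(seed)
    dim = z.shape[0]
    W = np.zeros((n_components, dim))
    for k in range(n_components):
        w = rng.standard_normal(dim)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            g = np.tanh(y)               # nonlinearity g
            g_prime = 1.0 - g ** 2       # its derivative
            # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w
            w_new = (z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections onto rows already found.
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            # Converged when direction stops changing (sign may flip).
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[k] = w
    return W

rng = np.random.default_rng(4)
n = 20_000
s = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(0.0, 1.0, n)])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
z = whiten(A @ s)

W = fastica_deflation(z, n_components=2)
y = W @ z

# Correlation of each true source (rows) with each estimate (columns).
corr = np.corrcoef(np.vstack([s, y]))[:2, 2:]
```

Each row of `corr` should contain one entry close to +1 or -1, reflecting that the estimates match the sources only up to order and sign.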
Figure 5 from McKeown et al. (1998) shown below illustrates some of the properties of independent components as compared to correlation and principal components. The scatter plot in (a) shows hypothesized fMRI signal values at times t=1 and t=2 for each brain voxel. Vectors IC1 and IC2 show the directions determined by the relative activations of the two component processes. The data will vary independently along these two component vectors. Vectors PC1 and PC2 show the two perpendicular principal component directions indicating maximum variances in the data. Areas of active voxels for each method are designated by parallelograms: solid for ICA, dashed for correlation and dotted for PCA. Note that the active voxels are not necessarily the same for the three methods.

Figure 5b shows that IC1 and IC2 can be indirectly determined by finding a linear transform matrix W which results in a rectangular distribution. The sigmoid transform g(WX) makes the distribution more uniform and the ICA algorithm of Bell and Sejnowski (1995) further adjusts IC1' and IC2' to maximize the entropy of the distribution.
Below is a simple example of mixing and unmixing three simple signals and
noise, using two different algorithms to solve for the components:
Original signal
Mixed signal

Makeig's runica (Infomax)
Hyvarinen and Oja FastICA (Negentropy)

In this particular example the negentropy algorithm (FastICA) appears to separate the signals more cleanly than Infomax. Notice also that the order and sign of the unmixed components don't necessarily match those of the original signals.
The following example applies ICA to EEG data using functions from Scott
Makeig's Matlab toolbox. Makeig recommends that the number of sample points
be several times the size of the W matrix (number of components squared).
Since the data below are a one minute recording at 250 Hz (15000 time
points), a PCA was used to reduce the number of components from 128 (one per
channel) to 64, thus keeping the number of data points several times the
size of W (64x64=4096).
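The arithmetic behind this choice can be written out directly. The variable names below are illustrative; the numbers are the ones quoted in the paragraph above.

```python
sampling_rate_hz = 250
duration_s = 60
n_channels = 128     # one component per channel without PCA
n_components = 64    # after PCA reduction

n_time_points = sampling_rate_hz * duration_s  # 15000

# Time points per element of W, before and after the PCA reduction.
ratio_full = n_time_points / n_channels ** 2       # 15000 / 16384
ratio_reduced = n_time_points / n_components ** 2  # 15000 / 4096
```

Without the reduction there would be fewer time points than elements of W (a ratio of about 0.92); with 64 components the ratio rises to about 3.7.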
Eye components: 1 - 4

Bottom three channels show original eye signals from right, left, up, down, and
blink eye movements
Heart component 38

Comparison of EEG signals before and after removing the four eye components for
EEG channels 1-4.

Sampling rate is 250 Hz so these plots show 20 seconds of data.
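Removing components, as in the comparison above, amounts to zeroing the unwanted component activations and projecting the rest back to the channels. The sketch below is a generic NumPy illustration on synthetic data, not the toolbox's own routine: the sources, the mixing matrix A and the choice of component 0 as the "eye" component are assumptions, and the unmixing matrix W is taken to be the exact inverse of A rather than an ICA estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
n_channels, n_samples = 4, 5000  # 20 s at 250 Hz

# Synthetic stand-ins: three ordinary sources plus one large "eye" source.
s = rng.laplace(size=(n_channels, n_samples))
s[0] *= 10.0  # component 0 plays the role of an eye artifact
A = np.eye(n_channels) + 0.3 * rng.standard_normal((n_channels, n_channels))
x = A @ s     # simulated channel data

# In practice W would come from an ICA run; here it is the exact inverse.
W = np.linalg.inv(A)
activations = W @ x

# Mark the artifact components and keep all the others.
artifact_idx = [0]
keep = np.ones(n_channels, dtype=bool)
keep[artifact_idx] = False

# Back-project only the kept components to channel space.
A_est = np.linalg.inv(W)
x_clean = A_est[:, keep] @ activations[keep]

# For this synthetic case the artifact-free data are known exactly.
x_expected = A[:, 1:] @ s[1:]
```

Because every channel is a weighted sum of all components, dropping one component changes all channels, each by an amount set by the corresponding column of the mixing matrix.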
FMRI example:
GADTEST3 was run to verify that activations were seen in the visual cortex. It is a simple block design experiment with 30 seconds of visual stimuli and 30 seconds off. The figures below show results obtained using BrainVoyager for a simple correlation and for an ICA.
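A simple correlation analysis of a block design can be sketched as follows. This is not BrainVoyager's procedure. The repetition time, the run length, the assumption that the run starts with a stimulus block, and the synthetic voxel time courses are all hypothetical, and no hemodynamic delay is modeled.

```python
import numpy as np

tr_s = 2.0       # assumed repetition time; not stated in the text
block_s = 30.0   # 30 s of stimulus, 30 s off
n_volumes = 120  # assumed run length (4 minutes at TR = 2 s)

# Boxcar reference: 1 during stimulus blocks, 0 during rest.
t = np.arange(n_volumes) * tr_s
reference = ((t // block_s) % 2 == 0).astype(float)

# Synthetic voxels: one follows the reference plus noise, one is pure noise.
rng = np.random.default_rng(6)
active_voxel = 2.0 * reference + rng.standard_normal(n_volumes)
quiet_voxel = rng.standard_normal(n_volumes)

# Pearson correlation of each voxel time course with the reference.
r_active = np.corrcoef(reference, active_voxel)[0, 1]
r_quiet = np.corrcoef(reference, quiet_voxel)[0, 1]
```

The same correlation can be computed between the reference and an ICA component time course to rank components by how closely they follow the task.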
Correlation activations for GADMRI (/study/gadmri/GADTEST3/gadtestVis_1)

ICA1 activations for same

ICA1 time course (r=0.72, rms rank is 3):
Reference time course:

Below is a plot of the movement corrections from the BrainVoyager log:

and ICA component 2:

ICA component 5:

ICA component 9:

ICA component 19:

and the combined component map for 2, 5, 9, and 19:

Since there wasn't a convincing single movement component in the GADTEST3 data, I looked through some of the motion correction logs from the previously processed DEPALL data and found a fair example. First is a plot of the movement corrections from the log:

For comparison, here is component 15 from the ICA run on the data before motion
correction:

And the component map for 15:

The significant component weightings tend to be clustered near the edges of the
brain.
References:
Bell, A.J., Sejnowski, T.J., (1995): An information-maximization approach to blind separation and blind deconvolution. Neural Comp. 7:1129-1159.
Hyvarinen, A., Oja, E., (2000): Independent component analysis: algorithms and applications. Neural Networks 13:411-430.
Hyvarinen, A., Karhunen, J., Oja, E., (2001): Independent component analysis. John Wiley & Sons, Toronto, 481 pp.
McKeown, M.J., Makeig, S., Brown, G.G., Jung, T., Kindermann, S.S., Bell, A.J., Sejnowski, T.J., (1998): Analysis of fMRI data by blind separation into independent spatial components, Human Brain Mapping, 6:160-188.