Caret:Documentation:Statistics
From Van Essen Lab
WARNING
THIS DOCUMENT IS IN DEVELOPMENT AND DESCRIBES FUTURE VERSIONS OF CARET
Descriptive Statistics
Descriptive statistics summarize the data with measures such as the mean (average), median (middle value), mode (most common value), standard deviation, and variance. When computing the standard deviation, one must know whether the data values represent the entire population, in which case division is by N (the number of items), or a subsample of the population, in which case division is by N − 1.
Population Descriptive Statistics
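 Population Mean: μ = \frac{1}{N} \sum_{i=1}^{N} X_{i}
 Population Standard Deviation: σ = \sqrt{\frac{\sum (X_{i} - μ)^{2}}{N}} OR σ = \sqrt{\frac{\sum X_{i}^{2} - (\sum X_{i})^{2} / N}{N}}
 Population Variance = σ^{2}
 Standard Deviation of the Mean: σ_{\bar{X}} = \frac{σ}{\sqrt{N}}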
Sample Descriptive Statistics
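 Sample Mean: \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_{i}
 Sample Standard Deviation: S = \sqrt{\frac{\sum (X_{i} - \bar{X})^{2}}{N - 1}} OR S = \sqrt{\frac{\sum X_{i}^{2} - (\sum X_{i})^{2} / N}{N - 1}}
 Sample Variance = S^{2}
 Standard Error of the Mean: S_{\bar{X}} = \frac{S}{\sqrt{N}}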
Miscellaneous Descriptive Statistics
 Z-Score: z = \frac{X - μ}{σ}
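The difference between dividing by N and by N − 1 can be seen in a minimal sketch using Python's standard library. The data values are made up for illustration and the variable names are not part of Caret.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
n = len(values)

mean = statistics.fmean(values)
population_sd = statistics.pstdev(values)   # divides by N
sample_sd = statistics.stdev(values)        # divides by N - 1
standard_error = sample_sd / math.sqrt(n)   # standard error of the mean

# Z-scores relative to the population mean and standard deviation.
z_scores = [(x - mean) / population_sd for x in values]

print(mean, population_sd, round(sample_sd, 4), round(standard_error, 4))
```

The sample standard deviation is always slightly larger than the population standard deviation for the same values, because its divisor is smaller.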
Inferential Statistic Tests
Parametric Inferential Tests
For parametric tests, the data are assumed to follow a specific probability distribution, typically the normal (Gaussian) distribution.
ANOVA (Analysis of Variance), One-Way
A one-way ANOVA determines if the mean values at each node for two or more groups of subjects are statistically different. The groups being compared are allowed to have a different number of subjects.
K = Number of Groups
N = Total Number of Subjects
N_{i} = Number of Subjects in Group "i"
df_{Total} = N − 1
df_{Treatment} = K − 1
X_{ij} = Measurement for subject "j" in group "i"
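Mean of group i, \bar{X}_{i} = \frac{1}{N_{i}} \sum_{j=1}^{N_{i}} X_{ij}
Grand Mean, \bar{X} = \frac{1}{N} \sum_{i=1}^{K} \sum_{j=1}^{N_{i}} X_{ij}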
SS_{Total} = SS_{Within} + SS_{Treatment}
If the ANOVA is run with two groups of data, the F-Statistic is equivalent to the square of the T-Statistic produced by a Two-Sample T-Test.
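The relationship between the F-Statistic and the T-Statistic can be checked numerically. The sketch below uses the textbook one-way ANOVA sums of squares and a pooled-variance two-sample t; it assumes df_{Within} = N − K, and the function names and data are hypothetical, not taken from caret_stats.

```python
import math

def one_way_anova_f(groups):
    """Return (F, df_treatment, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, group_means))
    # Within-group variation: each value around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_treatment = k - 1
    df_within = n_total - k
    f = (ss_treatment / df_treatment) / (ss_within / df_within)
    return f, df_treatment, df_within

def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data, 5 subjects
group_b = [3.0, 5.0, 7.0, 9.0]        # made-up data, 4 subjects

f_value, df_treatment, df_within = one_way_anova_f([group_a, group_b])
t_value = pooled_t(group_a, group_b)
print(f_value, t_value ** 2)  # the two numbers agree up to rounding
```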
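T-Test, One-Sample (Single Sample)
A one-sample T-Test determines if the mean value at each node is statistically different from a specified value, often zero.
t = \frac{\bar{X} - μ}{S / \sqrt{N}}, where \bar{X} is the sample mean, S is the sample standard deviation, and μ is the specified value.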
df = N − 1
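T-Test, Paired (Dependent Means)
A paired T-Test determines if the mean at each node is statistically different for two measurements (X and Y) on one group of subjects.
t = \frac{\bar{D}}{S_{D} / \sqrt{N}}, where D_{j} = X_{j} - Y_{j} is the difference for subject j, \bar{D} is the mean difference, and S_{D} is the sample standard deviation of the differences.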
df = N − 1
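A minimal sketch of the one-sample and paired tests at a single node follows. It treats the paired test as a one-sample test on the per-subject differences, which is the standard textbook formulation; the function names and data are hypothetical and not part of caret_stats.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a test of the mean of `values` against mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu0) / standard_error, n - 1

def paired_t(x, y):
    """Return (t, df) for two measurements on the same subjects."""
    differences = [a - b for a, b in zip(x, y)]
    return one_sample_t(differences, 0.0)

# Made-up measurements for four subjects at one node.
x = [5.1, 4.9, 6.0, 5.5]
y = [4.8, 4.7, 5.2, 5.0]

t_pair, df_pair = paired_t(x, y)
t_one, df_one = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=0.0)
print(t_pair, df_pair, t_one, df_one)
```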
T-Test, Two-Sample (Independent Means)
A two-sample T-Test determines if the means at each node for two groups of subjects are statistically different. The groups being compared are allowed to have a different number of subjects.
Equal (Pooled) Variances
df = N_{1} + N_{2} − 2
Unequal (Unpooled) Variances
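The two variants can be sketched as follows. The pooled form uses df = N_{1} + N_{2} − 2 as stated above. For the unpooled form this sketch assumes the Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom; whether caret_stats uses that same approximation is not stated here, so treat it as an assumption. The function name and data are hypothetical.

```python
import math
import statistics

def two_sample_t(a, b, pooled=True):
    """Return (t, df) for two independent groups of values."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # N - 1 divisor
    if pooled:
        pooled_variance = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        standard_error = math.sqrt(v1 / n1 + v2 / n2)
        # Assumption: Welch-Satterthwaite approximation for df.
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return (m1 - m2) / standard_error, df

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
group_b = [3.0, 5.0, 7.0, 9.0]        # made-up data

t_pooled, df_pooled = two_sample_t(group_a, group_b, pooled=True)
t_welch, df_welch = two_sample_t(group_a, group_b, pooled=False)
print(t_pooled, df_pooled, t_welch, df_welch)
```

With unequal variances the approximate degrees of freedom are generally not an integer and never exceed the pooled value.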
Non-Parametric (Distribution-Free) Inferential Statistic Tests
For nonparametric tests, no assumptions are made about the distribution of the data.
caret_stats
caret_stats is a command line program that performs statistical operations on GIFTI surface data files. The first parameter indicates the operation that will be performed. Run the command with just the operation for help information.
The program is written in Java and requires the Java SE Development Kit (JDK) for optimal execution. If you are using a Mac, Java is already installed and you can skip this step. If you are running Linux or Windows, download the JDK from http://java.sun.com/javase/downloads/index.jsp and install it. You must add the Java installation's "bin" directory to the "path" environment variable so that "java" can be run from the command line.
Note: Do not use the Java Runtime Environment. It does not support Java's "server" option which reduces the runtime of caret_stats by fifty percent. If you get the error message "No Server JVM" you are using JRE, not JDK.
After Java is installed, download the caret6 distribution. Install it in the desired location, such as "Program Files" on Windows, "/Applications" on a Mac, or "/usr/local" on Linux. When the distribution is unzipped, it will create the subdirectory "caret6". Located in the caret6 directory are several directories whose names begin with "bin". You must update your PATH environment variable to point to the appropriate "bin" directory so that "caret_stats" can be run from the command line. In addition, Windows users will need to set the environment variable CARET6_HOME to the full path of the caret6 directory (e.g., C:\caret6).
If you have a problem see your System Administrator, and, most importantly, remember that John Harwell is NOT your System Administrator.
Descriptive Statistical Operations
 descriptive - Mean, standard deviation, etc.
Inferential Statistical Operations
The purpose of the inferential statistical operations is to take the input files, perform a statistical test at each node, and create a new file containing one or more statistical measurements (F, T, Z, etc.) at each node.
Performing Inferential Statistical Tests in Caret
Inferential statistical tests in Caret are performed on metric or surface shape files. All of the data (metric or shape files) must be on a coregistered surface so that all data files have the same number of nodes and each node number i is "in register" across subjects (i.e., all subjects' surfaces have undergone surface-based registration using Caret, Freesurfer, CIVET, or other software).
The goal is to find clusters (regions) that are statistically different between the groups of input data. That is, one can reject the null hypothesis which states that the metric/shape values at each node are essentially the same.
The steps in Caret are:
 Run the input files through an inferential statistical test to produce the statistic file and the randomized statistic file.
 Perform a significance test to assign P-Values to the statistic file.
Each of the inferential tests in Caret produces two files. The statistic file contains the results of the statistical test performed on the input data. The randomized statistic file contains columns with the same statistical test performed on randomly assigned groups of the input data. This randomized file is used during significance testing.
One-Sample T-Test
inferentialttestonesample
Paired T-Test
inferentialttestpaired
Two-Sample T-Test
inferentialttesttwosample - Two-sample T-Test with or without pooled variance.
Interhemispheric Clusters
inferentialinterhemispheric
The interhemispheric clusters test is used to determine asymmetry (and symmetry) between the left and right hemispheres of two groups of subjects. All subjects' left and right hemispheres must be coregistered to an atlas, typically the PALS atlas.
Inputs:
 AL is group A, left hemispheres.
 AR is group A, right hemispheres.
 BL is group B, left hemispheres.
 BR is group B, right hemispheres.
 ITER_LEFT_RIGHT is the number of iterations for T-Statistics of random combinations of left or right subjects.
 ITERATIONS is the number of iterations for the randomized T-Statistic file.
Algorithm:
 Create TL, a T-Statistic metric file comparing the left hemispheres of the two groups, TL = TStatistic(AL, BL).
 Create TR, a T-Statistic metric file comparing the right hemispheres of the two groups, TR = TStatistic(AR, BR).
 Create TP, a metric file containing the product of the left and right T-Statistics, TP = TL * TR.
 Create RANDTL, a metric file containing T-Statistics for ITER_LEFT_RIGHT randomized combinations of the left hemispheres from both groups, RANDTL = TStatistic(RandomCombinations(AL, BL)).
 Create RANDTR, a metric file containing T-Statistics for ITER_LEFT_RIGHT randomized combinations of the right hemispheres from both groups, RANDTR = TStatistic(RandomCombinations(AR, BR)).
 Create RANDTP, a metric file containing ITERATIONS random combinations of the product of one column from each of the left and right T-Statistic randomized files, RANDTP = RandomColumn(RANDTL) * RandomColumn(RANDTR).
Output:
 TP is the statistic file for input to the significance testing command.
 RANDTP is the randomized statistic file for input to the significance testing command.
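The algorithm above can be sketched in a few lines of Python. This is an illustration only: it assumes a pooled-variance two-sample t at each node, represents each group as a list of per-subject lists of node values, and uses hypothetical function names that do not correspond to caret_stats internals.

```python
import math
import random
import statistics

def t_statistic(group_a, group_b):
    """Pooled two-sample t at each node (assumed variant of the T-Test)."""
    result = []
    for node in range(len(group_a[0])):
        a = [subject[node] for subject in group_a]
        b = [subject[node] for subject in group_b]
        n1, n2 = len(a), len(b)
        pooled = ((n1 - 1) * statistics.variance(a) +
                  (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        result.append((statistics.fmean(a) - statistics.fmean(b)) / se)
    return result

def random_t_columns(group_a, group_b, iterations, rng):
    """T-statistics for randomized regroupings of the pooled subjects."""
    pool = group_a + group_b
    columns = []
    for _ in range(iterations):
        shuffled = rng.sample(pool, len(pool))
        columns.append(t_statistic(shuffled[:len(group_a)],
                                   shuffled[len(group_a):]))
    return columns

def interhemispheric(al, ar, bl, br, iter_left_right, iterations, seed=0):
    rng = random.Random(seed)
    tl = t_statistic(al, bl)
    tr = t_statistic(ar, br)
    tp = [left * right for left, right in zip(tl, tr)]
    rand_tl = random_t_columns(al, bl, iter_left_right, rng)
    rand_tr = random_t_columns(ar, br, iter_left_right, rng)
    rand_tp = []
    for _ in range(iterations):
        left, right = rng.choice(rand_tl), rng.choice(rand_tr)
        rand_tp.append([l * r for l, r in zip(left, right)])
    return tl, tr, tp, rand_tp

# Made-up data: 5 subjects in group A, 6 in group B, 4 nodes each.
data_rng = random.Random(1)

def make_group(n_subjects, n_nodes, shift):
    return [[data_rng.gauss(shift, 1.0) for _ in range(n_nodes)]
            for _ in range(n_subjects)]

al, ar = make_group(5, 4, 0.0), make_group(5, 4, 0.0)
bl, br = make_group(6, 4, 1.0), make_group(6, 4, 1.0)
tl, tr, tp, rand_tp = interhemispheric(al, ar, bl, br,
                                       iter_left_right=20, iterations=50)
print(tp)
```

A positive product at a node means both hemispheres show a group difference in the same direction; a negative product means the directions differ.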
Coordinate Difference Analysis of Variance
In coordinate difference analysis of variance, the input data are coordinate files from participants that are in two or more groups. In the ANOVA equations shown previously, X_{ij}, in the case of coordinate difference ANOVA, is a three-dimensional coordinate. A subtraction operation between two coordinates is interpreted as the Euclidean (straight-line) distance between them.
In the numerator of the F-Statistic is SS_{Treatment} = \sum_{i=1}^{K} N_{i} (\bar{X}_{i} - \bar{X})^{2}. In the parentheses is the distance between a group average coordinate and the population average coordinate (the average of all coordinates). If the participants are all from the same population, each of the group average coordinates will be very close to the population average coordinate and this quantity will be small. If participants are from different populations, the group average coordinates will differ from the population average coordinate and this quantity will be large.
In the denominator of the F-Statistic is SS_{Error} = \sum_{i=1}^{K} \sum_{j=1}^{N_{i}} (X_{ij} - \bar{X}_{i})^{2} (called SS_{Within} in the ANOVA equations). In the parentheses is the distance between the coordinate of each participant in the group and the average coordinate for the group. When the participants in a group are spatially clustered, this quantity will be small. When the participants in a group are spatially separated, this quantity will be large.
Consider the two-dimensional examples below. In each example, there are two groups of data with each participant labeled as "O" or "+". The average coordinate for each group is "(O)" and "(+)" with the population average coordinate at "(A)".
In the plot below, both groups appear to be from the same population. As a result, SS_{Treatment} will be small, resulting in a small F-Statistic, and one is unable to reject the null hypothesis.
In the plot below, the average coordinates of the two groups are spatially separated, resulting in SS_{Treatment} being large. In addition, the groups are spatially clustered, resulting in SS_{Error} being small. As a result, the numerator is large and the denominator small, creating a large F-Statistic and the rejection of the null hypothesis.
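The two situations can be reproduced numerically. The sketch below computes an F-Statistic at a single node from three-dimensional coordinates, replacing each subtraction with a Euclidean distance; it assumes the usual divisors K − 1 and N − K, and the function names and coordinates are made up for illustration.

```python
import math

def mean_coord(coords):
    """Average of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))

def coordinate_f(groups):
    """F-statistic at one node; each group is a list of (x, y, z)."""
    all_coords = [c for group in groups for c in group]
    grand = mean_coord(all_coords)
    means = [mean_coord(group) for group in groups]
    # Distance of each group average from the population average.
    ss_treatment = sum(len(g) * math.dist(m, grand) ** 2
                       for g, m in zip(groups, means))
    # Distance of each participant from its own group average.
    ss_error = sum(math.dist(c, m) ** 2
                   for g, m in zip(groups, means) for c in g)
    k, n = len(groups), len(all_coords)
    return (ss_treatment / (k - 1)) / (ss_error / (n - k))

# Two intermixed groups (same population).
mixed_o = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
mixed_plus = [(1, 1, 0), (1, -1, 0), (3, 1, 0), (-1, 1, 0)]

# Two tightly clustered, well separated groups.
apart_o = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
apart_plus = [(10, 10, 0), (11, 10, 0), (10, 11, 0), (11, 11, 0)]

f_mixed = coordinate_f([mixed_o, mixed_plus])
f_apart = coordinate_f([apart_o, apart_plus])
print(f_mixed, f_apart)  # small F for mixed groups, large F for separated
```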
Coordinate Difference
NOTE: At this time, coordinate difference is not implemented in caret_stats.
Definitions:
 N_{x} is the number of participants in group X.
 D(i,j) = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2} + (z_{i} - z_{j})^{2}} (the Euclidean distance between two three-dimensional points i and j).
 AVG_{xj} is the average coordinate at node j for group x.
 X_{dev} = , where N_{x} is the number of participants in group X and M is the number of nodes.
Algorithm:
 Create A_{avg}, the average coordinate file for group A.
 Create B_{avg}, the average coordinate file for group B.
 Create A_{dev}, the deviations at each node for group A.
 Create B_{dev}, the deviations at each node for group B.
 If the mode is COORD_DIFF, create the statisticfile where the statistic at each node is D(A_{avg},B_{avg}).
 If the mode is TMAP_DIFF, create the statisticfile where the statistic at each node is
 Create the randomizedstatisticfile file. For each column in it, create two coordinate files that are randomized combinations from all of the input coordinate files on which the COORD_DIFF or TMAP_DIFF test is performed.
What Donna desires and matches the formula for an Unpooled Two-Sample T-Test
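Since coordinate difference is not implemented in caret_stats, the following is only an illustrative sketch of the COORD_DIFF statistic as defined above, D(A_{avg}, B_{avg}) at each node. The data layout (one list of (x, y, z) tuples per subject) and the function names are assumptions.

```python
import math

def average_coords(subjects):
    """Average coordinate at each node across a group of subjects."""
    n = len(subjects)
    n_nodes = len(subjects[0])
    return [tuple(sum(s[node][axis] for s in subjects) / n
                  for axis in range(3))
            for node in range(n_nodes)]

def coord_diff(group_a, group_b):
    """Euclidean distance between the group average coordinates per node."""
    a_avg = average_coords(group_a)
    b_avg = average_coords(group_b)
    return [math.dist(a, b) for a, b in zip(a_avg, b_avg)]

# Made-up data: two subjects in group A, one in group B, two nodes each.
group_a = [[(0, 0, 0), (1, 1, 1)], [(2, 0, 0), (1, 1, 1)]]
group_b = [[(1, 3, 4), (1, 1, 1)]]

statistic = coord_diff(group_a, group_b)
print(statistic)
```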
Significance Testing
Significance testing in Caret is a nonparametric technique based on randomization (permutation testing).
Two data files are required for significance testing. The first is the file containing the test statistic. The second file is the "randomized statistic" file that contains test statistics from many random combinations of the test subjects.
Randomization
Randomization testing is used to determine the P-Values.
Randomization With One Group of Subjects
When there is one group of subjects, such as in a one-sample T-Test, it is not possible to randomize among groups. So, the randomization is performed by randomly flipping the signs of the values for each subject. The statistical test is then run on each of these randomizations and the largest clusters are identified.
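Sign flipping at a single node can be sketched as follows. The sketch covers only the randomization and the one-sample t-statistic, not the cluster identification, and its function names and data are hypothetical.

```python
import math
import random
import statistics

def one_sample_t(values):
    """One-sample t-statistic against zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

def sign_flip_t_values(values, iterations, seed=0):
    """T-statistics after randomly flipping the sign of each subject's value."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(iterations):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        t_values.append(one_sample_t(flipped))
    return t_values

# Made-up values for six subjects at one node.
node_values = [0.8, 1.1, 0.4, 1.6, 0.9, 1.3]

observed_t = one_sample_t(node_values)
random_t = sign_flip_t_values(node_values, iterations=200)
print(observed_t, max(abs(t) for t in random_t))
```

Because all of the made-up values are positive, no sign-flipped version can produce a larger absolute t-statistic than the observed one, which is what a strong effect looks like under this randomization.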
Randomization With Multiple Groups of Subjects
With multiple groups of subjects, all of the subjects are placed into a pool. Subjects are then randomly drawn from the pool and placed into new groups. The new groups contain the same number of subjects as the original groups. When randomizing subjects, each new randomization of subjects should be unique when compared to any previously generated groups of subjects. Statistical tests are then run on each of these randomizations and the largest clusters are identified.
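One way to draw unique regroupings is sketched below for two groups. It assumes that two regroupings count as the same when the first group contains the same subjects regardless of order; that definition of uniqueness, and the function name, are assumptions rather than a description of caret_stats.

```python
import math
import random

def unique_regroupings(n_a, n_b, iterations, seed=0):
    """Randomly split n_a + n_b subject indices into unique new groups."""
    if iterations > math.comb(n_a + n_b, n_a):
        raise ValueError("more iterations requested than unique regroupings")
    rng = random.Random(seed)
    subjects = list(range(n_a + n_b))
    seen = set()
    result = []
    while len(result) < iterations:
        rng.shuffle(subjects)
        new_a = frozenset(subjects[:n_a])
        if new_a in seen:
            continue  # this regrouping was already generated
        seen.add(new_a)
        result.append((sorted(new_a), sorted(subjects[n_a:])))
    return result

# Three subjects in group A and two in group B: C(5,3) = 10 regroupings.
regroupings = unique_regroupings(3, 2, 10)
for new_a, new_b in regroupings:
    print(new_a, new_b)
```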
Given a group of three subjects, choosing two at a time, there are 3 combinations and 6 permutations. For example, selecting two subjects from {A,B,C} results in the combinations {A,B}, {A,C}, and {B,C} and in the permutations {A,B}, {A,C}, {B,C}, {B,A}, {C,A}, and {C,B}. With combinations, two groups of elements are equal if they contain the same elements, in any order (i.e., {A,B} and {B,A} are equivalent). With permutations, two groups of elements are equal only if they contain the same elements in an identical order (i.e., {A,B} and {B,A} are NOT equivalent).
Mathematical formulas for the number of permutations and combinations when choosing k elements from a total of n elements:
P(n,k) = \frac{n!}{(n - k)!}
C(n,k) = \frac{n!}{k! (n - k)!}
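The {A,B,C} example can be checked with Python's standard library, which provides both the enumerations and the counting formulas.

```python
import itertools
import math

subjects = ["A", "B", "C"]

combinations = list(itertools.combinations(subjects, 2))
permutations = list(itertools.permutations(subjects, 2))

print(combinations)  # order within a pair does not matter
print(permutations)  # (A, B) and (B, A) are listed separately

# The enumerated counts match C(n,k) and P(n,k).
n, k = len(subjects), 2
combination_count = math.comb(n, k)
permutation_count = math.perm(n, k)
```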
P's and Q's
The significance tests in Caret produce both P and Q values. Q is simply 1 − P. Q is useful for thresholding in Caret: one selects the statistic for viewing and thresholds with Q. Since Caret thresholds by inhibiting the display of data BELOW the threshold, one can threshold with Q and set the threshold to 0.95 to see statistics with a P-Value of 0.05 or less.
Cluster Based Thresholding
For cluster-based threshold significance testing use "caret_stats significanceclusterthreshold".
 The user provides positive and negative thresholds and a desired significance level (P-Value, e.g., 0.05).
 Clusters of nodes passing the threshold tests are identified in the statistic file. Note that positive and negative values are processed separately.
 The largest cluster is identified in each column of the randomized statistic file using the thresholds.
 The clusters identified from the randomized statistic file are ranked based upon surface area (possibly corrected for surface distortion).
 The user-provided P-Value is multiplied by the number of columns in the randomized statistic file (e.g., 0.05 * 500 = 25), providing the significant cluster rank. The cluster at this rank is identified and its surface area is noted as the "significant surface area".
 For each cluster in the statistic file, use its surface area and determine how it ranks in the ranked randomized clusters. Set the P-Value for the statistic file's cluster to its ranking divided by the total number of columns in the randomized file. For example, if the statistic cluster is ranked 3 out of 100, the cluster receives a P-Value of 0.03.
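The ranking steps above can be sketched once the cluster areas are known. The sketch starts from surface areas that have already been measured and assumes that a cluster's rank is one plus the number of randomized clusters with a strictly larger area; that tie-breaking rule and the function names are assumptions.

```python
def significant_area(random_areas, p_value):
    """Area of the randomized cluster at rank p_value * number of columns."""
    ranked = sorted(random_areas, reverse=True)
    rank = max(1, round(p_value * len(ranked)))
    return ranked[rank - 1]

def cluster_p_value(area, random_areas):
    """Rank of `area` among the randomized areas, divided by their count."""
    larger = sum(1 for r in random_areas if r > area)
    return min(1.0, (larger + 1) / len(random_areas))

# Made-up largest-cluster areas, one per column of the randomized file.
random_areas = [float(i) for i in range(1, 101)]

threshold_area = significant_area(random_areas, 0.05)
p_large = cluster_p_value(98.5, random_areas)   # only two random clusters are larger
p_small = cluster_p_value(40.0, random_areas)
print(threshold_area, p_large, p_small)
```

A cluster from the statistic file is significant at the chosen level when its area is at least the significant surface area.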
The difficult part of cluster-based thresholding is selecting the thresholds. There is no "correct" threshold value. In general, smaller thresholds tend to produce more clusters, larger clusters, or both, while larger thresholds tend to produce fewer clusters, smaller clusters, or both.
Threshold-Free Cluster Enhancement (TFCE)
For threshold-free cluster enhancement significance testing use "caret_stats significancethresholdfree".
The difficulty of selecting a threshold in cluster-based thresholding led to the development of threshold-free cluster enhancement (see Smith and Nichols in the References section at the bottom of this page). With threshold-free cluster enhancement, the user does not need to choose thresholds.
 Apply the TFCE transform to the statistic in the statistic file.
 Apply the TFCE transform to all columns in the randomized statistic file.
 Find the largest TFCE value in each column of the TFCE transformed randomized statistic file and rank them.
 The user-provided P-Value is multiplied by the number of columns in the randomized statistic file (e.g., 0.05 * 500 = 25), providing the significant TFCE rank. The TFCE at this rank is identified and its value is noted as the "significant TFCE value".
 For each node in the statistic file, use its TFCE value and determine how it ranks in the ranked, randomized TFCE values. Set the P-Value for the statistic file's node to its ranking divided by the total number of columns in the randomized file. For example, if the statistic node TFCE is ranked 3 out of 100, the node receives a P-Value of 0.03.
TFCE(j) = \sum_{i=1}^{N} e(h_{i})^{E} h_{i}^{H}, where e(h) is the spatial extent of a node (in Caret, the node's surface area) and h is the value at the node. E and H are constants (0.5 and 2.0). N is the total number of nodes contributing to this node's spatial extent (including the node itself). The spatial extent is all connected nodes that have a nonzero value with the same sign as the node being evaluated. In addition, as one moves away from the node for which the TFCE score is being calculated, the metric values must be adjusted so that they are no larger than the immediate neighbors that are closer to the node being evaluated. That is, as one moves away from the node being processed, the metric values must never increase (see Figure 1 in the TFCE paper, which shows that we are calculating the gray region).
In the figure above, four nodes are labeled A, B, C, and D. If calculating the TFCE score for node C, its supporting section includes nodes A, B, and D. Below each node are boxes labeled a, b, c, and d that correspond to the nodes labeled with uppercase letters. For node C, the four boxes represent its supporting sections. The extent, e, is the horizontal size of the box and in Caret is the surface area (in square millimeters) associated with the node. The height, h, is the vertical size of the box and is the statistical value associated with the node. Furthermore, notice that the box a, for node A, is limited in size vertically to the height of B, a local maximum. As one moves away from node C, the heights (statistical values) are limited so that they never increase.
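The effect of the enhancement can be illustrated on a one-dimensional chain of nodes. The sketch below is not Caret's surface implementation: it approximates the integral form from the Smith and Nichols paper by stepping a height h upward in increments of dh, handles positive values only, and treats neighboring list entries as connected nodes. The function name, the step size, and the unit node areas are assumptions.

```python
def tfce_1d(values, areas=None, e_power=0.5, h_power=2.0, dh=0.1):
    """Approximate TFCE scores for positive values on a chain of nodes."""
    n = len(values)
    if areas is None:
        areas = [1.0] * n  # assumption: every node covers one unit of area
    scores = [0.0] * n
    steps = int(max(values) / dh) if max(values) > 0 else 0
    for step in range(1, steps + 1):
        h = step * dh
        i = 0
        while i < n:
            if values[i] >= h:
                # Find the connected run of nodes at or above height h.
                j = i
                while j < n and values[j] >= h:
                    j += 1
                extent = sum(areas[i:j])
                contribution = (extent ** e_power) * (h ** h_power) * dh
                for k in range(i, j):
                    scores[k] += contribution
                i = j
            else:
                i += 1
    return scores

# Two peaks of equal height: one with supporting neighbors, one isolated.
values = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 3.0, 0.0]
scores = tfce_1d(values)
print([round(s, 3) for s in scores])
```

The supported peak receives a higher score than the isolated peak of the same height, which is the behavior the enhancement is designed to produce.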
Flat Surface with Z-Coordinate set to T-Statistic
Flat Surface with Z-Coordinate set to TFCE-Enhanced T-Statistic
The significance testing commands have a parameter named "numberofthreads". Threads allow a task to be broken down into pieces that may be run in parallel and take advantage of either multiple processors or multicore processors. Using threads will typically reduce the execution time of the command.
References
Books
 Howell, David C. (2002) Statistical Methods for Psychology. Pacific Grove, CA: Duxbury.
Journal Articles
 Nonparametric Permutation Tests For Functional Neuroimaging: A Primer with Examples. Thomas E. Nichols and Andrew P. Holmes. Human Brain Mapping 15:1
 Threshold-Free Cluster Enhancement: Addressing problems of smoothing, threshold dependence and localisation in cluster inference. Stephen M. Smith and Thomas E. Nichols. NeuroImage 2009 44(1)
Web Sites
Glossary
 Cluster-based Thresholding - Groups of connected nodes with attribute values greater than a threshold (e.g., t > 3.0) are identified.
 Family Wise Error - Probability of making at least one Type I Error (rejecting the null hypothesis when the null hypothesis is true; also called alpha error) across a family of tests.
 Gaussian Field Theory
 Nonparametric Statistics - The test contains no requirement that the data fit a probability distribution.
 Permutation Testing - A type of nonparametric test. http://en.wikipedia.org/wiki/Resampling_(statistics)
 Parametric Statistics - The test requires the data to fit a probability distribution, typically the normal distribution.
 ROC (Receiver Operating Characteristic) - A plot that shows the tradeoff between true positives and false positives as the threshold is varied.
 Spatial Smoothing
 Supporting Section - The connected region contributing to a node's TFCE-enhanced value.