Student's t-test
student - compute probabilities of equal means
SYNOPSIS
student [arg...]
MOTIVATION
You have worked very hard to get numbers you are proud of, but
then your referee or advisor keeps asking,
"Are these results statistically significant?"
This program will help you answer them.
The output tables can be generated in LaTeX or HTML code.
In the typical situation you have a set of performance
numbers from running your new whiz-bang technique multiple times, and a
second set from running some "standard" technique. You have taken averages
of both of these sets of numbers (hopefully yours is better), but the
difference in the averages might be due to nothing more than chance. Your
interlocutor wants to know if the difference is real or happenstance. (You
probably do, too.) This program will give you an answer of the
form,
There is an x% chance that these two datasets really were drawn from
populations with equal means.
If x is small (say, 5% or less) then the difference
in averages is unlikely to be due solely to chance, so we say the difference
is statistically significant.
Student's t-test,
developed by Gosset,
is one of the most widely used statistical tests.
Because the t-test relies on some normality assumptions,
this application also provides an alternative computation of confidence
intervals and significance via resampling, which is needed in the cases discussed below.
An additional convenience is that the output can be generated as LaTeX
or HTML directives for building a table.
DESCRIPTION
This program computes the likelihood that two different sets of samples
are actually drawn from populations with equal true means. Usually, each
set of samples is the measurements found in a single experiment. We want
to know whether the difference in the observed means is real or due to
chance. We use statistical techniques due to Student
to estimate the probability that the differences are due to chance. One
typically takes a low value of that probability to imply that the differences
are significant. Therefore, whatever was changed between the two experiments
is likely to be the cause.
You need to make three decisions about which variety of t-test to
perform.
- Directional or non-directional?
Does your hypothesis hold that one of the means should be greater than
the other?
A non-directional hypothesis merely holds that the means of the two
populations are different, but does not predict in which direction.
It is most often the case in our kind of research that one of the
methods supposedly improves on the other one, which is a directional
hypothesis.
A non-directional hypothesis requires a two-tailed t-test; a directional one permits a one-tailed test.
Note that if both types of tests are computed on the same data, the
one-tailed result will be the smaller probability: half the two-tailed value,
assuming the means differ in the predicted direction!
The default in this program is to perform a one-tailed test.
- Independent or paired samples?
Independent samples means that the individual measurements in one group have
no particular relationship to measurements in the other group.
If, however, each measurement in one group is intrinsically linked to
a specific measurement in the other group, you have correlated (paired)
samples, and a paired t-test can be more powerful.
To request it, use the -m command line option.
Obviously, there must be exactly as many measurements in
one dataset as in the other.
The measured pairs must be in order in each dataset.
In essence, the t-test will be performed on the set of differences
(X_i - Y_i), and the null hypothesis is that the difference is zero.
A classic example is height measurements of a group of people with
and without shoes on. It is clear a priori that a systematic
difference exists, but an uncorrelated t-test will report no
significant difference between the two sets of measurements because
the natural variance in height among people swamps out the smaller
difference due to shoes.
The paired sample t-test correctly identifies the systematic difference.
For this to be applicable, the pairs must be truly matched.
Typically, you have measured the performance of your method and the
standard method on identical test cases, so each test case gives you a
pair of measurements.
However, if your problem has any random elements, you must make a
judgment about whether the pairs really match.
- Do the populations have equal variances?
When the populations actually have equal variances, the variances of
each sample can be pooled.
The usual assumption is that the population variances are similar
enough that pooling the sample variances is sensible.
The program will perform the Fisher F-test to check for dissimilar
variances, but it is not foolproof.
The normal output contains the "pooled variance" t-statistic, which is
usually sufficient if the variances are not too dissimilar.
In fact, if the variances of two samples are wildly different
the distribution curves may have completely different shapes, and so
comparing the difference in averages may be uninformative.
By default, this program computes probabilities from a one-tailed
t-test on independent samples, assuming populations with equal variances.
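The shoes example above can be made concrete with a small sketch. It is not taken from the program's source: the function names pooled_t and paired_t and the height figures are invented for illustration, and only the textbook formulas for the pooled-variance and matched-pairs t statistics are assumed.

```python
import math
import statistics

def pooled_t(x, y):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

def paired_t(x, y):
    """Matched-pairs t statistic on the differences x_i - y_i; returns (t, df)."""
    if len(x) != len(y):
        raise ValueError("matched pairs need datasets of equal size")
    d = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d) / se, len(d) - 1

# Made-up heights in cm: the same six people without and with shoes.
barefoot = [158.0, 163.5, 170.2, 175.8, 181.1, 190.4]
shoe_lift = [2.4, 2.6, 2.5, 2.3, 2.7, 2.5]
with_shoes = [h + s for h, s in zip(barefoot, shoe_lift)]

t_ind, df_ind = pooled_t(with_shoes, barefoot)
t_pair, df_pair = paired_t(with_shoes, barefoot)
print(f"independent: t = {t_ind:.2f} (df = {df_ind})")
print(f"paired:      t = {t_pair:.2f} (df = {df_pair})")
```

On these invented numbers the independent-samples statistic stays well below 1, because the spread in height among people dominates, while the paired statistic is far larger because only the small spread of the differences enters the standard error.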
If more than two experiments are supplied in the input, the output will
be a matrix containing a probability for each possible pair of
experiments.
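As a rough illustration of such a matrix, the sketch below computes a pooled-variance t statistic for every pair of datasets and turns it into a tail probability. Everything in it is an assumption made for the example: the helper names, the made-up data, and the choice to obtain the tail probability by numerically integrating the t density with Simpson's rule (the program itself relies on library beta and gamma functions). The one-tailed value is taken in the observed direction.

```python
import math
import statistics
from itertools import combinations

def pooled_t(x, y):
    """Pooled-variance t statistic for independent samples; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def upper_tail(t, df, steps=2000):
    """P(T > |t|): one half minus the Simpson's-rule integral over [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

def p_matrix(datasets, tails=1):
    """Symmetric table of tail probabilities for every pair of datasets."""
    table = {name: {} for name in datasets}
    for a, b in combinations(datasets, 2):
        t, df = pooled_t(datasets[a], datasets[b])
        table[a][b] = table[b][a] = tails * upper_tail(t, df)
    return table

# Made-up results from three hypothetical experiments.
runs = {
    "standard": [11.2, 11.9, 11.5, 11.1, 11.7, 11.4],
    "variant_a": [11.6, 11.3, 11.8, 11.2, 11.5, 11.9],
    "variant_b": [12.4, 12.9, 12.1, 12.6, 12.3, 12.8],
}
table = p_matrix(runs)
for a in runs:
    cells = ["   -   " if a == b else f"{table[a][b]:7.4f}" for b in runs]
    print(f"{a:10s}", *cells)
```

Passing tails=2 doubles each entry, which corresponds to the non-directional test.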
If the -t option is used, the output is LaTeX directives to
build the table, which is handy for inserting into papers. HTML can
also be obtained by beginning the table name with a plus sign (-t+refName).
Like all parametric statistical procedures, the t-test rests on
certain assumptions about the data.
Fortunately for us, alternative procedures based on resampling are
available, and are valid even when the t-test is not.
You will need to use resampling instead of
the t-test in the following cases:
- Some of your measurements correspond to failed runs.
- Your data are strongly non-normal.
- The scale of your measurements is not an equal interval scale.
An equal interval scale is one where ratios of the values make sense.
The t-test uses ratios. Indeed, a standard Z-score also uses ratios
(of a measurement to the standard deviation).
For example, the logarithm of resources required is not an equal
interval scale.
Even a simple average is of dubious utility if the values are actually
logarithms.
- In general, be very careful about what you are measuring if your
data span several orders of magnitude, because an arithmetic average
will then likely be up in the 90th percentile of your data.
That is too skewed for the normal distribution to make any sense.
For example, one standard deviation above and below the mean includes
about 68% of the data in a normal distribution.
See resample.html for further details.
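The warning about data that span several orders of magnitude is easy to check on a toy dataset. The numbers below are invented for the purpose; the point is only where the arithmetic mean lands relative to the observations.

```python
import statistics

# Made-up run times spanning four orders of magnitude.
times = [1, 2, 3, 5, 8, 10, 20, 50, 100, 10000]

mean = statistics.mean(times)
median = statistics.median(times)
below = sum(1 for t in times if t < mean)

print(f"mean = {mean}, median = {median}")
print(f"{below} of {len(times)} values lie below the mean")
```

Here a single large value pulls the mean above nine of the ten observations, so "one standard deviation around the mean" no longer describes where most of the data sit.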
The directives shown below permit you to select
Student's t, or resampling procedures, or both.
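To give a feel for what a resampling test does, here is a minimal sketch of a generic permutation test for a difference in means. It is not the program's own algorithm (see resample.html for that); the function name, the default iteration count, the fixed seed and the data are all assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(x, y, iterations=10000, seed=0):
    """One-tailed p-value for mean(x) > mean(y), by shuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(iterations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(x)]) - mean(pooled[len(x):])
        if diff >= observed:
            hits += 1
    # Count the observed arrangement too, so the estimate is never exactly 0.
    return (hits + 1) / (iterations + 1)

# Made-up performance numbers for a new and a standard technique.
new = [12.1, 11.8, 12.6, 12.9, 12.3, 12.4, 12.0, 12.7]
old = [11.2, 11.9, 11.5, 11.1, 11.7, 11.4, 11.6, 11.3]

p = permutation_test(new, old)
print(f"estimated one-tailed p = {p:.4f}")
```

No normality assumption is involved: the reference distribution is built from the data themselves, which is why this style of test remains usable when the t-test is not.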
INPUT FILE FORMAT
There are three fundamentally different input formats:
- Normal input files consist of a series of
directives, as shown below.
Data can be interspersed with the directives, or in separate files.
A file can contain multiple datasets with explicit names and comments.
- Undecorated data is requested by the -A option. Each file
consists merely of a set of observations, one per line, until the end
of file. Dataset names are taken as the file name.
- To support existing users, the previous format of input files is also
supported. Data is preceded by a header line like this:
.student
nn datasetName comment-to-end-of-line
Note the period in column one.
nn is the count of data lines which follow the
header. Each input file consists of one or more datasets.
Each experiment begins with a header line, followed by data lines.
Version 1 files should be converted to the new format in order to take
advantage of the new directives.
DIRECTIVES
Normal input files consist of a list of directives, chosen from the
following list. Upper and lower case are equivalent.
The LOAD DATA operation permits data to be interspersed
with directives.
- LOAD DATA options dataSetName;
LOAD FILE options fileName;
- dataSetName and fileName must be contained in quote
marks if they contain anything other than numbers, letters and
underscores.
- DATA means the actual data values follow immediately in
this same file, terminated by a line consisting of "/eof"
starting in column 1.
- FILE means the actual data values are in the file named.
- In either case, data lines contain columns of numbers separated by
whitespace. A line that begins with a sharp (#)
is a comment, and is ignored.
The column specified by the -c command line option
contains the observations of interest. For example, your results file might
have one column for best performance and another for population average
performance.
- Note the semicolon following the name at the end of the directive.
- The available LOAD options include:
- varLabel YES | NO
True if first data line contains column labels.
Default false.
- caseLabel number | NONE
Ignore this option.
- PASS number
Gives the column number of the pass/fail indicator, just like the
-p command line option.
Default is no pass/fail indicator.
- SKIP numbers
Specifies columns to be ignored, separated by commas.
A range can be given as two numbers separated by a colon.
- TITLE "Short phrase describing this dataset"
- The following directives cause the specified information to be calculated and
displayed for the loaded data sets.
- CONFIDE percent
Compute confidence interval by resampling.
- DESCRIBE
Show summary statistics, just like -s command line option.
- ITERATE number
Specify iteration count for resampling.
- STUDENT
Compute probability of null hypothesis via Student's t-test.
- RESAMP
Compute probability of null hypothesis via resampling.
- For your convenience, a line containing a slash in column one is
copied to the output file, but otherwise treated as a comment.
The sharp character (#) means the rest
of the line is a comment.
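The CONFIDE and ITERATE directives take a confidence level and a resampling count. One common way to obtain a confidence interval by resampling is the percentile bootstrap, sketched below. Whether the program uses this particular variant is not stated here, so treat the function name, its defaults and the data as assumptions for illustration only.

```python
import random

def bootstrap_ci(data, percent=95.0, iterations=10000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(iterations))
    tail = (100.0 - percent) / 200.0  # fraction cut from each end
    lo = means[int(tail * iterations)]
    hi = means[min(iterations - 1, int((1.0 - tail) * iterations))]
    return lo, hi

# Made-up measurements from a single hypothetical experiment.
sample = [12.1, 11.8, 12.6, 12.9, 12.3, 12.4, 12.0, 12.7]

lo, hi = bootstrap_ci(sample, percent=95.0, iterations=10000)
print(f"95% interval for the mean: [{lo:.3f}, {hi:.3f}]")
```

In this sketch the percent argument plays the role of CONFIDE's argument and iterations the role of ITERATE's; raising the iteration count makes the interval's endpoints less noisy at the cost of run time.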
COMMAND LINE OPTIONS
Command line arguments that do not begin with a dash or equal sign are
input file names.
An argument that begins with a dash is an option specifier, chosen
from [Acdegmprstv?].
An argument that consists of an equal sign causes the program to
perform some computational checks.
All options are digested before any input files are processed.
Command line options
| - | A single dash means to read stdin as an input file. |
| -A | Specify undecorated input files. All input files contain data only, not directives. The default action is to perform a t-test, or resampling, according to the -r option. |
| -cN | Specify that column N contains the observations of interest. |
| -d | No detail, print as table even when there are only 2 datasets. |
| -e | Skip the F-test of equal variances. |
| -g | Perform a two-tailed test. Default is to perform a directional (1-tailed) significance test. |
| -m | Do a matched pairs test. Default is to treat the observations as independent. |
| -pN | Specify that column N has the pass/fail indicator. Default assumes that all runs succeed. |
| -r | Use resampling instead of the t-test. |
| -s | Show summary statistics for each dataset. |
| -trefName | Specify LaTeX output, and the table name. |
| -t+refName | A table name that begins with a plus sign means that HTML output will be generated. |
| -v | Print some debugging information (usually useless). |
| -? | Print a usage message on standard output and exit successfully. |
| = | A single equal sign argument causes sanity check on computations to be performed. For debugging, or checking a port. |
When student is invoked with no filename arguments it reads
directives from standard input.
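The data-line conventions described under DIRECTIVES (whitespace-separated columns, comment lines starting with #, a closing "/eof" line) combine with the -cN option roughly as in the sketch below. This is not the program's parser: the function name is hypothetical, and treating N as 1-based and skipping blank lines are assumptions made for the example.

```python
def read_column(lines, column=1):
    """Collect one column of observations from a block of data lines.

    column is taken to be 1-based, which is an assumption of this sketch.
    """
    values = []
    for line in lines:
        if line.startswith("/eof"):  # terminator, starting in column 1
            break
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines (assumed) and comment lines
        fields = stripped.split()
        values.append(float(fields[column - 1]))
    return values

# Made-up results: first column best performance, second population average.
block = """\
# best  average
0.97  0.91
0.95  0.88
0.99  0.93
/eof
this line is never read
"""

averages = read_column(block.splitlines(), column=2)
print(averages)
```

Selecting column 2 here mirrors the results-file example above, with one column for best performance and another for population average performance.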
EXAMPLES
To illustrate the workings of the program, you may try student
with these examples:
MESSAGES
| Warn: No datasets. / Warn: Single dataset. | A t-test is senseless without at least two datasets. Check that file names you specified are correct. |
| Warn: None okay. | All of the samples in this dataset were marked as FAILURE, so the mean resource per success is not computable. |
| Err: No data found. | Dataset is empty. Check the file names you specified. |
| Err: Unmatched pairs. | Datasets are of different sizes, and you requested a matched pairs test. |
| Err: No t-test with fail cases. | You specified a pass/fail column. The standard t-test has no sensible treatment for failed runs. You should use resampling. Check the file names you specified. |
| Err: All data identical. | The computed variance is exactly zero, meaning that all data values are identical. This is very unlikely for an interesting dataset. |
| Err: BETAI: x domain / Err: BETAI: a,b domain / Err: BETACF: iteration limit / Err: log Gamma domain / Err: stir | These are all internal errors that should never happen. Contact the author if you can reproduce one of them. |
BUGS
- See GnuPlot for more input format ideas.
- Another useful input format would be CSV (Comma Separated Values), such as is
exportable from many spreadsheets and databases.
- The program also accepts fancy-format numeric input, like Excel may generate.
It ignores leading dollar signs and embedded commas in numbers.
A single dash followed by a space or end of line is interpreted as a
zero.
- There is no notation to input DONT-CARE values.
INSTALLATION
Grab the
C++ source code, then tar -xzf student-*.tgz.
Tested on Linux. Should work with DEV-C++ on Windows.
Try out some of the examples
to make sure that it is working correctly.
REFERENCES
- "Student" was the nom de plume of William
Sealy Gosset, the statistician who discovered the t-distributions
in 1908. The t distribution is thus called
"Student's t" in his honor.
Please note that this usage: "... use the students' t-test..."
is therefore uninformed and incorrect.
- M. Galassi et al.,
GNU Scientific Library Reference Manual (2nd Ed.),
ISBN 0954161734.
The calculations of the transcendental functions
(beta, gamma, etc.) needed for computing the t-test and other statistics
are provided by the
GNU Scientific Library; therefore this application inherits the
GNU General Public License.
- The calculations of the beta, gamma, etc. functions
are explained by Press et al.,
Numerical Recipes in C, a well-known
reference, or on the web at www.nr.com.
- Statistical usage and some examples are derived from
Moore and McCabe, Introduction to the Practice of Statistics, a
standard textbook.
- Phillip I. Good,
Resampling Methods: A Practical Guide to Data Analysis.
Introduces statistical methodology (estimation, hypothesis testing,
and classification) to a wide audience, simply and intuitively,
through resampling from the data at hand.
- www.resample.com has
commercial resampling software plus a wealth of references and background.
- See Julian Simon for a thorough explanation of resampling.
- Richard Lowry provides an excellent and up-to-date on-line course in
statistics at Vassar.
- Version 1.0, February 1998,
Paul McQuesten: inflexible and hard to use.
- Version 1.1, March 1999.
- Version 1.9, October 2002: Release resampling to NN Group.
- Version 2.0, March 2003: Provide paired and one-tailed tests.
- Version 2.1, April 2003: Support reading data files in version 1 format.
- Version 2.2, April 2005: Use GSL random generator.
Email: mcquesten@gmail.com.
Author home page:
http://bulldog2.redlands.edu/fac/paul_mcquesten.
Last update: $Id: ttest.html 2656 2006-08-27 00:54:58Z paulmcq $