Fisher's exact test is a statistical test used to determine if there are nonrandom associations between two categorical variables.
Let there exist two such variables $X$ and $Y$, with $m$ and $n$ observed states, respectively. Now form an $m \times n$ matrix in which the entries $a_{ij}$ represent the number of observations in which $x = i$ and $y = j$. Calculate the row and column sums $R_i$ and $C_j$, respectively, and the total sum
$$N = \sum_i R_i = \sum_j C_j \tag{1}$$
of the matrix. Then calculate the conditional probability of getting the actual matrix given the particular row and column sums, given by
$$P_{\text{cutoff}} = \frac{(R_1!\,R_2! \cdots R_m!)(C_1!\,C_2! \cdots C_n!)}{N!\,\prod_{i,j} a_{ij}!} \tag{2}$$
which is a multivariate generalization of the hypergeometric probability function. Now find all possible matrices of nonnegative integers consistent with the row and column sums $R_i$ and $C_j$. For each one, calculate the associated conditional probability using (2), where the sum of these probabilities must be 1.
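As a rough illustration of formula (2), the following Python sketch computes the conditional probability of a table of counts given its own row and column sums. The function name `table_probability` and the list-of-lists input are assumptions made for this example; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Conditional probability of `table` given its row and column sums, per (2).

    `table` is a list of rows of nonnegative integers (hypothetical input format).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # Numerator: product of the factorials of all row sums and all column sums.
    numerator = prod(factorial(r) for r in row_sums) * prod(factorial(c) for c in col_sums)
    # Denominator: N! times the product of the factorials of every entry.
    denominator = factorial(total) * prod(factorial(a) for row in table for a in row)
    return Fraction(numerator, denominator)
```

For instance, `table_probability([[1, 2, 0], [0, 1, 3]])` returns the probability of that 2-by-3 table as an exact `Fraction`, which can be converted with `float()` when a decimal value is wanted.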
To compute the P-value of the test, the tables must then be ordered by some criterion that measures dependence, and those tables that represent equal or greater deviation from independence than the observed table are the ones whose probabilities are added together. There are a variety of criteria that can be used to measure dependence. In the $2 \times 2$ case, which is the one Fisher looked at when he developed the exact test, either the Pearson chi-square or the difference in proportions (which are equivalent) is typically used. Other measures of association, such as the likelihood-ratio test, $G$-squared, or any of the other measures typically used for association in contingency tables, can also be used.
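One way to picture the ordering step is the sketch below, which handles only the two-by-two case and orders tables by the Pearson chi-square statistic. It assumes all four margins are nonzero, uses the standard shortcut $N(ad - bc)^2 / (R_1 R_2 C_1 C_2)$ for the statistic, and writes each table probability in its hypergeometric form. The function names are hypothetical, and this is an illustration rather than a reference implementation.

```python
from fractions import Fraction
from math import comb


def pearson_statistic(a, b, c, d):
    """Pearson chi-square of the 2x2 table [[a, b], [c, d]] as an exact fraction.

    Assumes every row sum and column sum is nonzero.
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return Fraction(n * (a * d - b * c) ** 2, r1 * r2 * c1 * c2)


def p_value_by_pearson(a, b, c, d):
    """Add the probabilities of all tables at least as extreme as the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    observed = pearson_statistic(a, b, c, d)
    p_value = Fraction(0)
    # x is the top-left entry; the margins determine the other three entries.
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        candidate = (x, r1 - x, c1 - x, r2 - (c1 - x))
        if pearson_statistic(*candidate) >= observed:
            # Hypergeometric probability of this table given the margins.
            p_value += Fraction(comb(r1, x) * comb(r2, c1 - x), comb(n, c1))
    return p_value
```

Working with `Fraction` keeps the comparison of statistics exact, so tables that tie with the observed one are not dropped through floating-point rounding.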
The test is most commonly applied to $2 \times 2$ matrices, and is computationally unwieldy for large $m$ or $n$. For tables larger than $2 \times 2$, the difference in proportions can no longer be used, but the other measures mentioned above remain applicable (and in practice, the Pearson statistic is most often used to order the tables). In the case of the $2 \times 2$ matrix, the P-value of the test can be computed simply as the sum of the probabilities $P$ of all tables for which $P \le P_{\text{cutoff}}$.
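The two-by-two shortcut described above can be sketched in Python as follows. The names `prob_2x2` and `fisher_exact_2x2` are hypothetical; the sketch enumerates every table with the observed margins and adds up the probabilities that do not exceed that of the observed table.

```python
from fractions import Fraction
from math import factorial


def prob_2x2(a, b, c, d):
    """Probability (2) of the 2x2 table [[a, b], [c, d]] given its margins."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d)
                   * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    return Fraction(numerator, denominator)


def fisher_exact_2x2(a, b, c, d):
    """P-value: sum of table probabilities no larger than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    p_cutoff = prob_2x2(a, b, c, d)
    p_value = Fraction(0)
    # x runs over every feasible top-left entry for the fixed margins.
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        p = prob_2x2(x, r1 - x, c1 - x, r2 - c1 + x)
        if p <= p_cutoff:
            p_value += p
    return p_value
```

Exact fractions are used so that tables whose probability equals the cutoff are always counted; with floating-point values a small tolerance would be needed for the same effect.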
For an example application of the test, let $X$ be a journal, say either Mathematics Magazine or Science, and let $Y$ be the topic, either mathematics or biology, of the articles appearing in a given issue of one of these journals. If Mathematics Magazine has five articles on math and one on biology, and Science has none on math and four on biology, then the relevant matrix would be
$$\begin{array}{l|cc} & \text{math} & \text{biology} \\ \hline \text{Mathematics Magazine} & 5 & 1 \\ \text{Science} & 0 & 4 \end{array} \tag{3}$$
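Computing $P_{\text{cutoff}}$ gives
$$P_{\text{cutoff}} = \frac{(6!\,4!)(5!\,5!)}{10!\,(5!\,1!\,0!\,4!)} = \frac{1}{42} \approx 0.0238, \tag{4}$$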
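and the other possible matrices, with the same row sums (6, 4) and column sums (5, 5), and their $P$s are
$$\begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \qquad P = 0.2381 \tag{5}$$
$$\begin{bmatrix} 3 & 3 \\ 2 & 2 \end{bmatrix} \qquad P = 0.4762 \tag{6}$$
$$\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix} \qquad P = 0.2381 \tag{7}$$
$$\begin{bmatrix} 1 & 5 \\ 4 & 0 \end{bmatrix} \qquad P = 0.0238 \tag{8}$$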
which indeed sum to 1, as required. The sum of $P$-values less than or equal to $P_{\text{cutoff}} = 0.0238$ is then 0.0476 which, because it is less than 0.05, is significant at the 0.05 level. Therefore, in this case, there would be a statistically significant association between the journal and the type of article appearing.
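The numbers in this example can be checked with a few lines of Python. The sketch below assumes the journal totals (6 articles in Mathematics Magazine, 4 in Science) and the topic totals (5 on math, 5 on biology) stated above, indexes each possible table by the number of math articles in Mathematics Magazine, and uses the hypergeometric form of the table probability, which agrees with (2) for two-by-two tables. The variable names are illustrative only.

```python
from math import comb

journal_totals = (6, 4)  # Mathematics Magazine, Science
topic_totals = (5, 5)    # mathematics, biology
n = sum(journal_totals)

# Probability of each table, keyed by the math-article count in Mathematics Magazine.
probs = {}
low = max(0, topic_totals[0] - journal_totals[1])
high = min(journal_totals[0], topic_totals[0])
for x in range(low, high + 1):
    ways = comb(journal_totals[0], x) * comb(journal_totals[1], topic_totals[0] - x)
    probs[x] = ways / comb(n, topic_totals[0])

p_cutoff = probs[5]  # the observed table has five math articles in that journal
# A tiny tolerance guards the floating-point comparison against the cutoff.
p_value = sum(p for p in probs.values() if p <= p_cutoff + 1e-12)

for x in sorted(probs, reverse=True):
    print(x, round(probs[x], 4))  # 5 0.0238, 4 0.2381, 3 0.4762, 2 0.2381, 1 0.0238
print(round(p_value, 4))          # 0.0476
```

Only the observed table and its mirror image, with one math article in Mathematics Magazine, fall at or below the cutoff, which is why the P-value is twice 0.0238.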