Note on the Use and Misuse of Permutation Methods

Phillip I. Good, Ph.D., Huntington Beach CA 92648, USA.


The use of permutation methods in applied statistics is becoming increasingly widespread. Alas, so is their misuse. The purpose of the present tutorial is to identify the common sources of that misuse so that they can be avoided.

      Permutation methods are employed for three reasons:

1.      They provide exact significance levels rather than approximations (for example, in the analysis of contingency tables, the permutation distribution of the chi-square statistic provides exact results, while the chi-square distribution can be a far-from-exact approximation).

2.      Their significance levels are distribution free.

3.      They yield more powerful statistics (for example, in the comparison of the means of k populations in an unbalanced design).
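The first of these reasons can be illustrated with a small contingency table. The following is a minimal sketch, not a definitive implementation: the data, the labels, and the function names `chi_square` and `exact_permutation_pvalue` are hypothetical and chosen only for illustration. It enumerates every rearrangement of the row labels in a two-row table and sets the resulting exact p-value beside the chi-square approximation with one degree of freedom.

```python
import math
from collections import Counter
from itertools import combinations

def chi_square(rows, cols):
    """Pearson chi-square statistic computed from paired row/column labels."""
    n = len(rows)
    row_tot, col_tot = Counter(rows), Counter(cols)
    cell = Counter(zip(rows, cols))
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat

def exact_permutation_pvalue(rows, cols):
    """Exact p-value for a two-row table: enumerate every assignment of row labels."""
    labels = sorted(set(rows))
    assert len(labels) == 2, "this sketch handles two-row tables only"
    n, k = len(rows), rows.count(labels[0])
    observed = chi_square(rows, cols)
    hits = total = 0
    for chosen in combinations(range(n), k):
        chosen = set(chosen)
        new_rows = [labels[0] if i in chosen else labels[1] for i in range(n)]
        total += 1
        # Small tolerance guards against floating-point ties.
        if chi_square(new_rows, cols) >= observed - 1e-9:
            hits += 1
    return hits / total

# Hypothetical data: 4 of 5 treated subjects respond, 1 of 5 controls responds.
rows = ["treated"] * 5 + ["control"] * 5
cols = ["yes"] * 4 + ["no"] + ["yes"] + ["no"] * 4

stat = chi_square(rows, cols)
p_exact = exact_permutation_pvalue(rows, cols)
# For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_asymptotic = math.erfc(math.sqrt(stat / 2))
print(f"statistic={stat:.3f} exact p={p_exact:.4f} asymptotic p={p_asymptotic:.4f}")
```

With a table this small, the exact and asymptotic p-values differ markedly, which is the point made in item 1 above. Full enumeration is feasible only for small samples; for larger ones a random sample of rearrangements is the usual substitute.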

      The misuse of permutation methods (rearrangement methods, actually, as they entail rearrangements of labels rather than permutations) results from one of three failures:

1.      Failure to use the “best statistic” for an application.

2.      Failure to understand that the permutation methods are not “assumption free.”

3.      Failure to use the appropriate group of rearrangements.

We consider each of these failures in turn.


The best statistic

Theory tells us that the “best statistic” for an application is the one that will minimize either the expected risk or the maximum risk. In applied terms this usually translates into using the statistic that minimizes the probability of a Type II error against the alternatives of interest while guaranteeing that the probability of a Type I error is no larger than a predetermined value.

      Failure to use the best statistic is common whether parametric or permutation procedures are employed. For years, researchers used least-squares (LSD) regression rather than least-absolute-deviations (LAD) regression simply because the appropriate computer programs were not available. In doing so they ran the risk that a single outlying data point would skew the results.

      When comparing the means of k populations, one has a choice of any of the following statistics:

F1 =

F2 =

Fd =


Which statistic one ought to use clearly depends on the alternatives of interest and the potential losses. The value of the permutation method is that one is free to use the “best statistic” for the application and is not limited by the availability of previously tabled values.
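The freedom to choose the statistic can be sketched in code. The example below is an illustration under stated assumptions: the two statistics `weighted_squares` and `weighted_abs` are illustrative choices and are not claimed to be the statistics listed above, and the data and the function name `permutation_pvalue` are hypothetical. The rearrangements are sampled at random (a Monte Carlo approximation to the full permutation distribution), and the statistic is simply passed in as a function.

```python
import random

def weighted_squares(groups):
    """Sum over groups of n_i * (group mean - grand mean)^2 (illustrative choice)."""
    pooled = [x for g in groups for x in g]
    grand = sum(pooled) / len(pooled)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)

def weighted_abs(groups):
    """Sum over groups of n_i * |group mean - grand mean| (illustrative choice)."""
    pooled = [x for g in groups for x in g]
    grand = sum(pooled) / len(pooled)
    return sum(len(g) * abs(sum(g) / len(g) - grand) for g in groups)

def permutation_pvalue(groups, statistic, n_resamples=2000, seed=0):
    """Monte Carlo permutation p-value for any statistic of the k groups."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = statistic(groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Deal the shuffled values back into groups of the original sizes.
        rearranged, start = [], 0
        for size in sizes:
            rearranged.append(pooled[start:start + size])
            start += size
        if statistic(rearranged) >= observed - 1e-9:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical unbalanced design with three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0], [10.0, 11.0, 12.0, 13.0, 14.0]]
p_squares = permutation_pvalue(groups, weighted_squares)
p_abs = permutation_pvalue(groups, weighted_abs)
print(p_squares, p_abs)
```

Because the reference distribution is generated from the data at hand, swapping one statistic for another requires no new tables, only a different function argument.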

      The implication of this discussion for the teacher of statistics is that the Fundamental Lemma of Neyman and Pearson ought to become a part of applied as well as theoretical courses in statistics.



Assumptions

Permutation tests are not assumption free, though many an author in the social and biological sciences seems to believe otherwise. Permutation tests are exact only if the data points that are rearranged are exchangeable under the null hypothesis, that is, if the joint distribution of the observations remains unchanged under rearrangements of the data labels when the null hypothesis is true. This implies that the observations viewed individually must be identically distributed. Thus, permutation tests are not applicable to the Behrens-Fisher problem, in which the two populations may have different variances even when their means are equal.

      Though permutation tests are often described as distribution free, this too is true only under the null hypothesis. The power of these tests and their related optimality properties clearly depend upon the underlying data distributions. For example, the permutation test based upon the difference in sample means is most powerful among unbiased tests for testing the hypothesis F[x]=G[x] against shift alternatives of the form G[x]=F[x+δ]. Without further assumptions about the underlying distributions, however, we can say nothing about its power against alternatives outside of this class.
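The consequences of violating exchangeability can be seen in a small simulation. The following is a sketch under stated assumptions: the sample sizes, the standard deviations, the normal distributions, and the function names are all hypothetical choices made for illustration. Both populations have mean zero, so every rejection is a Type I error. In the first setting the variances are equal and the observations are exchangeable; in the second the smaller sample comes from the population with the larger variance, as in the Behrens-Fisher setting.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_pvalue(x, y, rng, n_resamples=199):
    """Monte Carlo two-sided permutation p-value for the difference in means."""
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

def rejection_rate(n1, sd1, n2, sd2, n_trials=400, alpha=0.05, seed=1):
    """Share of simulated data sets (equal means throughout) rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if permutation_pvalue(x, y, rng) <= alpha:
            rejections += 1
    return rejections / n_trials

# Equal variances: observations are exchangeable under the null hypothesis.
rate_equal = rejection_rate(4, 1.0, 16, 1.0)
# Unequal variances: equal means, but the labels are no longer exchangeable.
rate_unequal = rejection_rate(4, 5.0, 16, 1.0)
print(f"equal variances: {rate_equal:.3f}  unequal variances: {rate_unequal:.3f}")
```

In the exchangeable setting the rejection rate should stay near the nominal 5 percent, while in the second setting it should rise well above it, even though the null hypothesis of equal means is true in both.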


Group of Rearrangements

As first observed by Salmaso [2003], one cannot obtain an exact permutation test in the analysis of a two-way design by using the permutation distribution obtained from all possible rearrangements of the data labels.  Instead, attention must be restricted to the subgroups of synchronized permutations.

      In analyzing a balanced 2xC crossover design with C>2, Good and Xie report that a permutation test of the main effect of treatment based upon exchanges of the labels “control” and “treatment” would not be exact.  The proper set of exchanges is among the treatment sequences.
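The idea of restricting the rearrangements in a two-way design can be sketched as follows. This is one simple reading of synchronized permutations for a balanced 2x2 design, offered as an assumption-laden illustration rather than as the procedure of the cited paper: to test the main effect of factor A, the same number of observations is exchanged between the two levels of A within each level of factor B. The data, the cell names, and the function names are hypothetical.

```python
from itertools import combinations

def swap(cell_a, cell_b, idx_a, idx_b):
    """Exchange the units at idx_a in cell_a with those at idx_b in cell_b."""
    new_a = [v for i, v in enumerate(cell_a) if i not in idx_a] + [cell_b[j] for j in idx_b]
    new_b = [v for j, v in enumerate(cell_b) if j not in idx_b] + [cell_a[i] for i in idx_a]
    return new_a, new_b

def synchronized_arrangements(a1b1, a2b1, a1b2, a2b2):
    """Yield every arrangement that swaps the same number of units in both B levels."""
    n = len(a1b1)
    assert len(a2b1) == len(a1b2) == len(a2b2) == n, "balanced design required"
    for nu in range(n + 1):
        subsets = list(combinations(range(n), nu))
        for i1 in subsets:
            for i2 in subsets:
                first = swap(a1b1, a2b1, i1, i2)        # exchange within B level 1
                for j1 in subsets:
                    for j2 in subsets:
                        second = swap(a1b2, a2b2, j1, j2)  # same count within B level 2
                        yield first[0], first[1], second[0], second[1]

def main_effect_a(a1b1, a2b1, a1b2, a2b2):
    """Total of the A1 cells minus total of the A2 cells."""
    return sum(a1b1) + sum(a1b2) - sum(a2b1) - sum(a2b2)

def synchronized_pvalue(cells):
    """Exact two-sided p-value over the restricted set of rearrangements."""
    observed = abs(main_effect_a(*cells))
    stats = [abs(main_effect_a(*arr)) for arr in synchronized_arrangements(*cells)]
    hits = sum(s >= observed - 1e-9 for s in stats)
    return hits / len(stats), len(stats)

# Hypothetical balanced 2x2 data, three observations per cell.
cells = (
    [12.1, 11.4, 13.0],  # A1, B1
    [9.8, 10.5, 9.1],    # A2, B1
    [15.2, 14.7, 16.1],  # A1, B2
    [12.9, 13.3, 12.2],  # A2, B2
)
p_value, n_arrangements = synchronized_pvalue(cells)
print(p_value, n_arrangements)
```

Observations never move between levels of B, and the number exchanged is the same in both levels of B, so the reference set is much smaller than the set of all rearrangements of the labels.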


When the assumption of exchangeability is satisfied, the optimal statistic is employed, and the appropriate set of rearrangements is used, permutation methods are both powerful and exact, and they provide a valuable addition to the applied statistician’s armory.


Salmaso, L. (2003). Synchronized permutation tests in 2^k factorial designs. Communications in Statistics - Theory and Methods, 32, 1419-1438.