Diagrams of classical statistical procedures


No matter what the truth may be, classical statistical procedures only ever reject it with some small specified probability α. This all-possible-worlds counterfactualism requires a kind of thinking filled with 90-degree turns, and it does my head in. Never fear, however, for I have devised the diagrams above to make what is going on clear. While looking at these diagrams I experience the sensation of understanding. I hope to now share this sensation with you.

We shall be concerned with estimating a single parameter, or "effect". There is a true effect, and we have an estimate of it, but we know the estimate is inaccurate and have a probabilistic model describing exactly how inaccurate it will be, P(estimate|true). Here we will assume it has added t-distributed noise with zero mean and known scale.

Consider a diagram with the true value of a parameter as the x-axis, and the estimate as the y-axis. We will color in all points that are non-rejected.

Such a diagram represents a valid procedure if, for each true value, the non-rejected region will contain the estimate with probability 1-α. So in assessing validity we look at each vertical slice of the diagram.

To apply the diagram, given an estimated effect, we look at the corresponding horizontal slice of the diagram, and obtain a set of non-rejected true effect sizes.

Confidence intervals, and lower and upper confidence bounds

The first diagram shows the smallest possible non-rejection region. This is the diagram for computing confidence intervals. Looking at each vertical slice, the non-rejection region is centered on the point where the estimate is equal to the true value. It covers the densest region, so it can be quite compact and still contain the estimate with probability 1-α. Now looking at each horizontal slice, we see that we will always obtain non-rejection regions centered on the estimate.

The second diagram shows a procedure for obtaining a lower confidence bound on the effect. Looking at each vertical slice, the non-rejection region is no longer centered -- it goes down to negative infinity. However this means the top of the non-rejection region can be moved down slightly compared to the confidence interval diagram and still contain the estimate with probability 1-α. Now looking at each horizontal slice, we see that we will obtain a non-rejection region from slightly below the estimate up to positive infinity, so this diagram is for giving lower bounds on the true effect. It gives a slightly tighter lower bound than the confidence region diagram.

In a similar way we may obtain an upper confidence bound. (However it would not be valid to apply both the lower bound and upper bound procedure to the same data -- in doing so we would risk rejecting the truth with probability 2α. So in this case we would need to use the confidence interval procedure.)

t-tests to determine the sign of an effect

The t-test only tests whether a true effect of zero can be rejected. Having performed this test, what can we say about other effect values?

We have only tested our estimate against the boundaries at a true effect of zero, so we have only compared our estimate to the boundaries in this vertical slice. This splits the confidence interval diagram into three layers, and the confidence bound diagrams into two layers each. Looking at vertical segments within these layers, if any point is non-rejected all must be non-rejected. Filling out the confidence interval diagram and bound diagrams in this way gives us the diagrams for the two-sided and one-sided t-tests.

Examining horizontal lines through these diagrams, possible outcomes are that non-rejected effects are restricted to positive values, or to negative values, or that no rejection occurs. So when the two-sided t-test rejects a true effect of zero it also tells us the sign of the effect.

Similar to the lower and upper bounds, one-sided t-tests need a smaller estimate than the two-sided test to determine if the effect has a certain sign, at the cost of not testing the opposite sign.

From these diagrams we can see that confidence intervals and bounds tell us more than the corresponding t-test. One small virtue of the t-test is that when it is reported a p-value for the test can be quoted, allowing the reader to set their own α and rejecting an effect of zero if p ≤ α. Confidence intervals and bounds need α to be specified before performing the procedure.


The TREAT procedure (McCarthy and Smyth, 2009) represents a third kind of statistical procedure, a blend between the confidence interval and lower and upper bounds. The authors apply this to microarray and RNA-Seq data analysis, and it has implementations in their limma and edgeR packages. However there is nothing stopping it from being used more generally.

As described in the paper, the TREAT procedure calculates a test statistic and from this a p-value. Translating this into diagram-form: For each true effect size, a non-rejection interval centered on zero is found. For effect sizes close to zero, this interval is similar to our confidence interval, but further from zero it resembles an upper or lower bound.

The acceptance intervals are not centered on the true effect as in confidence intervals, but are also not infinitely off-center as in the upper and lower bounds.

Looking at horizontal slices through this diagram, we see that absolute effect sizes smaller than a certain amount will be rejected. This is what TREAT does: shows that the effect size is larger than a specified amount.

(As TREAT is implemented in limma and edgeR, one specifies a minimum effect size and obtains a p-value. Similar to our conversion from confidence interval to t-test, this would give us a squared up H-shaped region.)

Similarly to the t-test, having obtained statistical significance by TREAT, can we say anything about the sign of the effect? It seems not. There is no horizontal line which entirely rejects one or other sign. However it would be possible to fix this. This is the final diagram, my proposal for a modified TREAT, in which you learn not only that the absolute effect size is larger than some amount, but also whether it is positive or negative. Looking vertically, the non-rejection intervals are about half as far off-center. Looking horizontally, we are now able to determine the sign.

The plot below shows this with t-distributed errors (df=5) and α=0.05. The diagonal lines show the boundaries for confidence intervals, lower and upper bounds, and the central line of estimate=true. You can see there is only a tiny loss of power from this modification. (One would not reasonably attempt statistical analysis with a df smaller than this. For higher df the difference becomes even smaller, but does not entirely disappear.)


I hope these diagrams have given you a clearer understanding of some commonly used classical statistical procedures. They've certainly been necessary for me in order to think clearly about the TREAT procedure.


McCarthy, D. J., and Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics, 25(6), 765-771.