2020-03-10

Dissatisfaction with p-value usage in science

What goes wrong with p-values?

Selective reporting invalidates Type I error control.

Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8):e124.

  • Argues that among the significant results that get reported, a majority are false.
    (Note: this is about the Positive Predictive Value, PPV = 1 − FDR, not the Type I error rate. See the worked example below.)
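
As a minimal sketch of the arithmetic behind that claim (the error rate, power, and prior probability here are illustrative numbers in the spirit of the paper's scenarios, not values taken from it):

    # PPV = P(true effect | significant), by Bayes' rule:
    #   PPV = power * prior / (power * prior + alpha * (1 - prior))
    alpha = 0.05   # Type I error rate
    power = 0.8    # P(significant | true effect)
    prior = 0.01   # illustrative: 1 in 100 tested hypotheses is actually true

    ppv = (power * prior) / (power * prior + alpha * (1 - prior))
    print(f"PPV = {ppv:.2f}")   # about 0.14: most "discoveries" are false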


People read more into a p-value than they should:

  • p-values are not an effect size.
  • Dichotomization (p<0.05?) leads to apparent paradoxes.
  • “Insignificant” is taken as evidence of no effect, but may reflect lack of power or bad luck.
  • “Significant” is taken to mean a large effect, but may reflect a powerful experiment or luck (see the simulation sketch below).
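
A quick simulation of the last two points, as a minimal sketch (the sample sizes and effect sizes are arbitrary choices for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Real but modest effect, small sample: often "insignificant",
    # purely from lack of power.
    small = rng.normal(loc=0.5, scale=1.0, size=10)
    print(stats.ttest_1samp(small, 0.0))

    # Trivial effect, huge sample: almost always "significant",
    # even though the effect is negligible.
    big = rng.normal(loc=0.02, scale=1.0, size=100_000)
    print(stats.ttest_1samp(big, 0.0))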

Confidence Intervals are better

  • The interval is given in meaningful units, so its practical importance can be judged.
  • The causes of dichotomization “paradoxes” become clear.
  • Rejects more hypotheses: reject every value outside the interval (see the sketch below)!
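
A minimal sketch of the duality between an interval and a family of tests, in a one-sample t setting (the data are simulated purely for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(loc=0.4, scale=1.0, size=30)

    # 95% t confidence interval for the mean.
    lo, hi = stats.t.interval(0.95, df=len(x) - 1,
                              loc=x.mean(), scale=stats.sem(x))
    print(f"mean = {x.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")

    # Test duality: a 5% level test rejects every hypothesized mean
    # outside (lo, hi), not just zero.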

Confidence Intervals are better

Cumming, G. (2012). Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Taylor & Francis Group, New York and London.

“New Statistics” approach:

  • Stop dichotomising.
  • Instead think in terms of measurement accuracy.
  • Report everything, combine using meta-analysis (a minimal sketch below).
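
As a minimal sketch of the combining step, fixed-effect (inverse-variance) meta-analysis (the per-study estimates and standard errors are made-up numbers):

    import numpy as np

    def fixed_effect_meta(estimates, se):
        # Inverse-variance weights: more precise studies count for more.
        w = 1.0 / np.asarray(se, dtype=float) ** 2
        combined = np.sum(w * np.asarray(estimates)) / np.sum(w)
        combined_se = np.sqrt(1.0 / np.sum(w))
        return combined, combined_se

    # Three studies measuring the same effect.
    print(fixed_effect_meta([0.30, 0.55, 0.42], [0.20, 0.25, 0.15]))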

Good prescription for clinical trials or psychology experiments.

Not so great for bioinformatics.

  • We often consider thousands of p-values; selective reporting is the whole point.

False Discovery Rate

Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300.

To select a set of hypotheses to reject, \(S\), from \(n\) hypotheses, with an FDR of \(q\), choose the largest set satisfying:

\[ S = \left\{ i : p_i \leq {|S| \over n} q \right\} \]

Equivalently: sort the p-values, find the largest \(k\) such that \(p_{(k)} \leq {k \over n} q\), and reject the \(k\) hypotheses with the smallest p-values.


Sets for a smaller FDR nest within sets for a larger FDR, so results can be presented in a way that leaves the choice of FDR to the reader (idea attributed to Gordon Smyth):

  • Calculate an “adjusted p-value” for each hypothesis (a sketch below).
  • The reader can read down a sorted list to their desired cutoff value.
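
A minimal sketch of the adjusted p-value calculation (intended to behave like the standard BH adjustment, as in R's p.adjust with method="BH"; the input is assumed to be a plain array of p-values):

    import numpy as np

    def bh_adjust(p):
        # Benjamini-Hochberg adjusted p-values: rejecting every
        # hypothesis with adjusted p <= q controls the FDR at q.
        p = np.asarray(p, dtype=float)
        n = len(p)
        order = np.argsort(p)
        ranked = p[order] * n / np.arange(1, n + 1)   # p_(i) * n / i
        # A running minimum from the largest p-value down keeps the
        # adjusted values monotone in the sorted order.
        adjusted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
        out = np.empty(n)
        out[order] = adjusted
        return out

    print(bh_adjust([0.001, 0.008, 0.039, 0.041, 0.9]))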

What about Confidence Intervals?

False Coverage-statement Rate

Benjamini, Y. and Yekutieli, D. (2005). False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters. Journal of the American Statistical Association, 100(469):71–81.


After selecting a subset \(S\) out of \(n\) parameters, for an FCR of \(q\), provide intervals with coverage probability \(1 - {|S| \over n} q\).
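
A minimal sketch in a normal-means setting (the data, standard errors, and selection rule here are illustrative assumptions; the FCR guarantee applies to selection by any rule):

    import numpy as np
    from scipy import stats

    def fcr_intervals(est, se, selected, n, q=0.05):
        # Each selected parameter gets a symmetric normal interval
        # with marginal coverage 1 - (|S|/n) * q.
        alpha = len(selected) * q / n
        z = stats.norm.ppf(1 - alpha / 2)
        return [(est[i] - z * se[i], est[i] + z * se[i]) for i in selected]

    # Illustrative use: select estimates more than 2 SEs from zero.
    rng = np.random.default_rng(2)
    est = rng.normal(size=1000)
    se = np.ones(1000)
    selected = np.flatnonzero(np.abs(est) > 2 * se)
    print(fcr_intervals(est, se, selected, n=len(est))[:3])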


Very broadly applicable:

  • Example of CIs reported in an abstract.


Caveat:

  • Point estimates remain biased by selection.

RNA-seq example with many samples