Outline

These slides: https://tinyurl.com/y3nmue36


RNA-Seq as a typical bioinformatics data type

False Discovery Rates

The wider debate around p-values

False Coverage-statement Rates

topconfects package

RNA-Seq introduction

Biological samples containing mRNA molecules

  ↓ RNA to DNA reverse transcription

  ↓ Fragmentation

  ↓ High-throughput shotgun sequencing (Illumina)

Millions of short DNA sequences per sample, called “reads”

  ↓ “Align” reads to reference genome (approximate string search)

  ↓ Count number of reads associated with each gene

Matrix of read counts

RNA-Seq introduction

~20,000 genes.

Often only 2 or 3 biological samples per experimental group.


Which genes differ in expression level between two groups?

  • log transform read counts
  • perform 20,000 t-tests
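The two steps above can be sketched in Python (numpy/scipy assumed; the counts here are simulated, not real data, and the Poisson model and 3-samples-per-group design are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes = 20000

# Simulated read-count matrix: 3 biological samples per group (hypothetical data)
counts_a = rng.poisson(100, size=(n_genes, 3))
counts_b = rng.poisson(100, size=(n_genes, 3))

# Step 1: log transform (pseudocount of 1 avoids log(0))
log_a = np.log2(counts_a + 1)
log_b = np.log2(counts_b + 1)

# Step 2: one two-sample t-test per gene
t_stat, p_values = stats.ttest_ind(log_a, log_b, axis=1)
```

This is the naive approach; the next slide covers why limma improves on it.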

RNA-Seq introduction

Typically done using the limma Bioconductor package.

Experimental design may be complicated, so allow any linear model.
Many people’s first encounter with linear models!

limma’s novel feature:

  • genes have similar but not identical variability
  • use an “empirical Bayes” prior for the residual variance,
    as if we have some extra “prior” residual degrees of freedom
Smyth, G. K. (2004). Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology, 3(1):1–25.
(More advanced methods use negative-binomial GLM on counts directly: edgeR, DESeq2)
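The shrinkage idea can be sketched numerically: following Smyth (2004), the moderated variance is a degrees-of-freedom-weighted average of a gene's own residual variance and a prior variance estimated from all genes. A minimal sketch in Python (the specific numbers are hypothetical):

```python
def moderated_variance(s2_gene, df_gene, s2_prior, df_prior):
    """Weighted average of per-gene residual variance and the shared
    prior variance, weighted by their respective degrees of freedom."""
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)

# A gene with an implausibly low sample variance (easy to get with only
# 2-3 samples) is pulled back toward the prior, taming its t-statistic
print(moderated_variance(s2_gene=0.01, df_gene=2, s2_prior=0.25, df_prior=4))
```

The extra prior degrees of freedom also fatten the null distribution used for the moderated t-statistic, which is what makes tiny-sample designs workable.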

RNA-Seq introduction

We have ~20,000 p-values, one for each gene.

We want to select the significantly differentially expressed genes.

If we select genes with \(p \leq 0.05\), we will get ~1000 “discoveries” purely by chance.
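This is easy to check by simulation: under the null hypothesis p-values are uniform on [0, 1], so about 5% of 20,000 genes pass \(p \leq 0.05\) by chance alone (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_gene = 20000
p = rng.uniform(size=n_gene)            # null p-values are uniform on [0, 1]
false_discoveries = (p <= 0.05).sum()
print(false_discoveries)                # close to 20000 * 0.05 = 1000
```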

False Discovery Rate (FDR)

Assume the set of true discoveries to be made is much smaller than \(n_\text{gene}\).

For p-value cutoff \(\alpha\) and total discoveries \(k\), the FDR \(q\) will be approximately

\[ q = { n_\text{gene}\alpha \over k } \]
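A worked example with hypothetical numbers: among \(k = 500\) discoveries at \(\alpha = 0.001\), we expect roughly \(n_\text{gene}\alpha = 20\) to be nulls that slipped through, giving \(q \approx 0.04\):

```python
n_gene, alpha, k = 20000, 0.001, 500
q = n_gene * alpha / k    # expected false discoveries / total discoveries
print(q)                  # → 0.04
```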

False Discovery Rate (FDR)

\[ q = { n_\text{gene}\alpha \over k } \]

So to achieve a specified FDR \(q\), we need

\[ \alpha = { k \over n_\text{gene} } q \]

The larger the set of discoveries \(k\), the larger the \(\alpha\). Weirdly circular!

Greedily choose the largest \(\alpha\) possible.
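This greedy choice is the Benjamini–Hochberg step-up procedure: sort the p-values and find the largest rank \(k\) with \(p_{(k)} \leq {k \over n} q\). A sketch in Python (numpy assumed):

```python
import numpy as np

def bh_cutoff(p_values, q=0.05):
    """Largest alpha = (k/n)*q such that k p-values fall at or below it."""
    p_sorted = np.sort(p_values)
    n = len(p_values)
    below = p_sorted <= q * np.arange(1, n + 1) / n
    idx = np.nonzero(below)[0]
    if idx.size == 0:
        return 0, 0.0                  # no discoveries at this FDR
    k = int(idx[-1]) + 1               # largest rank passing the line
    return k, q * k / n                # number of discoveries, and alpha

k, alpha = bh_cutoff(np.array([0.001, 0.002, 0.01, 0.3, 0.8]), q=0.05)
```

Note the circularity in action: every p-value up to rank \(k\) is declared significant, even those that fail their own individual \({i \over n} q\) line.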


“For whoever has, to him more will be given, and he will have abundance; but whoever does not have, even what he has will be taken away from him.”
Matthew 13:12

False Discovery Rate (FDR)

Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300.


Benjamini and Hochberg proved this works, assuming the tests are independent or positively correlated.

In practice, software provides “FDR-adjusted p-values” that let the reader select their desired FDR.

  • sort results by p-value
  • reader takes the head of the sorted table, down to their desired FDR
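The adjusted p-values themselves can be sketched as follows (this mirrors what R's p.adjust(method="BH") computes; numpy assumed):

```python
import numpy as np

def bh_adjust(p_values):
    """FDR-adjusted p-values via the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)                          # ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)    # n/k * p_(k)
    # Enforce monotonicity: running minimum from the largest p downward
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(n)
    out[order] = adjusted                          # restore original order
    return out
```

Selecting the genes with adjusted p-value at most 0.05 is then exactly taking the head of the p-value-sorted table down to FDR 0.05.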

Path dependence

Example RNA-Seq volcano plot (red points significant at FDR 0.05)

\(n_\text{sample}\)-poor, \(n_\text{gene}\)-rich situation, struggling to make any discoveries.

It has made sense to sort results by p-value or plot p-values.

We need distributional assumptions: with so few samples, rank-based methods can’t produce small enough p-values!

What if we do make a lot of discoveries?

  • Choose a smaller FDR
  • Ad-hoc combination of p-value and fold-change cutoff

Path dependence

Example GWAS Manhattan plot
(from https://en.wikipedia.org/wiki/Genome-wide_association_study)

Many ’omics datasets follow this \(n_\text{sample}\)-poor, \(n_\text{feature}\)-rich pattern:

  • microarrays
  • Genome Wide Association Study
    (which mutations cause a disease?)
  • protein mass-spectrometry
  • 16S sequencing
    (bacterial species ecology)
  • Bisulfite sequencing, Hi-C, ATAC-seq, ChIP-seq, …
    (DNA epigenetics, physical layout, openness to transcription, associated proteins)

Analysis is almost always focused on p-values.

The FDR of science

Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8):e124.

Ioannidis estimates that the Positive Predictive Value (PPV = 1 − FDR) of most fields is less than 50%.

  • Tests performed with \(\alpha=0.05\)
  • Mostly tests where pre-study odds of a true effect are low
  • Mostly underpowered: false-negative probability \(\beta\) is high

Biases:

  • Hypotheses tested multiple times by different labs
  • Flexibility in analysis
  • Tests selectively reported

Ongoing controversy about p-values

Meanwhile,
“Scientists rise up against statistical significance”

800 scientists signed a statement published in Nature, March 2019

Mostly reasonable, but…

“How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see?”

“This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval.”



Gelman broadly liked it; Ioannidis disliked the undercurrent.

Comment: in bioinformatics, the best-looking noise can look very convincing.

Some specific problems with p-values

  • An “insignificant” result may be due to no effect, a lack of power, or luck

  • A “significant” result may be due to a large effect, a powerful experiment, selective reporting, or luck

  • Dichotomization leads to apparent paradoxes