These slides: https://tinyurl.com/y3nmue36
RNA-Seq as a typical bioinformatics data type
False Discovery Rates
The wider debate around p-values
False Coverage-statement Rates
topconfects package
Biological samples containing mRNA molecules
 ↓ RNA to DNA reverse transcription
 ↓ Fragmentation
 ↓ High-throughput shotgun sequencing (Illumina)
Millions of short DNA sequences per sample, called “reads”
 ↓ “Align” reads to reference genome (approximate string search)
 ↓ Count number of reads associated with each gene
Matrix of read counts
~20,000 genes.
Often only 2 or 3 biological samples per experimental group.
Which genes differ in expression level between two groups?
Typically done using limma
Bioconductor package.
Experimental design may be complicated, so allow any linear model.
Many people’s first encounter with linear models!
limma's novel feature: empirical Bayes moderation of per-gene variance estimates.
(Count-based alternatives: edgeR, DESeq2.)
We have ~20,000 p-values, one for each gene.
We want to select the significantly differentially expressed genes.
If we select genes with \(p \leq 0.05\), we will get ~1000 “discoveries” purely by chance.
Assume the set of true discoveries to be made is much smaller than \(n_\text{gene}\).
For p-value cutoff \(\alpha\) and total discoveries \(k\), the FDR \(q\) will be approximately
\[ q = { n_\text{gene}\alpha \over k } \]
So to achieve a specified FDR \(q\), we need
\[ \alpha = { k \over n_\text{gene} } q \]
The larger the set of discoveries \(k\), the larger the \(\alpha\). Weirdly circular!
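As a quick sanity check of the formula \(\alpha = {k \over n_\text{gene}} q\) (an illustrative Python sketch; the numbers are made up for the example):

```python
n_gene = 20_000
q = 0.05  # target false discovery rate

# alpha = (k / n_gene) * q: the more discoveries k we make,
# the more generous the p-value cutoff alpha we may use.
for k in (100, 1_000, 5_000):
    alpha = (k / n_gene) * q
    print(f"k={k}: alpha={alpha:.5f}")
```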
Greedily choose the largest \(\alpha\) possible.
“For whoever has, to him more will be given, and he will have abundance; but whoever does not have, even what he has will be taken away from him.”
Matthew 13:12
Benjamini and Hochberg proved this works, assuming the tests are independent or positively correlated.
In practice, software provides “FDR-adjusted p-values” that let the reader select their desired FDR.
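Benjamini and Hochberg's step-up procedure fits in a few lines. A pure-Python sketch of FDR-adjusted p-values (illustrative only, not the implementation in limma or edgeR):

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values.

    Selecting all genes with adjusted p <= q controls the FDR at q
    (assuming independent or positively correlated tests).
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```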
Example RNA-Seq volcano plot; red points are significant at FDR 0.05.
\(n_\text{sample}\)-poor, \(n_\text{gene}\)-rich situation, struggling to make any discoveries.
It has made sense to sort results by p-value or plot p-values.
We need distributional assumptions: rank-based methods can't produce small enough p-values!
What if we do make a lot of discoveries?
Many ’omics datasets follow this \(n_\text{sample}\)-poor, \(n_\text{feature}\)-rich pattern:
Analysis almost always focussed on p-values.
Ioannidis estimates that the Positive Predictive Value (PPV = 1 − FDR) of most fields is less than 50%.
Biases:
American Statistical Association statement on p-values, March 2016
Special issue of the ASA’s journal, March 2019
800 scientists signed a statement published in Nature, March 2019
Mostly reasonable, but…
“How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see?”
“This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval.”
Gelman broadly liked it, Ioannidis disliked the undercurrent
Comment: in bioinformatics, the best-looking noise can look very convincing.
An “insignificant” result may be due to no effect, a lack of power, or luck.
A “significant” result may be due to a large effect, a powerful experiment, selective reporting, or luck.
Dichotomization leads to apparent paradoxes.
ASA statement on p-values, March 2016
“4. Proper inference requires full reporting and transparency. P-values and related analyses should not be reported selectively.”
“5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”
\(p \leq 0.05\) iff 95% Confidence Interval doesn’t pass through zero.
“New Statistics” approach:
But giving up selection isn't an option for us!
After selecting \(k\) of \(n_\text{gene}\) results,
for an FCR of \(q\),
provide \(1-{{k \over n_\text{gene}} q}\) confidence intervals.
Any selection rule may be used!
Caveat: point estimates remain biased.
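The FCR adjustment is simple to state in code. A normal-theory sketch in Python using the standard library (function and variable names are mine, not from any existing package):

```python
from statistics import NormalDist

def fcr_intervals(selected, estimates, ses, n_total, q=0.05):
    """Benjamini-Yekutieli FCR-adjusted confidence intervals.

    After selecting k of n_total results, report each interval at
    level 1 - (k / n_total) * q rather than the usual 1 - q.
    Normal-theory sketch; `selected` is a list of indices.
    """
    k = len(selected)
    level = 1 - (k / n_total) * q
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return [(estimates[i] - z * ses[i], estimates[i] + z * ses[i])
            for i in selected]
```

Note that selecting fewer results out of a larger pool forces wider intervals, which is the price of selective reporting.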
I want to move us away from p-values and use confidence bounds.
Sometimes we are not \(n_\text{sample}\) poor. Sometimes we have too many discoveries!
I want to rank the results and let the reader select some to follow up.
TCGA breast cancer data.
97 RNA-seq tumor-normal pairs.
Shown is a random sample of 20 genes.
Highly heteroscedastic.
Focusses on high signal-noise ratio.
Top enriched gene-sets to do with cell division.
To follow up a reasonable number of genes, choose a tiny FDR?
Null hypothesis for each gene that absolute LFC is smaller than threshold \(e\)
But
TOP CONFident efFECT Sizes
An alternative presentation of TREAT.
Bonus: “confects” also work as FCR adjusted confidence bounds.
Let \(S(e)\) be the set of genes selected by TREAT at threshold \(e\), for some fixed FDR (eg 5%)
These sets nest:
If \(e_1 > e_2\) then \(S(e_1) \subseteq S(e_2)\)
Call the largest \(e\) for which a gene is a member of \(S(e)\) its “confect”, with appropriate sign
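The nesting property suggests a simple (if inefficient) way to compute confects: scan a grid of thresholds and record the largest \(e\) at which each gene is still selected. A toy normal-theory sketch in Python (the real TREAT uses moderated t-statistics; all names here are mine):

```python
from statistics import NormalDist

_Z = NormalDist()

def treat_pvalue(est, se, e):
    """Normal-theory sketch of TREAT's test of H0: |beta| <= e."""
    t = abs(est) / se
    d = e / se
    return (1 - _Z.cdf(t - d)) + (1 - _Z.cdf(t + d))

def bh_select(pvalues, q):
    """Indices selected by the Benjamini-Hochberg step-up rule at FDR q."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / n * q:
            k = rank
    return set(order[:k])

def confects(estimates, ses, q=0.05, grid=None):
    """Largest threshold e at which each gene is still selected, signed."""
    if grid is None:
        grid = [i / 100 for i in range(0, 501)]  # e from 0 to 5
    best = [None] * len(estimates)
    for e in grid:  # S(e) shrinks as e grows, so later hits overwrite
        selected = bh_select(
            [treat_pvalue(b, s, e) for b, s in zip(estimates, ses)], q)
        for i in selected:
            best[i] = e if estimates[i] > 0 else -e
    return best
```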
Top enriched gene-sets to do with blood vessels, cell-cell signalling, cell adhesion.
Sub-sample of 10 patients.
Confects much smaller than effects indicates lack of power.
# ... filter out low expression genes ...
# ... transform to log2 Reads Per Million ...

# Fit linear models with limma
library(limma)
fit <- lmFit(log2_rpm_matrix, design_matrix)
# ... depending on design matrix, might also need contrasts.fit ...

# Calculate confects
library(topconfects)
result <- limma_confects(fit, coef="treatmentTRUE", trend=TRUE)
edger_confects and deseq2_confects are also available, but slower!
normal_confects if you have a collection of estimates with standard errors.
FCR adjustment provides a very flexible fix for selective reporting of CIs.
Topconfects provides FCR control when selecting the head of a ranked list.
Topconfects tends to be conservative. A Bayesian credible-interval method may be more efficient. If the prior distribution includes everything the selector knows, it will be immune from selection.
FCR concept gives a way to test methods.
Methods we use are cultural, path dependent:
We may be overlooking useful \(n_\text{sample}\)-large methods:
Why have current methods worked?
We now have a new degree of freedom: what effect size?
RNA Systems Biology Lab
Traude Beilharz
Andrew Pattison
Monash Bioinformatics Platform
David Powell
\(df_\mathrm{within}=2\), \(s_\mathrm{within}=0.75\), \(df_\mathrm{between}=3\), \(s_\mathrm{between}=0.5\)