Perform a Variance Stabilizing Transformation (VST) of a matrix of count data.
Usage
vst(
x,
method = "anscombe.nb",
lib.size = NULL,
cpm = FALSE,
dispersion = NULL,
design = NULL
)
Arguments
- x
A matrix of counts. Rows are genes (or other features), and columns are samples.
- method
VST to use; see Details.
- lib.size
Optional library sizes, estimated if not given.
- cpm
Should the output be in log2 Counts Per Million, rather than simply log2?
- dispersion
Optional, estimated if not given. Dispersion parameter of the negative binomial distribution of the data.
- design
Optional. If dispersion isn't given, a design matrix to use when estimating dispersion.
Details
Several methods are available. "anscombe.nb" is recommended.
Methods:
"anscombe.nb": Default, asinh(sqrt((x+3/8)/(1/dispersion-3/4))). Anscombe's VST for the negative binomial distribution.
"anscombe.nb.simple": log(x+0.5/dispersion), a simplified VST also given by Anscombe.
"anscombe.poisson": sqrt(x+3/8). Anscombe's VST for the Poisson distribution. Only appropriate if you know there is no biological noise.
"naive.nb": asinh(sqrt(x*dispersion)). Resultant variance is slightly inflated at low counts.
"naive.poisson": sqrt(x). Resultant variance is slightly inflated at low counts.
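As a rough illustration, some of the formulas listed above can be written directly in base R. This is a minimal sketch, not varistran's internal code: the function names are hypothetical, and the values are on the raw scale of each formula, so they are not expected to match the log2-scaled output of vst(). The simulation at the end only checks that the "anscombe.nb" formula makes the spread roughly independent of the mean.

```r
# Illustrative base-R versions of formulas from the Methods list.
# Function names are hypothetical; this is not varistran's internal code.
anscombe_nb <- function(x, dispersion) {
  asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
}
anscombe_nb_simple <- function(x, dispersion) log(x + 0.5/dispersion)
anscombe_poisson <- function(x) sqrt(x + 3/8)
naive_poisson <- function(x) sqrt(x)

# Simulated negative binomial counts at a low and a high mean.
set.seed(1)
dispersion <- 0.1
low  <- rnbinom(1e5, size = 1/dispersion, mu = 20)
high <- rnbinom(1e5, size = 1/dispersion, mu = 200)

# Standard deviations after the transformation.
sd_low  <- sd(anscombe_nb(low, dispersion))
sd_high <- sd(anscombe_nb(high, dispersion))

# Ratio of high-mean to low-mean s.d., before and after transformation.
c(raw = sd(high) / sd(low), transformed = sd_high / sd_low)
```

The raw ratio should be well above 1, while the transformed ratio should be close to 1.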
Dispersion:
edgeR's estimate of the common dispersion of the count matrix would be a reasonable choice of dispersion. However, Poisson noise in RNA-Seq data may be over-dispersed, in which case a slightly smaller dispersion may work better. I recommend not providing a dispersion and letting varistran pick an appropriate value.
If "dispersion" is not given, it is chosen so as to minimize sd(residual s.d.)/mean(residual s.d.), that is, the coefficient of variation of the residual standard deviations. Residuals are calculated from the linear model specified by the parameter "design".
If "design" also isn't given, a linear model containing only an intercept term is used. This may lead to an over-estimate of the dispersion, so do give a design if possible.
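The criterion described above can be sketched in base R as follows. This is an assumption-laden illustration rather than varistran's implementation: the helper name dispersion_score is hypothetical, residual standard deviations are assumed to be computed per row (gene), and optimize() over the interval (1e-4, 1) is simply one convenient way to do the search.

```r
# Hypothetical helper: score a candidate dispersion (lower is better).
dispersion_score <- function(dispersion, counts, design) {
  # Transform with the "anscombe.nb" formula at the candidate dispersion.
  y <- asinh(sqrt((counts + 3/8) / (1/dispersion - 3/4)))
  # Fit the linear model to every gene at once and keep the residuals.
  # t(y) has one row per sample, matching the rows of the design matrix.
  fit <- qr(design)
  resid <- qr.resid(fit, t(y))
  df <- nrow(design) - fit$rank
  # One residual standard deviation per gene.
  resid_sd <- sqrt(colSums(resid^2) / df)
  sd(resid_sd) / mean(resid_sd)
}

# Simulated data, generated as in the Examples section.
set.seed(1)
means <- runif(100, min = 0, max = 1000)
counts <- matrix(rnbinom(1000, size = 1/0.01, mu = rep(means, 10)), ncol = 10)

# Intercept-only design, the fallback used when "design" is not given.
design <- matrix(1, nrow = ncol(counts), ncol = 1)

# One-dimensional search for the dispersion with the flattest residual s.d.
best <- optimize(dispersion_score, interval = c(1e-4, 1),
                 counts = counts, design = design)
best$minimum
```

With a real experiment, design would typically come from model.matrix() on the sample annotation, so that known group differences are not mistaken for noise.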
References
Anscombe, F.J. (1948) "The transformation of Poisson, binomial, and negative-binomial data", Biometrika 35 (3-4): 246-254.
Examples
# Generate some random data.
means <- runif(100,min=0,max=1000)
counts <- matrix(rnbinom(1000, size=1/0.01, mu=rep(means,10)), ncol=10)
y <- varistran::vst(counts)
#> Dispersion estimated as 0.008229806
# Information about the transformation
varistran::vst_advice(y)
#>    count transformed_count twofold_step
#> 1      0          5.076701           NA
#> 2      1          5.223302           NA
#> 3      2          5.319325   0.09602318
#> 4      4          5.461933   0.14260809
#> 5      8          5.667338   0.20540513
#> 6     16          5.955851   0.28851281
#> 7     32          6.350196   0.39434499
#> 8     64          6.869899   0.51970332
#> 9    128          7.521960   0.65206102
#> 10   256          8.293935   0.77197446
#> 11   512          9.157749   0.86381380
#> 12  1024         10.082034   0.92428579
#> 13  2048         11.041880   0.95984538
#> 14  4096         12.021166   0.97928598