Variance Stabilizing Transformation

Perform a Variance Stabilizing Transformation (VST) of a matrix of count data.

Usage

vst(
  x,
  method = "anscombe.nb",
  lib.size = NULL,
  cpm = FALSE,
  dispersion = NULL,
  design = NULL
)

Arguments

x: A matrix of counts. Rows are genes (or other features), and columns are samples.
method: Name of the VST to use. See Details for the available methods.
lib.size: Optional. Library sizes of the samples; estimated from the data if not given.
cpm: Should the output be in log2 Counts Per Million, rather than simply log2 counts?
dispersion: Optional. Dispersion parameter of the negative binomial distribution of the data; estimated if not given.
design: Optional. Design matrix to use when estimating the dispersion. Only used if dispersion is not given.

Value

A transformed matrix.

Details

Several methods are available. "anscombe.nb" is recommended.

Methods:

"anscombe.nb": Default. asinh(sqrt((x+3/8)/(1/dispersion-3/4))). Anscombe's VST for the negative binomial distribution.

"anscombe.nb.simple": log(x+0.5/dispersion). A simplified VST, also given by Anscombe.

"anscombe.poisson": sqrt(x+3/8). Anscombe's VST for the Poisson distribution. Only appropriate if you know there is no biological noise.

"naive.nb": asinh(sqrt(x*dispersion)), the "anscombe.nb" formula without the 3/8 and 3/4 corrections. Resultant variance is slightly inflated at low counts.

"naive.poisson": sqrt(x). Resultant variance is slightly inflated at low counts.

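As a rough illustration of what variance stabilization means for the default method, the following sketch applies the "anscombe.nb" formula to simulated negative binomial counts at several means and compares standard deviations before and after transformation. It uses base R only and restates the formula by hand; the helper name anscombe_nb is made up for this example, and the sketch does not reproduce any library-size adjustment or rescaling that vst() itself may apply.

```r
set.seed(1)
dispersion <- 0.01
means <- c(5, 50, 500, 5000)

# Hand-written restatement of the documented "anscombe.nb" formula
# (hypothetical helper, not part of varistran).
anscombe_nb <- function(x, dispersion) {
  asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
}

# For each mean, simulate counts and record the s.d. before and after.
sd_table <- t(sapply(means, function(mu) {
  x <- rnbinom(20000, size = 1/dispersion, mu = mu)
  c(mean = mu,
    raw_sd = sd(x),
    transformed_sd = sd(anscombe_nb(x, dispersion)))
}))

print(sd_table)
```

The raw standard deviation grows strongly with the mean, whereas the transformed standard deviation should stay roughly constant across the rows of the table.
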
Dispersion:

edgeR's estimate of the common dispersion of the count matrix would be a reasonable choice of dispersion. However, the nominally Poisson noise in RNA-Seq data may itself be over-dispersed, in which case a slightly smaller dispersion may work better. I recommend not providing a dispersion and letting varistran pick an appropriate value.

If "dispersion" is not given, it is chosen so as to minimize sd(residual s.d.)/mean(residual s.d.), that is, the coefficient of variation of the residual standard deviations across rows. Residuals are calculated from the linear model specified by the parameter "design".
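
The criterion can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions, not varistran's implementation: the function name residual_sd_cv, the search interval, the search on the log scale, and the use of optimize() are all choices made for this example, and library sizes are ignored.

```r
set.seed(2)
true_dispersion <- 0.01
means <- runif(200, min = 10, max = 1000)
counts <- matrix(rnbinom(2000, size = 1/true_dispersion, mu = rep(means, 10)),
                 ncol = 10)

# Intercept-only design: one row per sample.
design <- matrix(1, nrow = ncol(counts), ncol = 1)

# Hypothetical helper: sd(residual s.d.) / mean(residual s.d.) for a
# candidate dispersion, searched on the log scale.
residual_sd_cv <- function(log_dispersion, counts, design) {
  dispersion <- exp(log_dispersion)
  y <- asinh(sqrt((counts + 3/8) / (1/dispersion - 3/4)))
  # Rows of y are genes, so regress t(y) on the design to fit all rows at once.
  residuals <- lm.fit(design, t(y))$residuals   # samples x genes
  df <- nrow(design) - ncol(design)
  row_sd <- sqrt(colSums(residuals^2) / df)
  sd(row_sd) / mean(row_sd)
}

fit <- optimize(residual_sd_cv, interval = log(c(1e-4, 1)),
                counts = counts, design = design)
estimated_dispersion <- exp(fit$minimum)
print(estimated_dispersion)
```

A dispersion that is too small leaves the residual s.d. rising with the mean, and one that is too large leaves it falling, so the ratio is smallest where the residual s.d. is most nearly constant.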

If "design" is also not given, a linear model containing only an intercept term is used. This may lead to an over-estimate of the dispersion, so give a design if possible.
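
For instance, a design matrix for a two-group experiment could be built with model.matrix() and passed through the "design" argument. The grouping below (5 control and 5 treated samples) is hypothetical and the simulated counts contain no real group effect; the call to vst() is guarded so that the sketch also runs where varistran is not installed.

```r
# Simulated counts: 100 genes by 10 samples.
means <- runif(100, min = 0, max = 1000)
counts <- matrix(rnbinom(1000, size = 1/0.01, mu = rep(means, 10)), ncol = 10)

# Hypothetical sample grouping: one entry per column of counts.
group <- factor(rep(c("control", "treated"), each = 5))
design <- model.matrix(~ group)

if (requireNamespace("varistran", quietly = TRUE)) {
  # The dispersion is estimated from residuals of the two-group model.
  y <- varistran::vst(counts, design = design)
}
```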

References

Anscombe, F.J. (1948) "The transformation of Poisson, binomial, and negative-binomial data", Biometrika 35 (3-4): 246-254

Author

Paul Harrison

Examples


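# Generate some random data: 100 genes by 10 samples, negative binomial
# counts with dispersion 0.01 (size = 1/dispersion).
means <- runif(100, min = 0, max = 1000)
counts <- matrix(rnbinom(1000, size = 1/0.01, mu = rep(means, 10)), ncol = 10)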

y <- varistran::vst(counts)
#> Dispersion estimated as 0.008229806

# Information about the transformation
varistran::vst_advice(y)
#>    count transformed_count twofold_step
#> 1      0          5.076701           NA
#> 2      1          5.223302           NA
#> 3      2          5.319325   0.09602318
#> 4      4          5.461933   0.14260809
#> 5      8          5.667338   0.20540513
#> 6     16          5.955851   0.28851281
#> 7     32          6.350196   0.39434499
#> 8     64          6.869899   0.51970332
#> 9    128          7.521960   0.65206102
#> 10   256          8.293935   0.77197446
#> 11   512          9.157749   0.86381380
#> 12  1024         10.082034   0.92428579
#> 13  2048         11.041880   0.95984538
#> 14  4096         12.021166   0.97928598
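
Judging from the printed values, the twofold_step column appears to be the increase in transformed_count each time the count doubles. The sketch below recomputes it from the table above using base R only; the numbers are copied from the output shown, not regenerated.

```r
# Values copied from the vst_advice() output above.
count <- c(0, 2^(0:12))
transformed_count <- c(5.076701, 5.223302, 5.319325, 5.461933, 5.667338,
                       5.955851, 6.350196, 6.869899, 7.521960, 8.293935,
                       9.157749, 10.082034, 11.041880, 12.021166)

# Difference between successive rows; the first two rows have no
# preceding doubling (0 -> 1 is not a twofold step), so they are NA.
twofold_step <- c(NA, NA, diff(transformed_count)[-1])

print(data.frame(count, transformed_count, twofold_step))
```

The step approaches 1 at high counts, where the transformed values behave like log2 of the count (log2(4096) is 12, against a transformed value of about 12.02), and shrinks at low counts.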