Adjust weights element-wise by fitting a trend to squared residuals

This is a very flexible method of calibrating weights. It should be especially useful if your existing weights account for technical variation but the data also contains biological variation. In this case large weights will tend to be overly optimistic, and a non-linear transformation of the weights is needed.

weitrix_calibrate_all(
  weitrix,
  design = ~1,
  trend_formula = NULL,
  mu_min = NA,
  mu_max = NA,
  keep_fit = FALSE
)

Arguments

weitrix	A weitrix object, or an object that can be converted to a weitrix with `as_weitrix`.
design	A formula in terms of `colData(weitrix)` or a design matrix, which will be fitted to the weitrix on each row. Can also be a pre-existing Components object, in which case the existing fits (`design$row`) are used.
trend_formula	A formula specification for predicting squared residuals. See below. If absent, metadata(weitrix)$weitrix$calibrate_all_formula is used.
mu_min	When fitting the GLM, omit observations where the estimated mu is less than this value. When calculating weights from the fitted GLM, clip mu to be at least this value.
mu_max	When fitting the GLM, omit observations where the estimated mu is greater than this value. When calculating weights from the fitted GLM, clip mu to be at most this value.
keep_fit	Keep glm fit and the data used to create it. This can be large! If TRUE, these will be stored in `metadata(weitrix)$weitrix$all_fit` and `metadata(weitrix)$weitrix$all_data`.

Value

A SummarizedExperiment object with metadata fields marking it as a weitrix.

metadata(weitrix)$weitrix will contain the fitted trend model, and if requested the data frame used to fit the model.

Details

Residuals are found relative to a fitted model. A trend model is then fitted to the squared residuals using a gamma GLM with log link function. Weitrix weights are set based on the inverse of the fitted trend.
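The three steps above can be sketched in base R. This is only an illustration on simulated stand-in data, not the package's own implementation: the variable names and the simulated variance structure are assumptions, and it uses `stats::glm` with a `Gamma(link = "log")` family rather than whatever fitting routine the package uses internally.

```r
set.seed(1)

# Simulated stand-in data (assumption, not a weitrix object): the true
# variance is 2 / sqrt(weight), so large existing weights are too optimistic.
n <- 200
weight <- exp(runif(n, 0, 4))
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = sqrt(2 / sqrt(weight)))

# Step 1: residuals relative to a fitted (weighted) linear model.
fit <- lm(y ~ x, weights = weight)
sq_resid <- residuals(fit)^2

# Step 2: trend model for the squared residuals, gamma GLM with log link.
trend <- glm(sq_resid ~ log(weight), family = Gamma(link = "log"))

# Step 3: new weights are the inverse of the fitted trend.
new_weight <- 1 / fitted(trend)
```

Here the trend formula `~log(weight)` lets the fitted variance fall more slowly with the existing weight than 1/weight would imply, which is the kind of moderation described above.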

Residuals from a fitted model are generally smaller than residuals from the true model. A simple adjustment to the weights is made to account for this: weights are multiplied by a factor of (n-ncol(design)*nrow(weitrix))/n, where n is the number of non-missing values in the weitrix.
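As a worked example of this factor, the following sketch uses hypothetical dimensions chosen only for illustration. Since `ncol(design)` coefficients are fitted on each row, `ncol(design)*nrow(weitrix)` is the total number of fitted coefficients.

```r
# Hypothetical dimensions (assumptions for illustration only).
n_rows    <- 1000   # nrow(weitrix)
n_cols    <- 10     # number of samples
n_coef    <- 2      # ncol(design), e.g. intercept plus one covariate
n_missing <- 500    # missing values in the weitrix

# Number of non-missing values.
n <- n_rows * n_cols - n_missing

# Factor applied to the weights: (9500 - 2000) / 9500, about 0.789.
adjustment <- (n - n_coef * n_rows) / n
adjustment
```

The factor approaches 1 as the number of samples grows relative to the number of coefficients in the design.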

trend_formula may reference any row or column variables, or mu for the predicted value, or weight for the existing weights, or special factors row and col. Keep in mind also that a log link function is used.

Unlike in weitrix_calibrate_trend, existing weights must be explicitly included in the formula if they are to be retained (see examples).

This function is currently not memory efficient. It should be fine for bulk experiments but may struggle with single-cell data. To reduce memory usage somewhat, when constructing the data frame on which to fit the glm, only columns referenced in trend_formula are included.

Example formulas:

trend_formula=~1+offset(-log(weight)) Apply a global scaling, otherwise keeping weights the same.

trend_formula=~log(weight) Moderate weights by raising them to some power and applying some overall scaling factor. This will allow for biological variation.

trend_formula=~poly(log(weight),2) Apply a more complex quadratic curve-based moderation of weights.

trend_formula=~col+offset(-log(weight)) Calibrate each sample's weights by a scaling factor. Note that due to the simplistic adjustment for using a fitted model rather than the true model, this may give misleading results when the design is unbalanced and there are few samples, i.e. when there are some samples with much higher leverage than others.

trend_formula=~col*poly(log(weight),2) Quadratic curve moderation of weights, applied to each sample individually.
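To see why `~log(weight)` amounts to a power and a scaling factor, note that with a log link the fitted trend is exp(a + b*log(weight)) = exp(a) * weight^b, so its inverse is exp(-a) * weight^(-b). The sketch below applies this to the coefficients printed in the Examples section. The helper `moderate` is hypothetical, not part of the package, and it leaves out the adjustment for using fitted rather than true residuals.

```r
# Coefficients as printed in the Examples section for ~log(weight).
a <- 1.293    # (Intercept)
b <- -1.036   # log(weight)

# Hypothetical helper: inverse of the fitted trend exp(a + b * log(weight)),
# i.e. a scaling factor exp(-a) times the old weight raised to the power -b.
moderate <- function(weight) exp(-a) * weight^(-b)

w <- c(0.5, 1, 2, 4)
cbind(old = w, new = moderate(w))
```

With `~1+offset(-log(weight))` the coefficient of `log(weight)` is fixed at -1, so only the scaling factor exp(-a) is estimated and the weights otherwise stay the same.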

Examples


simcal <- weitrix_calibrate_all(simwei, ~1, ~log(weight), keep_fit=TRUE)

metadata(simcal)$weitrix$all_fit
#> 
#> Call:  glm2(formula = .y ~ log(weight), family = quasi(link = "log", 
#>     variance = "mu^2"), data = data, mustart = rep(1.57903917221936, 
#>     33L), control = glm.control(maxit = 100), model = FALSE, 
#>     x = FALSE, y = FALSE)
#> 
#> Coefficients:
#> (Intercept)  log(weight)  
#>       1.293       -1.036  
#> 
#> Degrees of Freedom: 32 Total (i.e. Null);  31 Residual
#> Null Deviance:	    111 
#> Residual Deviance: 98.35 	AIC: NA