weitrix_calibrate_all.Rd
This is a very flexible method of calibrating weights. It should be especially useful when your existing weights account for technical variation but there is also biological variation. In this case, large weights tend to be overly optimistic, and a non-linear transformation of the weights is needed.
weitrix_calibrate_all(
  weitrix,
  design = ~1,
  trend_formula = NULL,
  mu_min = NA,
  mu_max = NA,
  keep_fit = FALSE
)
Argument | Description |
---|---|
weitrix | A weitrix object, or an object that can be converted to a weitrix. |
design | A formula in terms of |
trend_formula | A formula specification for predicting squared residuals. See below. If absent, metadata(weitrix)$weitrix$calibrate_all_formula is used. |
mu_min | When fitting the GLM, omit observations where the estimated mu is less than this value. When calculating weights from the fitted GLM, clip mu to be at least this value. |
mu_max | When fitting the GLM, omit observations where the estimated mu is greater than this value. When calculating weights from the fitted GLM, clip mu to be at most this value. |
keep_fit | Keep the GLM fit and the data used to create it. This can be large! If TRUE, these will be stored in metadata(weitrix)$weitrix of the returned object. |
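The two roles of mu_min and mu_max (dropping observations when fitting, clipping when predicting) can be illustrated with a small base-R sketch. The values below are hypothetical, and treating NA as "no limit" is an assumption based on the NA defaults; this is not the package's own code.

```r
# Hypothetical limits and estimated mu values from the design fit
mu_min <- 0.5
mu_max <- 10
mu <- c(0.1, 2, 8, 50)

# Fitting the GLM: omit observations outside [mu_min, mu_max]
# (assumption: an NA limit means no limit on that side)
keep <- (is.na(mu_min) | mu >= mu_min) & (is.na(mu_max) | mu <= mu_max)

# Calculating weights: clip mu into [mu_min, mu_max] rather than dropping it
mu_clipped <- mu
if (!is.na(mu_min)) mu_clipped <- pmax(mu_clipped, mu_min)
if (!is.na(mu_max)) mu_clipped <- pmin(mu_clipped, mu_max)
```

Clipping at prediction time means every observation still receives a weight, while extreme mu values do not influence the fitted trend.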
A SummarizedExperiment object with metadata fields marking it as a weitrix.
metadata(weitrix)$weitrix will contain the fitted trend model and, if requested, the data frame used to fit the model.
Residuals are found relative to a fitted model. A trend model is then fitted to the squared residuals using a gamma GLM with a log link function. Weitrix weights are set based on the inverse of the fitted trend.
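The core idea can be sketched in base R on a plain matrix. This is a minimal illustration under stated assumptions, not the package implementation: the data are simulated (hypothetical), design = ~1 is fitted as a weighted row mean, and stats::glm with a Gamma log-link family stands in for the glm2 quasi(link = "log", variance = "mu^2") fit shown in the example output (both give the same coefficient estimates). The degrees-of-freedom adjustment described in this documentation is not applied here.

```r
set.seed(1)
n_row <- 200
n_col <- 6
# Hypothetical existing weights reflecting technical variation only
w <- matrix(rexp(n_row * n_col, rate = 1) + 0.1, n_row, n_col)
# True variance = technical (1/w) + biological (0.5),
# so large existing weights are overly optimistic
y <- matrix(rnorm(n_row * n_col, sd = sqrt(1 / w + 0.5)), n_row, n_col)

# design = ~1: fit a weighted mean to each row
mu <- rowSums(y * w) / rowSums(w)
mu_mat <- matrix(mu, n_row, n_col)
r2 <- (y - mu_mat)^2  # squared residuals

# Long data frame, one row per observation (column names are assumptions)
data <- data.frame(
  .y     = as.vector(r2),
  weight = as.vector(w),
  mu     = as.vector(mu_mat),
  row    = factor(as.vector(row(y))),
  col    = factor(as.vector(col(y)))
)

# Gamma GLM with log link on the squared residuals, trend_formula = ~log(weight)
fit <- glm(.y ~ log(weight), family = Gamma(link = "log"), data = data,
           mustart = rep(mean(data$.y), nrow(data)),
           control = glm.control(maxit = 100))

# New weights: inverse of the fitted trend (no df adjustment in this sketch)
new_w <- matrix(1 / fitted(fit), n_row, n_col)
```

Because the simulated variance has a floor from biological variation, the fitted trend flattens out for large existing weights, which is the kind of moderation this function is meant to capture.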
Residuals from a fitted model are generally smaller than residuals from the true model. A simple adjustment to the weights is made to account for this: weights are multiplied by (n - ncol(design)*nrow(weitrix))/n, where n is the number of non-missing values in the weitrix.
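As a worked example of this adjustment factor, consider a hypothetical weitrix with 100 rows, 6 samples in two groups, and 12 missing values, fitted with design = ~group. The sample table and numbers are made up for illustration.

```r
# Hypothetical sample table: 6 samples in two groups
samples <- data.frame(group = factor(c("a", "a", "a", "b", "b", "b")))
design <- model.matrix(~group, samples)  # 6 x 2: intercept and groupb

n_row <- 100                              # nrow(weitrix)
n_missing <- 12
n <- nrow(samples) * n_row - n_missing    # non-missing values: 588
# Residual degrees of freedom as a fraction of n: (588 - 200) / 588
adjust <- (n - ncol(design) * n_row) / n

old_weights <- c(0.5, 2, 10)
adjusted <- old_weights * adjust          # every weight shrinks by the same factor
```

A design with more columns uses up more degrees of freedom per row, so the reduction is larger.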
trend_formula may reference any row or column variables, mu for the predicted value, weight for the existing weights, or the special factors row and col.
Keep in mind also that a log link function is used.
Unlike in weitrix_calibrate_trend, existing weights must be explicitly included in the formula if they are to be retained (see examples).
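Because of the log link, the effect of a trend formula on the weights can be worked out directly. Writing $w_{ij}$ for an existing weight, $\eta_{ij}$ for the linear predictor, $\hat\tau_{ij}$ for the fitted trend of squared residuals, and $w'_{ij}$ for the new weight (ignoring the degrees-of-freedom adjustment), a short derivation shows why an intercept-only formula discards the existing weights while an offset keeps them (the symbols are introduced here for illustration):

```latex
\begin{aligned}
&\text{log link:} && \log \hat\tau_{ij} = \eta_{ij}, \qquad w'_{ij} = 1/\hat\tau_{ij} = e^{-\eta_{ij}} \\
&\sim 1: && \eta_{ij} = \beta_0 \;\Rightarrow\; w'_{ij} = e^{-\beta_0} \\
&\sim 1 + \mathrm{offset}(-\log w): && \eta_{ij} = \beta_0 - \log w_{ij} \;\Rightarrow\; w'_{ij} = e^{-\beta_0}\, w_{ij} \\
&\sim \log w: && \eta_{ij} = \beta_0 + \beta_1 \log w_{ij} \;\Rightarrow\; w'_{ij} = e^{-\beta_0}\, w_{ij}^{-\beta_1}
\end{aligned}
```

The last line is the "raise to some power and apply an overall scaling factor" behaviour described for trend_formula=~log(weight).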
This function is currently not memory efficient. It should be fine for bulk experiments but may struggle with single-cell data. To reduce memory usage somewhat, only the columns referenced in trend_formula are included when constructing the data frame on which the GLM is fitted.
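A minimal sketch of that kind of column selection in base R, assuming a hypothetical long-format data frame `full` with one row per observation (the column names .y, row, col, mu and weight are assumptions chosen to match the variables trend_formula may reference). all.vars() returns the variable names a formula uses, skipping function names such as log, poly and offset:

```r
# Hypothetical long-format table (made-up values)
full <- data.frame(
  .y     = c(0.4, 1.2, 0.1, 2.3),  # squared residuals
  row    = factor(c(1, 1, 2, 2)),
  col    = factor(c(1, 2, 1, 2)),
  mu     = c(5.0, 5.0, 3.1, 3.1),
  weight = c(2.0, 0.5, 4.0, 1.0)
)

trend_formula <- ~col * poly(log(weight), 2)
needed <- all.vars(trend_formula)       # "col", "weight"
data <- full[, c(".y", needed)]         # drop row and mu, which are unused
```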
Example formulas:
trend_formula=~1+offset(-log(weight))
Apply a global scaling, otherwise keeping weights the same.
trend_formula=~log(weight)
Moderate weights by raising them to some power and applying an overall scaling factor. This allows for biological variation.
trend_formula=~poly(log(weight),2)
Apply a more complex quadratic curve-based moderation of weights.
trend_formula=~col+offset(-log(weight))
Calibrate each sample's weights by a scaling factor.
Note that due to the simplistic adjustment for using a fitted model rather than the true model, this may give misleading results when the design is unbalanced and there are few samples, i.e. when some samples have much higher leverage than others.
trend_formula=~col*poly(log(weight),2)
Quadratic curve moderation of weights, applied to each sample individually.
library(weitrix)
simcal <- weitrix_calibrate_all(simwei, ~1, ~log(weight), keep_fit=TRUE)
metadata(simcal)$weitrix$all_fit
#> 
#> Call: glm2(formula = .y ~ log(weight), family = quasi(link = "log", 
#>     variance = "mu^2"), data = data, mustart = rep(1.57903917221936, 
#>     33L), control = glm.control(maxit = 100), model = FALSE, 
#>     x = FALSE, y = FALSE)
#> 
#> Coefficients:
#> (Intercept)  log(weight)  
#>       1.293       -1.036  
#> 
#> Degrees of Freedom: 32 Total (i.e. Null); 31 Residual
#> Null Deviance: 111 
#> Residual Deviance: 98.35  AIC: NA
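The printed coefficients can be read as a weight transformation. A sketch, assuming the log-link relationship described above and leaving out the degrees-of-freedom adjustment: with intercept b0 and slope b1, the new weight is exp(-b0) * weight^(-b1), here roughly exp(-1.293) * weight^1.036. The example weights below are hypothetical.

```r
# Coefficients as printed (rounded) in the all_fit output above
b0 <- 1.293   # (Intercept)
b1 <- -1.036  # log(weight)

old_weight <- c(0.5, 1, 2, 4)  # hypothetical existing weights
# Fitted trend of squared residuals via the log link
trend <- exp(b0 + b1 * log(old_weight))
# Calibrated weight is the inverse of the trend (df adjustment not applied)
new_weight <- 1 / trend
```

For this fit the exponent is close to 1, so the calibration mostly applies an overall scaling to the existing weights.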