Find rows with confidently excessive variability in a calibrated weitrix

Find rows with confident excess standard deviation beyond what is expected based on the weights of a calibrated weitrix. This may be used, for example, to find potential marker genes.

weitrix_sd_confects(
  weitrix,
  design = ~1,
  fdr = 0.05,
  step = 0.001,
  assume_normal = TRUE
)

Arguments

weitrix	A weitrix object, or an object that can be converted to a weitrix with `as_weitrix`.
design	A formula in terms of `colData(weitrix)` or a design matrix, which will be fitted to the weitrix on each row. Can also be a pre-existing Components object, in which case the existing fits (`design$row`) are used.
fdr	False Discovery Rate to control for.
step	Granularity of effect sizes to test.
assume_normal	Assume weighted residuals are normally distributed? Normality is quite a strong assumption here. If TRUE, tests are based on the weighted squared residuals following a chi-squared distribution. If FALSE, tests are based on assuming the dispersion follows an asymptotically normal distribution, with variance estimated from the weighted squared residuals; this requires a reasonably large number of columns. Defaults to TRUE.

Value

A topconfects result. The $table data frame contains columns:

effect Estimated excess standard deviation, in the same units as the observations themselves. 0 if the dispersion is less than 1.
confect A lower confidence bound on effect.
row_mean Weighted mean of observations in this row.
typical_obs_err Typical accuracy of each observation.
dispersion Dispersion. Weighted sum of squared residuals divided by residual degrees of freedom.
n_present Number of observations with non-zero weight.
df Degrees of freedom. n_present minus the number of coefficients in the model.
fdr_zero FDR-adjusted p-value for the null hypothesis that effect is zero.

Note that dispersion = effect^2/typical_obs_err^2 + 1 for non-zero effect values.
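The relationship above can be inverted to recover effect from dispersion and typical_obs_err. The following is a minimal sketch in base R, not the package's own code; the input numbers are copied from the example output at the end of this page, and the variable names are chosen for illustration only.

```r
# Values copied from the printed example table (rounded as printed).
typical_obs_err <- 1.416
dispersion <- c(2.7825, 1.1751, 1.1222, 0.9720, 0.3228)

# Invert dispersion = effect^2 / typical_obs_err^2 + 1.
# Dispersion below 1 gives an effect of 0, as described for the effect column.
effect <- typical_obs_err * sqrt(pmax(dispersion - 1, 0))

print(round(effect, 3))
```

Because the inputs are rounded, the results should agree with the printed effect column only approximately.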

Details

Important note: With the default setting of assume_normal=TRUE, the "confect" values produced by this method are only valid if the weighted residuals are close to normally distributed. If you have a reasonably large number of columns (e.g. single cell data), you can and should relax this assumption by specifying assume_normal=FALSE.
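As a sketch of the call syntax only, relaxing the normality assumption looks like the following. It reuses the `simwei` object and the calibration call from the Examples section, and assumes the weitrix package is installed. `simwei` has very few columns, so the results here are not meaningful; real use of assume_normal=FALSE needs many columns.

```r
library(weitrix)

# Calibrate first, as in the Examples section of this page.
calwei <- weitrix_calibrate_all(simwei, ~1, ~1)

# Same call as the default case, but without assuming normal residuals.
result <- weitrix_sd_confects(calwei, ~1, assume_normal = FALSE)

# The per-row results are in the $table data frame described under Value.
head(result$table)
```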

This is a conversion of the "dispersion" statistic for each row into units that are more readily interpretable, accompanied by confidence bounds with a multiple testing correction.

We are looking for further perturbation of observed values beyond what is accounted for by a linear model and, further, beyond what is expected based on the observation weights (assumed to be calibrated and so interpreted as 1/variance). We are seeking to estimate the standard deviation of this further perturbation.
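The idea can be illustrated with a small base R simulation for a single row. This is a sketch under stated assumptions, not the package implementation: the observation error (`obs_err`), the extra perturbation (`excess_sd`) and the sample size are all hypothetical values, and every observation is given the same weight.

```r
set.seed(1)

n <- 10000
obs_err <- 1.5                    # known standard error of each observation
weights <- rep(1 / obs_err^2, n)  # calibrated weights, interpreted as 1/variance
excess_sd <- 2                    # hypothetical further perturbation to recover

# Observed values: measurement noise plus the further perturbation.
y <- rnorm(n, mean = 0, sd = obs_err) + rnorm(n, mean = 0, sd = excess_sd)

# Fit the linear model (here intercept only) using the weights.
fit <- lm(y ~ 1, weights = weights)

# Dispersion: weighted sum of squared residuals over residual degrees of freedom.
dispersion <- sum(weights * residuals(fit)^2) / df.residual(fit)

# Convert dispersion back to the scale of the observations.
estimated_excess_sd <- obs_err * sqrt(max(dispersion - 1, 0))
print(c(dispersion = dispersion, estimated_excess_sd = estimated_excess_sd))
```

With this many observations the estimate should land close to the simulated value of 2. If `excess_sd` is set to 0, the dispersion should instead be close to 1.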

The weitrix must have been calibrated for results to make sense.

Top confident effect sizes are found using the topconfects method, based on the model that the observed weighted sum of squared residuals is non-central chi-square distributed.
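For the special case of a zero effect with assume_normal=TRUE, the test reduces to comparing the weighted sum of squared residuals with an ordinary chi-squared distribution. The sketch below is an illustration in base R, not the package's code. It takes the dispersion and df columns from the example output on this page and assumes a Benjamini-Hochberg adjustment for the FDR step.

```r
# Values copied from the printed example table.
dispersion <- c(2.7825, 1.1751, 1.1222, 0.9720, 0.3228, 0.2445, 0.1686)
df <- c(4, 4, 3, 3, 4, 4, 4)

# dispersion * df is the weighted sum of squared residuals.
# Under a zero effect it is modelled as chi-squared with df degrees of freedom.
p_zero <- pchisq(dispersion * df, df = df, lower.tail = FALSE)

# Assumed multiple testing correction: Benjamini-Hochberg.
fdr_zero <- p.adjust(p_zero, method = "BH")

print(round(fdr_zero, 4))
```

The resulting values come out close to the fdr_zero column in the example output, which suggests this reading of the zero-effect case is reasonable. Testing non-zero effect sizes, as topconfects does, is not covered by this sketch.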

Note that all calculations are based on weighted residuals, with a rescaling to place results on the original scale. When a row has highly variable weights, this is an approximation that is only sensible if the weights are unrelated to the values themselves.

Examples


# weitrix_sd_confects should only be used with a calibrated weitrix
calwei <- weitrix_calibrate_all(simwei, ~1, ~1)

weitrix_sd_confects(calwei, ~1)
#> $table
#>   confect effect row_mean   typical_obs_err dispersion n_present df fdr_zero
#> 1 NA      1.8901  2.715e+00 1.416           2.7825     5         4  0.1760  
#> 2 NA      0.5925 -3.310e+00 1.416           1.1751     5         4  0.7083  
#> 3 NA      0.4948 -2.700e+00 1.416           1.1222     4         3  0.7083  
#> 4 NA      0.0000  2.748e+00 1.416           0.9720     4         3  0.7083  
#> 5 NA      0.0000 -9.565e-02 1.416           0.3228     5         4  0.9544  
#> 6 NA      0.0000  9.153e-05 1.416           0.2445     5         4  0.9544  
#> 7 NA      0.0000 -8.430e-02 1.416           0.1686     5         4  0.9544  
#>   name
#> 1 3   
#> 2 5   
#> 3 1   
#> 4 7   
#> 5 4   
#> 6 2   
#> 7 6   
#> 0 of 7 non-zero excess standard deviation at FDR 0.05