Principal components of a weitrix

Finds principal components of a weitrix. If varimax rotation is enabled, these are then rotated to enhance interpretability.

weitrix_components(
  weitrix,
  p = 0,
  design = ~1,
  n_restarts = 3,
  max_iter = 1000,
  tol = 1e-05,
  use_varimax = TRUE,
  initial = NULL,
  verbose = TRUE
)

weitrix_components_seq(
  weitrix,
  p,
  design = ~1,
  n_restarts = 3,
  max_iter = 1000,
  tol = 1e-05,
  use_varimax = TRUE,
  verbose = TRUE
)

Arguments

weitrix	A weitrix object, or an object that can be converted to a weitrix with `as_weitrix`.
p	Number of components to find.
design	A formula referring to `colData(weitrix)` or a matrix, giving predictors of a linear model for the experimental design. By default only an intercept term is used, i.e. rows are centered before finding components. A more complex formula might be used to account for batch effects. `~0` can be used if rows are already centered.
n_restarts	Number of restarts of the iteration to use.
max_iter	Maximum iterations.
tol	Stop iterating if R-squared increased by less than this amount in an iteration.
use_varimax	Use varimax rotation to enhance interpretability of components.
initial	Optional, an initial guess for column components (scores). Can have fewer columns than `p`, in which case the remaining components are initialized randomly. Can have more columns than `p`, in which case a randomly chosen subspace is used in each restart.
verbose	Show messages about the progress of the iterations.
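
As an illustration of the `design` argument, the sketch below passes a design matrix instead of a formula. Treating `Species` from the `iris` data as a nuisance factor is purely an assumption made for this example, and it is also assumed that the rows of the design matrix are in the same order as the columns of the input.

```r
library(weitrix)

# Variables in rows, observations in columns
dat <- t(iris[, 1:4])

# Design matrix with an intercept and two Species indicator columns
# (Species stands in for a batch-like factor here; this is illustrative only)
design <- model.matrix(~ Species, data = iris)

# Find one component after accounting for the design
comp <- weitrix_components(dat, p = 1, design = design, verbose = FALSE)

# Which columns of comp$row and comp$col belong to the design,
# and which to the component
comp$ind_design
comp$ind_components
```

Using `design = ~0` instead would skip centering entirely, which is only appropriate if rows are already centered.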

Value

A "Components" object with the following elements accessible using $.

row Row matrix, aka loadings. Rows are rows in the weitrix, and columns contain the experimental design (usually just an intercept term), and components.
col Column matrix, aka scores. Rows are columns in the weitrix, and columns contain fitted coefficients for the experimental design, and components.
R2 Weighted R squared statistic. The proportion of total variance explained by the components.
all_R2s R2 statistics from all restarts. This can be used to check how consistently the iteration finds optimal components.
ind_design Column indices associated with experimental design.
ind_components Column indices associated with components.

For a result comp, the original measurements are approximated by comp$row %*% t(comp$col).
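
A minimal sketch of this reconstruction, using the `iris` data from the Examples. It assumes that a plain matrix is converted to a weitrix with uniform weights, in which case an unweighted R-squared computed by hand should be close to `comp$R2`.

```r
library(weitrix)

# Variables in rows, observations in columns
dat <- t(iris[, 1:4])
comp <- weitrix_components(dat, p = 2, verbose = FALSE)

# Low-rank approximation of the original measurements
fitted <- comp$row %*% t(comp$col)
residual <- dat - fitted

# Unweighted R-squared relative to row-centered data.
# With the default design (~1) and uniform weights (an assumption here),
# this is expected to be close to comp$R2.
manual_R2 <- 1 - sum(residual^2) / sum((dat - rowMeans(dat))^2)
manual_R2
comp$R2
```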

weitrix_components_seq returns a list of Components objects, with increasing numbers of components from 1 up to p.
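
For instance, the R2 of each decomposition in the sequence can be collected to judge how many components are worth keeping. This is a sketch on the `iris` data; the choice of `p = 3` is arbitrary.

```r
library(weitrix)

# Variables in rows, observations in columns
dat <- t(iris[, 1:4])

# Decompositions with 1, 2 and 3 components
comp_seq <- weitrix_components_seq(dat, p = 3, verbose = FALSE)

# One Components object per number of components
length(comp_seq)

# Proportion of variance explained as components are added
R2s <- sapply(comp_seq, function(comp) comp$R2)
R2s
```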

Details

Note that this is a slow numerical method to solve a gnarly problem, for the case where weights are not uniform. The case of uniform weights, or weights that can be written as an outer product of row and column weights, is somewhat faster. However, much faster algorithms for that case are available elsewhere.

An iterative method is used, starting from a random initial set of components. It is possible for this to get stuck at a local minimum. To ameliorate this, the iteration is initially run n_restarts times and the best result is used. This is then iterated further. Examine all_R2s in the output to see if this is happening: if the values are not all nearly identical, the iteration is sometimes getting stuck at local minima. Increase n_restarts to increase the odds of finding the global minimum.
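
A quick way to run this check is sketched below on the `iris` data. The value `n_restarts = 5` is arbitrary, and how small a spread counts as "nearly identical" is a judgment call rather than something the package defines.

```r
library(weitrix)

# Variables in rows, observations in columns
dat <- t(iris[, 1:4])
comp <- weitrix_components(dat, p = 2, n_restarts = 5, verbose = FALSE)

# R2 reached by each restart
comp$all_R2s

# Spread between the best and worst restart;
# a large value suggests some restarts got stuck at local minima
R2_spread <- diff(range(comp$all_R2s))
R2_spread
```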

Functions

weitrix_components: Find a matrix decomposition with the specified number of components.
weitrix_components_seq: Produce a sequence of weitrix decompositions with 1 to p components.

Examples

# Variables in rows, observations in columns, as per Bioconductor convention
dat <- t(iris[,1:4])

# Find two components
comp <- weitrix_components(dat, p=2, max_iter=5, n_restarts=1)
#> Iter    1 R^2=0.93938 0.3sec
#> Iter    2 R^2=0.97273 0.3sec
#> Iter    3 R^2=0.97711 0.3sec
#> Iter    4 R^2=0.97762 0.3sec
#> Iter    5 R^2=0.97768 0.3sec

# Examine row and col matrices
pairs(comp$row, panel=function(x,y) text(x,y,rownames(comp$row)))
pairs(comp$col)