palmerpenguins
install.packages("langevitour")
library(langevitour)
langevitour(dataset, labels)

Use directly from R.
   or
R Markdown → static HTML.

Interaction, calculations, and drawing are done in JavaScript.




Seek a “good” projection. (Cook et al., 1995)


Spin randomly. (Asimov, 1985)

Touring data

GGobi in action

Animate linear, orthonormal projections of data.

Previous software does this by choosing distinct projections then interpolating.

langevitour’s novel feature is using Langevin Dynamics to produce a smooth random path of projections.


  • Grand Tour introduced by Daniel Asimov in 1985.

  • XGobi, GGobi - interactive desktop applications.

  • tourr - R package for tour calculation and animation.

  • Ongoing interdisciplinary research by Di Cook and collaborators, e.g. slice tours.

Langevin Dynamics

“Stochastic Differential Equation”

Constrained Langevin Dynamics

The “position” should always represent an orthonormal projection.


langevitour uses “Position Based Dynamics”, a simple and stable way to maintain constraints in a physics engine.


Each timestep:

  1. Update velocity \(\mathbf{v} \leftarrow \mathbf{v} - \mathrm{damping} + \mathrm{noise} + \mathrm{forces}\)
  2. Propose new position \(\mathbf{x} \leftarrow \mathbf{x} + \Delta t\ \mathbf{v}\)
  3. Fix the proposed \(\mathbf{x}\) by finding the nearest valid orthonormal projection. (Take the SVD and set all singular values to 1.)
  4. Fix \(\mathbf{v}\) to be consistent with the actual step taken by \(\mathbf{x}\).
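The four steps above can be sketched in base R. This is an illustration under stated assumptions, not langevitour's JavaScript implementation: the function name `langevin_step`, its parameters (`dt`, `damping`, `temperature`, `force`), and the particular damping-and-noise update are all choices made for this sketch. The position is stored as a 2 x p matrix with orthonormal rows.

```r
# One timestep of constrained Langevin dynamics (illustrative sketch only).
# X: 2 x p projection matrix with orthonormal rows. V: 2 x p velocity.
# All parameter names and defaults here are hypothetical.
langevin_step <- function(X, V, dt = 0.01, damping = 1, temperature = 1,
                          force = function(X) 0 * X) {
    # 1. Update velocity: decay (damping), fresh noise, and forces.
    decay <- exp(-damping * dt)
    noise <- matrix(rnorm(length(X)), nrow = nrow(X))
    V <- decay * V + sqrt(temperature * (1 - decay^2)) * noise + dt * force(X)

    # 2. Propose a new position.
    proposed <- X + dt * V

    # 3. Nearest matrix with orthonormal rows: SVD with singular values set to 1.
    s <- svd(proposed)
    X_new <- s$u %*% t(s$v)

    # 4. Make the velocity consistent with the step actually taken.
    V_new <- (X_new - X) / dt

    list(X = X_new, V = V_new)
}

set.seed(1)
p <- 5
X <- diag(p)[1:2, ]        # start by projecting onto the first two axes
V <- matrix(0, 2, p)
for (i in 1:100) {
    state <- langevin_step(X, V)
    X <- state$X
    V <- state$V
}
```

After any number of steps, `X %*% t(X)` stays equal to the 2 x 2 identity up to rounding error, because step 3 restores the constraint every time rather than relying on the velocity to stay tangent to it.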


Details: https://www.biorxiv.org/content/10.1101/2022.08.24.505207v1

Guided Tour


langevitour can sample projections near an optimum defined by a potential energy function.

For potential energy \(U(\mathbf{x})\) and temperature \(T\) it samples with density:

\[ \rho(\mathbf{x}) \propto e^{-U(\mathbf{x}) / T} \]


The force applied is \(-\nabla U(\mathbf{x})\).
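For reference, one standard way to write unconstrained Langevin dynamics with unit mass is given below. The damping rate \(\gamma\) and the Wiener process \(\mathbf{W}\) are notation introduced for this sketch and do not appear elsewhere in these slides.

\[ d\mathbf{x} = \mathbf{v}\,dt, \qquad d\mathbf{v} = -\nabla U(\mathbf{x})\,dt - \gamma\,\mathbf{v}\,dt + \sqrt{2 \gamma T}\,d\mathbf{W} \]

Under these dynamics the joint stationary density is

\[ \rho(\mathbf{x}, \mathbf{v}) \propto e^{-\left( U(\mathbf{x}) + \frac{1}{2} \lVert \mathbf{v} \rVert^2 \right) / T} \]

so integrating out \(\mathbf{v}\) recovers \(\rho(\mathbf{x}) \propto e^{-U(\mathbf{x}) / T}\).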


langevitour has energy functions based on distances between points, as if there is a repulsion force between points.


Stochastic Gradients: To save computation, only a mini-batch of the terms in the gradient is computed per frame.
Using a Stochastic Gradient in place of the full gradient just adds extra noise to Langevin Dynamics.
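A small base R sketch of the mini-batch idea follows. The energy used here, minus the mean squared projected distance between pairs of points, is a hypothetical stand-in chosen because its gradient is simple. It is not one of langevitour's energy functions, and `pair_force` and `minibatch_force` are names invented for this sketch.

```r
# Hypothetical repulsion energy: U(X) = -mean over pairs of |X (y_i - y_j)|^2.
# The force -grad U is then the mean over pairs of 2 (X d) d^T, d = y_i - y_j.
pair_force <- function(X, Y, i, j) {
    D <- Y[i, , drop = FALSE] - Y[j, , drop = FALSE]   # one row per pair
    2 * (X %*% t(D)) %*% D / nrow(D)
}

set.seed(1)
n <- 50
p <- 4
Y <- matrix(rnorm(n * p), n, p)      # toy data, one row per point
X <- diag(p)[1:2, ]                  # project onto the first two axes
pairs <- combn(n, 2)                 # all pairs i < j, one per column

# Full force: uses every pair.
full_force <- pair_force(X, Y, pairs[1, ], pairs[2, ])

# Mini-batch force: uses only m randomly chosen pairs.
minibatch_force <- function(m = 20) {
    batch <- sample(ncol(pairs), m)
    pair_force(X, Y, pairs[1, batch], pairs[2, batch])
}

# Averaging many mini-batch forces approaches the full force,
# so each one behaves like the full force plus zero-mean noise.
estimates <- replicate(2000, minibatch_force())
mean_force <- apply(estimates, c(1, 2), mean)
```

Comparing `mean_force` with `full_force` shows the two agree closely, which is why a mini-batch force can be substituted into the velocity update at the cost of some extra noise.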

Application to single cell RNA sequencing


Kang et al. (2018)

Peripheral Blood Mononuclear Cells (PBMC) from eight donors with lupus.

Cells were stimulated with a cytokine, recombinant IFN-β.

    U = Unstimulated
    S = Stimulated

UMAP hints at the geometry.

Standard Seurat processing steps:

  1. Normalize and log1p transform counts.
  2. Find Principal Components.
  3. UMAP layout of cells.

A nice feature of this dataset is that doublets can be identified confidently from genetic differences.

langevitour visualization

langevitour(
    pbmc$vm@cell.embeddings,
    pbmc$labels)

Cell scores for
10 principal components,
varimax rotated for interpretability. (gist)

Showing 10,000 cells.
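The varimax-rotated component scores mentioned above can be illustrated with base R. This is a sketch on random toy data, assuming the rotation is chosen from the gene loadings and then applied to the cell scores. The linked gist may do this differently, and the variable names are invented here.

```r
set.seed(1)
expr <- matrix(rnorm(200 * 20), 200, 20)   # toy stand-in: 200 cells x 20 genes

# Principal components: pca$x holds cell scores, pca$rotation gene loadings.
pca <- prcomp(expr, center = TRUE, scale. = FALSE, rank. = 5)

# Varimax picks an orthogonal rotation that makes the loadings sparser.
rot <- varimax(pca$rotation, normalize = FALSE)

gene_loadings <- pca$rotation %*% rot$rotmat   # genes x components
cell_scores <- pca$x %*% rot$rotmat            # cells x components
```

Because `rot$rotmat` is orthogonal, `cell_scores %*% t(gene_loadings)` equals the unrotated `pca$x %*% t(pca$rotation)`. The rotation changes which directions are called components, not the subspace they span.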



Denoising makes the geometry clearer.

knnDenoise(cell.embeddings)

Each cell
→ its 30 nearest neighbors
→ their 30 nearest neighbors
→ average of set.
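The two-step neighbor averaging described above can be sketched in base R. `knn_denoise_sketch` is a hypothetical name and this is not the package's `knnDenoise` implementation. It builds a dense distance matrix, so it suits only small inputs, and it assumes each cell counts as one of its own nearest neighbors.

```r
# Illustrative sketch: replace each row with the average of its
# neighbors-of-neighbors set. Not the langevitour implementation.
knn_denoise_sketch <- function(X, k = 30) {
    X <- as.matrix(X)
    D <- as.matrix(dist(X))
    n <- nrow(X)

    # k nearest neighbors of each row (the row itself has distance zero).
    nn <- lapply(seq_len(n), function(i) order(D[i, ])[seq_len(k)])

    result <- X
    for (i in seq_len(n)) {
        first <- nn[[i]]                     # step 1: neighbors of the cell
        second <- unique(unlist(nn[first]))  # step 2: neighbors of those
        result[i, ] <- colMeans(X[second, , drop = FALSE])
    }
    result
}

set.seed(1)
# Toy data: two well-separated clusters of 20 points each.
X <- rbind(matrix(rnorm(40), 20, 2),
           matrix(rnorm(40, mean = 100), 20, 2))
denoised <- knn_denoise_sketch(X, k = 5)
```

In this toy example the denoised points stay within their own cluster but are pulled toward the local average, which is the sense in which denoising makes the geometry clearer.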


[…] captures the response to the cytokine in most cells. […] brings most cell types into alignment. Monocytes are doing something extra with […].

Mouse over the doublet labels. Ideally they should still lie between cell types.

Relating components to UMAP

langevitour(
    cbind(umap, cell.embeddings), 
    labels)


Mouse over component labels.

Components are interpretable due to varimax rotation. (gist)


Small motions help understand the UMAP.

Gene loadings

We’ve looked at cell scores for each component.

Each component also has gene loadings.

For example, we can pull out genes involved in […], or […], or search for […].

Conclusion

Seeing what a processing step does:
harmony removes differences between samples, allowing clustering by cell type.

Use langevitour to get an intuitive sense of a dataset, then actively explore it.

Small random motions are common in the natural world. Our eyes are good at interpreting them.


See what scRNA-Seq processing steps are doing.

Understand what is driving a UMAP layout.

Use directly from R/RStudio, or share as static HTML from rmarkdown.


Acknowledgements

The start of the idea for langevitour came out of discussions with Prof. Di Cook and Dr. Stuart Lee.

Further examples

Human retina

Menon et al. (2019)

Data

Points shown are denoised.

Mouse spermatogenesis

Lukassen et al. (2018)

Data

Development of sperm follows a single, but not straight, trajectory.

Points shown are denoised.

Extra slides

Momentum is important

Momentum-based methods are not just visually appealing; they are also extremely efficient.

  • Momentum in optimization quickly traverses valleys.

  • Momentum means random paths reach distances \(\propto t\), rather than the \(\propto \sqrt t\) of Brownian motion.
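The second point can be checked with a small one-dimensional simulation in base R. This is a sketch under assumptions: the velocity retention factor `a` and the step counts are arbitrary illustrative values, and the linear growth only holds over times short compared with the damping time scale.

```r
set.seed(1)
n_paths <- 5000
n_steps <- 20
a <- 0.99   # fraction of velocity kept per step (hypothetical value)

noise <- matrix(rnorm(n_paths * n_steps), n_paths, n_steps)

# Brownian motion: every step is fresh noise, so position is a cumulative sum.
brownian <- t(apply(noise, 1, cumsum))

# With momentum: velocity is mostly retained, with a little noise mixed in.
v <- rnorm(n_paths)        # start from the stationary velocity distribution
x <- numeric(n_paths)
momentum <- matrix(0, n_paths, n_steps)
for (step in seq_len(n_steps)) {
    v <- a * v + sqrt(1 - a^2) * noise[, step]
    x <- x + v
    momentum[, step] <- x
}

# Root-mean-square distance from the start, at each step.
rms <- function(paths) sqrt(colMeans(paths^2))

# How much further have paths travelled after 20 steps than after 5?
ratio_brownian <- rms(brownian)[20] / rms(brownian)[5]
ratio_momentum <- rms(momentum)[20] / rms(momentum)[5]
```

Quadrupling the time should roughly double the Brownian distance (\(\sqrt{20/5} = 2\)), while the distance with momentum should grow by a factor close to 4, a little less because of the damping.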


Related methods

Hamiltonian Monte Carlo (e.g. Stan):

  • Randomizes the velocity at distinct points rather than continuously injecting new randomness.
  • Metropolis-Hastings accept/reject step removes discrete time approximation error.


Stochastic Gradient Descent with momentum (e.g. Adam):