Introduction to tours

Smoothly move through a sequence of projections of data.

  • grand tour = uniformly samples projections
  • guided tour = progressively maximizes a specified “index” function
Daniel Asimov (1985) described several ways to generate a grand tour: a completely random sequence of projections, a random walk, or a deterministic sequence that eventually passes as close as you like to any projection.


Software:

  • XGobi - interactive data visualization for the X Window System
                  (Deborah Swayne, Di Cook, Andreas Buja)

  • GGobi - GTK port

  • tourr - R package for tour calculation and animation

  • More R packages: liminal, detourr

Often combined with interactivity such as linked plots and brushing.

langevitour

install.packages("langevitour")

library(langevitour)

langevitour(dataset, labels)

langevitour is a Javascript app that can be used from R as an htmlwidget.

langevitour is different from other tour software. It’s more like a physics simulation.

It has a range of behaviours, with the grand tour and guided tour as extremes.

Langevin dynamics

Langevin dynamics is a physics model with:

  • Small random forces.
  • A velocity damping force.
  • Position dependent forces, specified by a potential energy function.


Example:
A damped harmonic oscillator excited by small random forces.
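This example can be sketched numerically. The sketch below is a minimal NumPy simulation with arbitrary illustrative constants (`dt`, `gamma`, `k`, `sigma` are assumptions, not langevitour parameters): a restoring force from the potential \(U(x) = kx^2/2\), velocity damping, and small random kicks.

```python
import numpy as np

# Damped harmonic oscillator driven by small random kicks.
# Potential U(x) = k x^2 / 2 gives a restoring force -U'(x) = -k x.
rng = np.random.default_rng(0)
dt, gamma, k, sigma = 0.01, 0.5, 4.0, 0.5   # illustrative constants
x, v = 1.0, 0.0
xs = []
for _ in range(20000):
    # Damping, restoring force, and a small random force each step.
    v += (-gamma * v - k * x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    x += v * dt
    xs.append(x)
# The initial swing decays toward x = 0, but the oscillator keeps
# jittering: random forcing balances damping in steady state.
```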

Langevin dynamics

System state:

  • \(x\) position vector
  • \(v\) velocity vector

For our tour visualization, the “position” \(x\) will represent an orthonormal projection matrix.


In steady state \(x \sim X\) and \(v \sim V\) are independent random variables:

  • \(X\) has a distribution with density proportional to \(e^{-U(x)}\).
  • \(V\) has a standard multivariate normal distribution \(\mathcal{N}(0,I)\)

Time evolution of the system maintains this joint distribution of \(X\) and \(V\) at each time point.

(Approximately in the discrete-time version.)

(Omitted for simplicity: temperature parameter.)

Constrained Langevin dynamics

“Position Based Dynamics”

Constraints are maintained by projecting the position back onto an allowed value, and then setting the velocity to match the adjusted position.

\[ x_i = \text{project}( x_{\text{proposed}\ i} ) \]

\[ v_i = \frac{x_i - x_{i-1}}{\Delta t} \]

Here \(\text{project}(x)\) fixes \(x\) to represent an orthonormal projection matrix.

  • Compute the Singular Value Decomposition and set all singular values to 1 (using svd-js).

\[ \begin{aligned} \mathbf{U S V^\top} &= \mathbf{X}_\text{proposed} \\ \mathbf{X} &= \mathbf{U V^\top} \end{aligned} \]
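langevitour does this in Javascript with svd-js; the same projection step can be sketched in NumPy:

```python
import numpy as np

def project_orthonormal(x):
    """Snap a matrix to the nearest matrix with orthonormal columns:
    take the SVD and set every singular value to 1."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(1)
x_proposed = rng.standard_normal((10, 2))   # e.g. 10 variables -> 2D projection
x = project_orthonormal(x_proposed)
print(np.allclose(x.T @ x, np.eye(2)))      # True: columns are orthonormal
```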

Mouse brain cells example

Momentum is important

Momentum-based methods are not just visually appealing; they are also remarkably efficient.

  • Momentum quickly traverses valleys to get near the optimum amazingly fast.

  • Momentum means random paths reach distances \(\propto t\), rather than the \(\propto \sqrt t\) of Brownian motion.
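The \(\propto t\) versus \(\propto \sqrt t\) scaling can be checked directly. The sketch below (illustrative constants, not langevitour’s parameters) compares the mean squared displacement of a memoryless random walk against a walk whose velocity is a lightly damped Langevin process, so its direction persists:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 4000, 16          # number of independent walks, number of time steps
gamma = 1e-3             # tiny damping: velocity persists over the whole window

# Brownian motion: each step is independent, no memory.
brownian = np.cumsum(rng.standard_normal((M, T)), axis=1)

# Momentum walk: velocity is damped and re-excited each step.
v = rng.standard_normal(M)                  # steady-state velocity ~ N(0,1)
x = np.zeros(M)
momentum = np.zeros((M, T))
for t in range(T):
    v = np.exp(-gamma) * v + np.sqrt(1 - np.exp(-2 * gamma)) * rng.standard_normal(M)
    x = x + v
    momentum[:, t] = x

def msd(walks, t):                          # mean squared displacement after t steps
    return np.mean(walks[:, t - 1] ** 2)

# Quadrupling the time quadruples Brownian MSD (distance ~ sqrt(t))...
print(msd(brownian, 16) / msd(brownian, 4))   # ~ 4
# ...but multiplies ballistic MSD by ~16 (distance ~ t).
print(msd(momentum, 16) / msd(momentum, 4))   # ~ 16
```

Over times much longer than 1/γ the momentum walk becomes diffusive again, so the ballistic advantage applies within each correlation time.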


Related methods

Hamiltonian Monte Carlo (eg Stan):

  • Randomizes the velocity at distinct points rather than continuously injecting new randomness.
  • A Metropolis-Hastings accept/reject step removes discrete-time approximation error.


Stochastic Gradient Descent with momentum (eg Adam).

Extra slides

Langevin dynamics numerical simulation

\[ \begin{align*} x_i & \ \ \text{Position} \\ v_i & \ \ \text{Velocity} \\ \Delta t & \ \ \text{Time step} \\ \gamma & \ \ \text{Damping amount} \\ U(x) & \ \ \text{Energy function} \\ \end{align*} \]


Update velocity:

\[ \begin{align*} v_{\text{proposed}\ i} &=& e^{- \Delta t \gamma} & \ v_{i-1} & & \text{Damp previous velocity}\\ & & +\sqrt{1-e^{-2 \Delta t \gamma}} & \ \mathcal{N}(0,I) & & \text{Replace noise lost to damping}\\ & & -\Delta t & \ \nabla U(x_{i-1}) & & \text{Accelerate downhill} \end{align*} \]

Update position:

\[ x_{\text{proposed}\ i} = x_{i-1} + \Delta t\ v_{\text{proposed}\ i} \]

Constraints are then applied to the proposed new x.
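Putting the velocity update, position update, and constraint projection together gives one simulation step. The NumPy sketch below follows the equations above; the step size, damping, and the flat potential (which gives grand-tour-like behaviour) are illustrative assumptions, not langevitour’s defaults:

```python
import numpy as np

def step(x, v, grad_U, rng, dt=0.02, gamma=2.0):
    """One constrained Langevin update (illustrative constants)."""
    # Update velocity: damp, re-inject noise lost to damping, accelerate downhill.
    v = (np.exp(-dt * gamma) * v
         + np.sqrt(1 - np.exp(-2 * dt * gamma)) * rng.standard_normal(x.shape)
         - dt * grad_U(x))
    # Propose a new position, then project back onto the constraint set.
    x_old = x
    u, s, vt = np.linalg.svd(x + dt * v, full_matrices=False)
    x = u @ vt                      # nearest orthonormal projection matrix
    v = (x - x_old) / dt            # velocity consistent with adjusted position
    return x, v

rng = np.random.default_rng(0)
x = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal 5 -> 2 projection
v = np.zeros((5, 2))
flat = lambda x: np.zeros_like(x)                  # flat potential: grand-tour-like
for _ in range(100):
    x, v = step(x, v, flat, rng)
print(np.allclose(x.T @ x, np.eye(2)))             # True: constraint holds each step
```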

What can we put in \(U(x)\)?

\(U(x)\) is an energy function, specifying a distribution of projections to explore.
\(U(x)\) can be scaled to explore a smaller or larger area around its minimum.

  • Seek projections concentrated around particular axes.

  • Projection pursuit: rather than optimizing for the single best projection, sample from a distribution of good projections.

    • Currently implemented: add up a power transformation of distances \(d_{ij}\) between all projected pairs of points \(i,j\).

\[ U(x) = \frac{-1}{n^2} \sum_{i,j} \begin{cases} \frac{(d_{ij}^2+\epsilon)^\lambda-1}{\lambda} & \text{for}\ \lambda \neq 0 \\ & \\ \small{ \log(d_{ij}^2+\epsilon) } & \text{for}\ \lambda = 0 \end{cases} \]

\(\lambda=0\) does t-SNE style local repulsion.
\(\lambda=1\) finds first two principal components.
\(\lambda=2\) looks for a projection with outliers.

\(\epsilon\) is a small constant to avoid infinities for \(\lambda \leq 0\).

(Implementation actually only does a mini-batch per frame.)
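The energy family above can be sketched in NumPy (computing all pairs at once, rather than a mini-batch per frame). The demo for \(\lambda=1\) illustrates the principal-components claim: minimizing \(U\) maximizes total pairwise squared distance, i.e. projected variance, so the top-2 PCA projection scores lower energy than a random one.

```python
import numpy as np

def U(x, data, lam=1.0, eps=1e-6):
    """Energy of a projection x: negated mean of a power transform of all
    pairwise squared distances between projected points (formula above)."""
    proj = data @ x                               # project n points to 2D
    diff = proj[:, None, :] - proj[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)               # squared distances d_ij^2
    if lam != 0:
        terms = ((d2 + eps) ** lam - 1) / lam
    else:
        terms = np.log(d2 + eps)                  # t-SNE style local repulsion
    return -np.mean(terms)

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5)) * np.array([10.0, 5.0, 1.0, 1.0, 1.0])
data = data - data.mean(axis=0)

x_pca = np.linalg.svd(data, full_matrices=False)[2][:2].T   # top-2 PCs, 5x2
x_rand = np.linalg.qr(rng.standard_normal((5, 2)))[0]       # random projection
print(U(x_pca, data) < U(x_rand, data))                      # True
```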