Introduction to tours

Smoothly move through a sequence of projections of data.

  • grand tour = uniformly samples projections
  • guided tour = progressively maximizes a specified “index” function
Daniel Asimov (1985) described several ways to generate a grand tour: a completely random sequence of projections, a random walk, or a deterministic sequence that eventually passes as close as you like to any projection.


Software:

  • XGobi - interactive data visualization for the X Window System
                  (Deborah Swayne, Di Cook, Andreas Buja)

  • GGobi - GTK port

  • tourr - R package for tour calculation and animation

  • More R packages: liminal, detourr

Often combined with interactivity such as linked plots and brushing.

langevitour

install.packages("langevitour")

library(langevitour)

langevitour(dataset, labels)

langevitour is a Javascript app that can be used from R as an htmlwidget.

langevitour is different from other tour software. It’s more like a physics simulation.

It has a range of behaviours, with the grand tour and guided tour as extremes.

Langevin dynamics

Langevin dynamics is a physics model with:

  • Small random forces.
  • A velocity damping force.
  • Position dependent forces, specified by a potential energy function.


Example:
A damped harmonic oscillator excited by small random forces.
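This example can be sketched numerically. The sketch below is a minimal NumPy simulation with arbitrary illustrative constants (`dt`, `gamma`, `k`, `sigma` are assumptions, not langevitour parameters): a restoring force from the potential \(U(x) = kx^2/2\), velocity damping, and small random kicks.

```python
import numpy as np

# Damped harmonic oscillator driven by small random kicks.
# Potential U(x) = k x^2 / 2 gives a restoring force -U'(x) = -k x.
rng = np.random.default_rng(0)
dt, gamma, k, sigma = 0.01, 0.5, 4.0, 0.5   # illustrative constants
x, v = 1.0, 0.0
xs = []
for _ in range(20000):
    # Damping, restoring force, and a small random force each step.
    v += (-gamma * v - k * x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    x += v * dt
    xs.append(x)
# The initial swing decays toward x = 0, but the oscillator keeps
# jittering: random forcing balances damping in steady state.
```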

Langevin dynamics

System state:

  • \(x\) position vector
  • \(v\) velocity vector

For our tour visualization, the “position” \(x\) will represent an orthonormal projection matrix.


In steady state \(x \sim X\) and \(v \sim V\) are independent random variables:

  • \(X\) has a distribution with density proportional to \(e^{-U(x)}\).
  • \(V\) has a standard multivariate normal distribution \(\mathcal{N}(0,I)\)

Time evolution of the system maintains this joint distribution of \(X\) and \(V\) at each time point.

(Approximately in the discrete-time version.)

(Omitted for simplicity: temperature parameter.)

Constrained Langevin dynamics

“Position Based Dynamics”

Constraints are maintained by projecting the position back onto an allowed value, and then setting the velocity to match the adjusted position.

\[ x_i = \text{project}( x_{\text{proposed}\ i} ) \]

\[ v_i = \frac{x_i - x_{i-1}}{\Delta t} \]

Here \(\text{project}(x)\) fixes \(x\) to represent an orthonormal projection matrix.

  • Compute the Singular Value Decomposition and set all singular values to 1 (using svd-js).

\[ \begin{aligned} \mathbf{U S V^\top} &= \mathbf{X}_\text{proposed} \\ \mathbf{X} &= \mathbf{U V^\top} \end{aligned} \]
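langevitour does this in Javascript with svd-js; the same projection step can be sketched in NumPy:

```python
import numpy as np

def project_orthonormal(x):
    """Snap a matrix to the nearest matrix with orthonormal columns:
    take the SVD and set every singular value to 1."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(1)
x_proposed = rng.standard_normal((10, 2))   # e.g. 10 variables -> 2D projection
x = project_orthonormal(x_proposed)
print(np.allclose(x.T @ x, np.eye(2)))      # True: columns are orthonormal
```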

Mouse brain cells example

Momentum is important

Momentum-based methods are not just visually appealing; they are also remarkably efficient.

  • Momentum quickly traverses valleys to get near the optimum amazingly fast.

  • Momentum means random paths reach distances \(\propto t\), rather than the \(\propto \sqrt t\) of Brownian motion.
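The \(\propto t\) versus \(\propto \sqrt t\) scaling can be checked directly. The sketch below (illustrative constants, not langevitour’s parameters) compares the mean squared displacement of a memoryless random walk against a walk whose velocity is a lightly damped Langevin process, so its direction persists:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 4000, 16          # number of independent walks, number of time steps
gamma = 1e-3             # tiny damping: velocity persists over the whole window

# Brownian motion: each step is independent, no memory.
brownian = np.cumsum(rng.standard_normal((M, T)), axis=1)

# Momentum walk: velocity is damped and re-excited each step.
v = rng.standard_normal(M)                  # steady-state velocity ~ N(0,1)
x = np.zeros(M)
momentum = np.zeros((M, T))
for t in range(T):
    v = np.exp(-gamma) * v + np.sqrt(1 - np.exp(-2 * gamma)) * rng.standard_normal(M)
    x = x + v
    momentum[:, t] = x

def msd(walks, t):                          # mean squared displacement after t steps
    return np.mean(walks[:, t - 1] ** 2)

# Quadrupling the time quadruples Brownian MSD (distance ~ sqrt(t))...
print(msd(brownian, 16) / msd(brownian, 4))   # ~ 4
# ...but multiplies ballistic MSD by ~16 (distance ~ t).
print(msd(momentum, 16) / msd(momentum, 4))   # ~ 16
```

Over times much longer than 1/γ the momentum walk becomes diffusive again, so the ballistic advantage applies within each correlation time.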


Related methods

Hamiltonian Monte Carlo (eg Stan):

  • Randomizes the velocity at distinct points rather than continuously injecting new randomness.
  • A Metropolis-Hastings accept/reject step removes discrete-time approximation error.


Stochastic Gradient Descent with momentum (eg Adam).

Extra slides

Langevin dynamics numerical simulation

\[ \begin{align*} x_i & \ \ \text{Position} \\ v_i & \ \ \text{Velocity} \\ \Delta t & \ \ \text{Time step} \\ \gamma & \ \ \text{Damping amount} \\ U(x) & \ \ \text{Energy function} \\ \end{align*} \]


Update velocity:

\[ \begin{align*} v_{\text{proposed}\ i} &=& e^{- \Delta t \gamma} & \ v_{i-1} & & \text{Damp previous velocity}\\ & & +\sqrt{1-e^{-2 \Delta t \gamma}} & \ \mathcal{N}(0,I) & & \text{Replace noise lost to damping}\\ & & -\Delta t & \ \nabla U(x_{i-1}) & & \text{Accelerate downhill} \end{align*} \]

Update position:

\[ x_{\text{proposed}\ i} = x_{i-1} + \Delta t\ v_{\text{proposed}\ i} \]

Constraints are then applied to the proposed new x.
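Putting the velocity update, position update, and constraint projection together gives one simulation step. The NumPy sketch below follows the equations above; the step size, damping, and the flat potential (which gives grand-tour-like behaviour) are illustrative assumptions, not langevitour’s defaults:

```python
import numpy as np

def step(x, v, grad_U, rng, dt=0.02, gamma=2.0):
    """One constrained Langevin update (illustrative constants)."""
    # Update velocity: damp, re-inject noise lost to damping, accelerate downhill.
    v = (np.exp(-dt * gamma) * v
         + np.sqrt(1 - np.exp(-2 * dt * gamma)) * rng.standard_normal(x.shape)
         - dt * grad_U(x))
    # Propose a new position, then project back onto the constraint set.
    x_old = x
    u, s, vt = np.linalg.svd(x + dt * v, full_matrices=False)
    x = u @ vt                      # nearest orthonormal projection matrix
    v = (x - x_old) / dt            # velocity consistent with adjusted position
    return x, v

rng = np.random.default_rng(0)
x = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal 5 -> 2 projection
v = np.zeros((5, 2))
flat = lambda x: np.zeros_like(x)                  # flat potential: grand-tour-like
for _ in range(100):
    x, v = step(x, v, flat, rng)
print(np.allclose(x.T @ x, np.eye(2)))             # True: constraint holds each step
```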

What can we put in \(U(x)\)?

\(U(x)\) is an energy function, specifying a distribution of projections to explore.
\(U(x)\) can be scaled to explore a smaller or larger area around its minimum.

  • Seek projections concentrated around particular axes.

  • Projection pursuit: rather than optimizing for the single best projection, sample from a distribution of good projections.

    • Currently implemented: add up a power transformation of distances \(d_{ij}\) between all projected pairs of points \(i,j\).

\[ U(x) = \frac{-1}{n^2} \sum_{i,j} \begin{cases} \frac{(d_{ij}^2+\epsilon)^\lambda-1}{\lambda} & \text{for}\ \lambda \neq 0 \\ & \\ \small{ \log(d_{ij}^2+\epsilon) } & \text{for}\ \lambda = 0 \end{cases} \]

\(\lambda=0\) does t-SNE style local repulsion.
\(\lambda=1\) finds first two principal components.
\(\lambda=2\) looks for a projection with outliers.

\(\epsilon\) is a small constant to avoid infinities for \(\lambda \leq 0\).

(Implementation actually only does a mini-batch per frame.)
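The energy family above can be sketched in NumPy (computing all pairs at once, rather than a mini-batch per frame). The demo for \(\lambda=1\) illustrates the principal-components claim: minimizing \(U\) maximizes total pairwise squared distance, i.e. projected variance, so the top-2 PCA projection scores lower energy than a random one.

```python
import numpy as np

def U(x, data, lam=1.0, eps=1e-6):
    """Energy of a projection x: negated mean of a power transform of all
    pairwise squared distances between projected points (formula above)."""
    proj = data @ x                               # project n points to 2D
    diff = proj[:, None, :] - proj[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)               # squared distances d_ij^2
    if lam != 0:
        terms = ((d2 + eps) ** lam - 1) / lam
    else:
        terms = np.log(d2 + eps)                  # t-SNE style local repulsion
    return -np.mean(terms)

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5)) * np.array([10.0, 5.0, 1.0, 1.0, 1.0])
data = data - data.mean(axis=0)

x_pca = np.linalg.svd(data, full_matrices=False)[2][:2].T   # top-2 PCs, 5x2
x_rand = np.linalg.qr(rng.standard_normal((5, 2)))[0]       # random projection
print(U(x_pca, data) < U(x_rand, data))                      # True
```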