Nadal-Ribelles, M., Solé, C., Díez-Villanueva, A., Stephan-Otto Attolini, C., Matas, Y., Steinmetz, L., De Nadal, E., & Posas, F. (2025). A single-cell resolved genotype-phenotype map using genome-wide genetic and environmental perturbations. Nature Communications, 16(1), 2645. https://doi.org/10.1038/s41467-025-57600-4
Single cell RNA-Seq dataset.
Many manual gene knock-outs that have been pooled.
Knock-out gene for each cell identifiable from a barcode inserted at the end of the URA3 gene. (URA3 is inserted to allow selection of successful knock-out cells.)
Have data on cells in control and salt-stress conditions.
(Perturb-Seq is often done with pooled CRISPR guides that reach cells at random, as in CROP-Seq. In this dataset the knock-outs were made more manually; it seems like a big project!)
Ornstein–Uhlenbeck Process
\[ \newcommand{\y}{\mathbf{y}} \newcommand{\x}{\mathbf{x}} \newcommand{\b}{\mathbf{b}} \newcommand{\A}{\mathrm{A}} \newcommand{\B}{\mathrm{B}} \newcommand{\C}{\mathrm{C}} \newcommand{\E}{\mathrm{E}} \newcommand{\W}{\mathrm{W}} \newcommand{\L}{\mathrm{L}} \newcommand{\I}{\mathrm{I}} \newcommand{\dy}{\mathrm{d}\y} \newcommand{\dt}{\mathrm{d}t} \newcommand{\dW}{\mathrm{d}\W} \newcommand{\N}{\mathcal{N}} \dy_t = \A\y_t\,\dt + \B\x\,\dt + \C\,\dW_t \]
I’m only considering the steady state behaviour.
I’ll build up from correlation to causation.
😀 About as simple as SDEs get. 😦 All SDEs are weird.
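For reference, the steady state can be written down directly. This is the standard result for a linear SDE of this form, assuming \(\mathrm{A}\) is stable (all eigenvalues have negative real part) and \(\mathbf{x}\) is held fixed. The symbols \(\boldsymbol{\mu}\) and \(\Sigma\) are introduced here and are not part of the original notation. At steady state \(\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)\), where

\[ \mathrm{A}\boldsymbol{\mu} + \mathrm{B}\mathbf{x} = \mathbf{0}, \qquad \mathrm{A}\Sigma + \Sigma\mathrm{A}^\top + \mathrm{C}\mathrm{C}^\top = 0 \]

The mean depends on the condition \(\mathbf{x}\); the covariance does not.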
I scale the expression data so all genes have standard deviation 1.
Covariances can then be interpreted as correlations.
This scaling amounts to an assumption about the relative rates of turnover of genes, as a steady state model provides no information about this! (I’m also assuming \(\mathrm{C}=\mathrm{I}\) in the model.)
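A minimal sketch of the scaling step, on made-up toy data rather than the real expression matrix. The array `X` and its shape are hypothetical. It illustrates why covariances of the scaled data can be read as correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 500 cells x 4 genes, on very different scales.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
X = X * np.array([1.0, 10.0, 0.1, 5.0])

# Scale each gene (column) to standard deviation 1.
Z = X / X.std(axis=0, ddof=1)

cov_scaled = np.cov(Z, rowvar=False)     # covariance of the scaled data
corr_raw = np.corrcoef(X, rowvar=False)  # correlation of the unscaled data
print(np.allclose(cov_scaled, corr_raw))
```

The two matrices agree up to floating point error, since dividing by the standard deviation is exactly the normalization that turns a covariance into a correlation.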
In the networks that follow, I use soft thresholding to only show 150 links between genes.
The actual model has the 2,000 most highly expressed genes, with potential links between all genes. I fit the model using PyTorch to 96,101 control condition cells and 115,422 salt-stressed cells. Each cell has one knocked-out gene, and there are 783 different knock-out genes.
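The soft thresholding used for the network figures could look something like the sketch below. How the threshold was actually chosen is not stated, so picking it from the ranked magnitudes is an assumption, and `soft_threshold_top_k` is a hypothetical helper name. Entries are counted as directed links; for a symmetric matrix such as a correlation matrix, pass `np.triu(M, 1)` so each link is counted once.

```python
import numpy as np

def soft_threshold_top_k(M, k):
    """Soft-threshold the off-diagonal entries of M so at most k stay non-zero."""
    M = np.asarray(M, dtype=float)
    off = ~np.eye(M.shape[0], dtype=bool)      # off-diagonal positions
    mags = np.sort(np.abs(M[off]))[::-1]       # magnitudes, largest first
    # Use the (k+1)-th largest magnitude as threshold, so k entries exceed it
    # (fewer if there are ties at the threshold).
    t = mags[k] if k < mags.size else 0.0
    out = np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
    out[~off] = 0.0                            # ignore self-links
    return out, t

# Hypothetical 3-gene example, keeping the 2 strongest links.
M = np.array([[1.0, 0.9, -0.2],
              [0.1, 1.0, 0.5],
              [-0.7, 0.3, 1.0]])
links, t = soft_threshold_top_k(M, k=2)
print(t)
print(links)
```

Unlike hard thresholding, the surviving weights are shrunk toward zero by the threshold, so links just above the cut-off appear faint rather than popping in at full strength.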
Before any model fitting, we can just look at correlation between genes. This thresholded correlation network has similarities to the network used in WGCNA. We see clusters involving, for example, histones, the ribosome, the cell wall, mating, and glycolysis.
Inverse correlation has also been used to investigate gene regulatory networks. The inverse correlation network is sparser, and may be easier to interpret. It also corresponds to the \(\A\) matrix in an Ornstein–Uhlenbeck model, although not a very realistic one (\(\A\) is symmetric, which is clearly wrong).
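The correspondence can be checked in a few lines. The steady-state covariance \(\Sigma\) of an Ornstein–Uhlenbeck process satisfies a Lyapunov equation (a standard result; \(\Sigma\) is notation introduced here). With \(\mathrm{C}=\mathrm{I}\), genes scaled so that \(\Sigma\) is the correlation matrix, and \(\mathrm{A}\) assumed symmetric and stable:

\[ \mathrm{A}\Sigma + \Sigma\mathrm{A}^\top + \mathrm{I} = 0 \quad\Longrightarrow\quad \Sigma = -\tfrac{1}{2}\mathrm{A}^{-1} \quad\Longleftrightarrow\quad \mathrm{A} = -\tfrac{1}{2}\Sigma^{-1} \]

Substituting back confirms it: \(\mathrm{A}\Sigma = \Sigma\mathrm{A}^\top = -\tfrac{1}{2}\mathrm{I}\), so the left-hand side is \(-\tfrac{1}{2}\mathrm{I} - \tfrac{1}{2}\mathrm{I} + \mathrm{I} = 0\). So \(\mathrm{A}\) is the inverse correlation matrix times \(-\tfrac{1}{2}\), and its off-diagonal entries have the same sign as the corresponding partial correlations.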
Fitting a model that also uses information from gene knockouts (with some regularization), we start to get a causal network. A condition is included in the model for each knock-out, constrained so the effect is only on the knocked-out gene. \(\A\) is no longer symmetric.
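One way to picture the constraint is as a mask on \(\mathrm{B}\). This is only a sketch: the gene names, the layout of \(\mathbf{x}\) (salt indicator first, then one indicator per knock-out) and the random stand-in for fitted parameters are all assumptions, not the actual PyTorch model.

```python
import numpy as np

genes = ["G1", "G2", "G3", "G4"]   # hypothetical gene names
knockouts = ["G2", "G4"]           # hypothetical knocked-out genes
n, k = len(genes), len(knockouts)

# One column of B per entry of x = [salt, knock-out 1, ..., knock-out k].
mask = np.zeros((n, 1 + k), dtype=bool)
mask[:, 0] = True                         # salt may act directly on any gene
for j, g in enumerate(knockouts):
    mask[genes.index(g), 1 + j] = True    # knock-out j acts only on its own gene

rng = np.random.default_rng(0)
B_free = rng.normal(size=(n, 1 + k))      # stand-in for unconstrained parameters
B = np.where(mask, B_free, 0.0)           # entries outside the mask are fixed at 0

def design_row(salt, knocked_out):
    """Build x for one cell: salt indicator plus a one-hot knock-out indicator."""
    x = np.zeros(1 + k)
    x[0] = float(salt)
    x[1 + knockouts.index(knocked_out)] = 1.0
    return x

x = design_row(salt=True, knocked_out="G4")
direct = B @ x   # direct effect B x for a salt-stressed G4 knock-out cell
print(direct)
```

Because each knock-out column can only push on its own gene, any change seen in other genes has to be explained through \(\mathrm{A}\), which is what lets the knock-outs identify the direction of links.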
APE3 shows up as causally associated with various “glycolysis and gluconeogenesis” genes, such as FBA1, but this was not seen just from correlation.
Below, dots are cells. The black circles indicate means.
When APE3 is knocked out, expression of FBA1 rises, on average. Therefore expression of APE3 reduces expression of FBA1. We also notice there is not much correlation between these two genes, whereas repression alone would tend to produce a negative correlation. A model that reconciles these two observations needs to also have FBA1 enhance expression of APE3.
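The argument can be illustrated with a two-gene toy version of the model. This is a sketch under stated assumptions: \(\mathrm{C}=\mathrm{I}\), a made-up interaction strength `c`, a knock-out modelled as a constant negative input on the knocked-out gene, and hypothetical helper names. It is not fitted to the data.

```python
import numpy as np

def stationary_cov(A):
    """Solve A S + S A^T + I = 0 for S (steady-state covariance with C = I)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)   # acts on the row-major vec of S
    return np.linalg.solve(K, -I.ravel()).reshape(n, n)

def corr12(S):
    """Correlation between the two genes implied by covariance S."""
    return S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

def knockout_shift(A, strength=1.0):
    """Shift in steady-state mean when gene 0 gets a constant negative input."""
    b = np.array([-strength, 0.0])
    return -np.linalg.solve(A, b)       # solves A mu + b = 0

c = 0.8  # hypothetical interaction strength
# Gene order: [APE3, FBA1]; A[i, j] is the effect of gene j on gene i.
A_one_way = np.array([[-1.0, 0.0], [-c, -1.0]])   # APE3 represses FBA1 only
A_feedback = np.array([[-1.0, c], [-c, -1.0]])    # and FBA1 enhances APE3

for name, A in [("one-way", A_one_way), ("feedback", A_feedback)]:
    S = stationary_cov(A)
    print(name, "correlation:", corr12(S), "mean shift:", knockout_shift(A))
```

In both versions the knock-out lowers APE3 and raises FBA1. With repression only, the two genes are clearly negatively correlated at steady state; with the opposing feedback of equal strength, the correlation vanishes while the knock-out effect remains.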
The data included control and salt-stressed cells. There is an indicator variable in \(\x\) and a corresponding column estimated in \(\B\) for the direct effect of salt stress. The direct effect of salt stress appears to be up-regulation of certain genes, with broader downstream effects including down-regulation of many genes (as expected).
(Could also take full effects from a bulk RNA-Seq experiment and infer direct effects…)
Just having big biological datasets may not be enough.
Bulk omics
Measure many variables.
See causal effect of 1s-100s of perturbations.
Single cell or spatial omics
Observe in fine enough detail to see stochastic biological variation (correlation).
Single cell: Perturb-Seq
See causal effect of 1000s of perturbations.
This list is incomplete, you can help by expanding it.