Multimodal generative models of in vitro cellular perturbations

Generating image-based phenotype responses to cellular perturbations in silico, and vice versa, to integrate multiple screening modalities and provide in silico predictions that efficiently guide further wet-lab experiments. "Stable Diffusion but for biology," if you like. Work in progress.

Universal cell image embeddings using masked autoencoders and vision transformers

Self-supervised representation learning of cellular morphology / perturbation state across cell types and image modalities to build a foundational vision model capable of universally embedding all in vitro cellular image modalities. Work in progress.

Multiscale modality-agnostic molecular embeddings using chemical language models

Protein, DNA, RNA, and small molecule language models have strong in-domain performance but lack generality: e.g., protein language models don't understand chemical modifications or non-natural amino acids, but peptides with such properties are crucial therapeutic targets. Using self-supervision and hierarchical / stochastic tokenization schemes, we train a truly general language model on the entirety of biologically relevant molecule space, for universally applicable embeddings and generative models. Work in progress.

Rescuing recombinant protein expression in mammalian cells using protein language models

Coupling inverse folding sequence generators with an in-house LLM-based expression prediction model, we rescue non- and low-expressing proteins of interest by proposing variant sequences that reliably express at 10-100x the level of the parent sequence. Work in progress.

Pharmacokinetic property prediction from molecular structure

Accurately predicting half-life and clearance rate for monomer and multimer peptide / small protein therapeutics from historical data to reduce the need for pre-clinical animal trials. We use multitask Gaussian processes to leverage and factor all high-dimensional data relationships of interest, and make uncertainty-quantified predictions for novel molecular entities. Work in progress.

A Python library for high-throughput cellular image data analysis

Foundational image / multimodal models are data hungry; to feed them new data efficiently, it's imperative to make data QC, collation, processing, analysis, and modeling as modular, performant, and painless as possible. A Python library with templated HPC workflows, a modular CLI, pre-prepared analysis notebooks, and import/export image QC enables rapid, reproducible, and verifiable microscope → knowledge distillation, as well as efficient data curation for training large foundational models. Work in progress, to be open-sourced.

A deep generative model of 3D single-cell organization

We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Paper link

Open and reproducible data-centric workflows

A simple project template with preconfigured integration hooks to make it easy to do things the right way. If you work primarily in Python and are interested in versioning code & data, this may be of interest to you.

Cell states beyond transcriptomics

Quantitative co-analysis of RNA abundance and sarcomere organization in single cells and an integrated framework to predict subcellular organization states from gene expression. Paper link

About

I live on Vashon Island. I work at Novo Nordisk, designing efficient learning algorithms for extracting knowledge from high-dimensional experimental data, and integrating machine learning approaches with mechanistic biophysical models to create multiscale models of cellular behavior.

Before that, I was a Sr. Modeling Scientist at the Allen Institute for Cell Science and a postdoc at the Institute for Systems Biology with the Hood-Price Lab.

I went to grad school at CMU & Pitt, where I worked with Dan Zuckerman and collaborated with Jim Faeder, Chris Langmead, Markus Dittrich, Bob Murphy, and Takis Benos.