Machine learning methods have recently driven rapid progress and excitement in protein structure prediction and design. However, most techniques to date have been trained on discrete, atomistic representations of protein structure. I am currently exploring whether continuous representations of protein structure can be combined with recent machine learning-based approaches, in the hope that this will enable more accurate predictions.

Proteins in solution exist as thermodynamic ensembles, in which molecules adopt different conformations that interconvert over time. Macroscopic properties such as binding affinity and free energy depend on this distribution, and are in fact functions of the Boltzmann partition function (the normalizing constant of the distribution) over the set of possible conformations. Although protein structure is often reduced to a single low-energy conformation (e.g., a single crystallographic conformation), modeling protein structure as a probability distribution can be more accurate and informative.
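To make the ensemble picture concrete, here is a minimal sketch in Python that computes Boltzmann probabilities and the ensemble free energy, F = -kT ln Z, for a small, finite list of conformation energies. The function name `boltzmann_ensemble`, the example energies, and the choice of kcal/mol units are assumptions made for illustration. This is not the algorithm described in this statement: it simply enumerates every conformation, which is only feasible for tiny toy ensembles.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
K_B = 0.0019872041


def boltzmann_ensemble(energies, temperature=298.15):
    """Return (log_z, probabilities, free_energy) for a finite list of
    conformation energies given in kcal/mol (hypothetical toy input)."""
    kt = K_B * temperature
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z_shifted = sum(weights)
    # Probability of each conformation under the Boltzmann distribution.
    probabilities = [w / z_shifted for w in weights]
    # ln Z = -e_min / kT + ln(sum of shifted weights)
    log_z = -e_min / kt + math.log(z_shifted)
    # Ensemble free energy: F = -kT ln Z
    free_energy = -kt * log_z
    return log_z, probabilities, free_energy


if __name__ == "__main__":
    # Three made-up conformations; the energies are illustrative only.
    log_z, probs, f = boltzmann_ensemble([-10.0, -9.5, -8.0])
    print("conformation probabilities:", probs)
    print("ensemble free energy (kcal/mol):", f)
    print("lowest single-conformation energy (kcal/mol):", -10.0)
```

Whenever more than one conformation is present, the ensemble free energy is lower than the energy of the single best conformation. This is one way to see how a single-structure model can misestimate properties that depend on the whole distribution.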

These distributions can be extremely rugged, so stochastic approximation techniques can be inaccurate. I have developed algorithms to approximate functions over large spaces and applied them to the design and study of protein therapeutics. I have used these algorithms to design antibodies against HIV and inhibitors of SARS-CoV-2, and to investigate the structural biology of protein binding and antibiotic resistance from an ensemble perspective.