I studied computer science and math at UConn and did my PhD in systems biology at Harvard. I'm now a postdoc in Walter Fontana's lab at Harvard Medical School. These days I'm interested in molecular simulation, clinical data analysis, and Bayesian statistics.
medRxiv
A common way for clinicians to understand whether a patient's lab result is concerning is to see whether it's extreme with respect to the results of a known-healthy population. We analyze the statistical implications of this way of interpreting lab results, identifying theoretical cases where it fails and highlighting considerations for making interpretation more precise and personal.
PLOS One
Directed evolution, in which one repeatedly mutagenizes a protein and selects the better-performing variants, is a powerful tool for engineering proteins. By analyzing and simulating statistical models of this process, we describe how selection stringency (how harshly one selects at each step) affects success.
Immunity
Flu mutates, evading antibodies specialized against prior subtypes. Even antibodies that broadly neutralize one group of influenza A subtypes rarely neutralize the other group. This study shows that a vaccine enables cross-group protection in humanized mice through a single amino acid change in a class of precursor antibodies.
Blood Neoplasia
Mutations in the gene TP53 are crucial to the progression of cancers, including myeloid neoplasms, but the clinical importance of the type of mutation in these cancers isn't well understood. This study compares TP53 mutations across subtypes of myeloid neoplasms using patient data.
Drug Discov. Today
The AUC and AUPR are metrics commonly used to evaluate models that predict the side effects of drugs using their molecular features. However, the baseline AUC and AUPR depend on the statistical properties of the ground truth. We analyze this dependence and ask: to what degree do models actually benefit from molecular fingerprints?
Bioinformatics
Studies have genotyped many people and measured their gene expression levels. Using these data, one can investigate how genotype influences gene expression, which is useful for understanding complex traits and diseases. We model gene expression, accounting for interactions among genetic markers. By doing so, we more accurately predict the expression of a large subset of genes.
Forecasting
Thunderstorms can cause many power outages in a short period. Predicting these outages is challenging for models that summarize the weather over the entire course of a storm. Instead, we develop a framework for models to learn the dynamics of thunderstorm-caused outages directly from hourly weather forecasts.