AI for Drug Discovery

Research Focus

Energy-Based Modeling

The AIMNet2 framework provides quantum-chemical accuracy for drug-like molecules, ions, and reactive intermediates, which is essential for accurate property prediction in pharmaceutical contexts. The AIMNet2-NSE extension handles open-shell systems such as quinones and covalent inhibitor intermediates, which appear frequently in medicinal chemistry but are challenging for traditional computational methods.

Free Energy Simulations

Hybrid ML/molecular mechanics approaches significantly improve binding affinity predictions. Our methods reduce absolute binding free energy errors from 0.97 to 0.47 kcal/mol, a critical improvement for rank-ordering compounds in lead optimization campaigns. Active learning-guided lead optimization has achieved 20-fold efficiency gains compared to brute-force screening approaches.
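To put these error figures in context, a free energy error translates into a multiplicative error in the predicted dissociation constant through the standard relation ΔG = RT ln K_d. The following is a minimal sketch of that conversion; the temperature of 298.15 K and the function name are illustrative assumptions, not part of the published protocol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_error_in_kd(dg_error_kcal, temperature_k=298.15):
    """Multiplicative K_d error implied by a free energy error (kcal/mol)."""
    # From dG = RT ln(K_d): an error of d(dG) scales K_d by exp(|d(dG)| / RT).
    return math.exp(abs(dg_error_kcal) / (R_KCAL * temperature_k))

for err in (0.97, 0.47):
    print(f"{err:.2f} kcal/mol -> about {fold_error_in_kd(err):.1f}-fold in K_d")
```

Under these assumptions, the reduction from 0.97 to 0.47 kcal/mol corresponds to going from roughly a 5-fold to roughly a 2-fold uncertainty in K_d.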

Reaction Awareness

The AIMNet2-rxn framework evaluates millions of reaction pathways, supporting synthesis-aware discovery by assessing:

Synthetic accessibility and retrosynthetic planning
Strain effects in proposed molecules
Metabolic stability predictions
Covalent modification mechanisms

Active Learning Integration

Our models identify high-uncertainty regions so that experimental validation is directed where it is most informative, keeping computational predictions aligned with real-world resource constraints in drug development. This selective approach reduces the number of compounds that must be synthesized and testing.
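One common way to locate high-uncertainty regions is to measure disagreement across an ensemble of models and send the most contested compounds to the lab first. The sketch below illustrates that idea only; the use of ensemble standard deviation as the uncertainty estimate, the function name, and the compound data are all hypothetical and are not taken from our production workflow.

```python
from statistics import pstdev

def select_for_testing(ensemble_predictions, budget):
    """Pick the `budget` compounds whose ensemble predictions disagree most.

    ensemble_predictions maps a compound ID to a list of predictions,
    one per model in the ensemble.
    """
    scored = [
        (pstdev(preds), compound_id)
        for compound_id, preds in ensemble_predictions.items()
    ]
    # Highest spread first; ties are broken alphabetically by compound ID.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [compound_id for _, compound_id in scored[:budget]]

# Hypothetical predicted binding free energies (kcal/mol) from three models.
predictions = {
    "cmpd_A": [-8.1, -8.0, -8.2],
    "cmpd_B": [-6.5, -9.0, -7.4],
    "cmpd_C": [-7.0, -7.6, -6.9],
}
print(select_for_testing(predictions, budget=2))
```

Here the experimental budget is spent on the compounds where the models disagree, while compounds with consistent predictions are left to the computational ranking.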

Property Prediction

We develop machine learning models to predict key pharmaceutical properties including:

ADMET properties: Absorption, Distribution, Metabolism, Excretion, and Toxicity
Binding affinity: Predicting how strongly a molecule binds to its target protein
Selectivity: Ensuring drugs bind to the intended target and not off-targets
Synthetic accessibility: Estimating how difficult a molecule is to synthesize
pKa prediction: Predicting protein ionization states that affect binding and solubility
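The link between pKa and ionization state in the last item follows the Henderson-Hasselbalch relation. As a minimal sketch, with illustrative pKa values and a hypothetical function name, the snippet below shows how a 0.5-unit shift in pKa changes the protonated fraction of a titratable site at pH 7.4.

```python
def protonated_fraction(pka, ph):
    """Fraction of a titratable site in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Illustrative values only: one site evaluated with two candidate pKa values
# that differ by 0.5 units, both at pH 7.4.
for pka in (6.5, 7.0):
    fraction = protonated_fraction(pka, 7.4)
    print(f"pKa {pka}: {fraction:.0%} protonated at pH 7.4")
```

Because the relation is exponential in the pKa difference, small prediction errors near the working pH can change the dominant charge state, which is why pKa accuracy matters for binding and solubility estimates.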

Notable Achievements

CACHE Challenge Success

In the CACHE Challenge (Critical Assessment of Computational Hit-finding Experiments) targeting the LRRK2 WD40 domain for Parkinson’s disease, our team tied for first place with an 8.5% experimental hit rate, demonstrating the practical impact of our methods on real pharmaceutical targets.

Validated Computational Workflows

Our closed-loop workflows coupling predictive models with experimental validation have successfully identified novel inhibitors across multiple therapeutic targets, with compounds advancing to experimental testing at partner organizations.

Software Tools

Publicly available tools from our drug discovery research:

Auto3D: Automatic 3D structure generation from SMILES notation
pKa-ANI: Protein pKa prediction achieving mean absolute error under 0.5 pKa units
AIMNet2: Neural network potential for accurate energy and property predictions

Collaborative Network

We work closely with:

CMU Drug Discovery Platform
UPMC Hillman Cancer Center
Pharmaceutical industry partners (GSK, Pfizer, Genentech, and others)
Academic collaborators worldwide

Impact

Our methods enable:

Screening billions of molecules for drug-like properties
Identifying promising candidates for experimental validation
Reducing time for lead optimization from months to weeks
Exploring previously inaccessible chemical space
Integrating with automated synthesis platforms

Recent Highlights

CACHE Challenge: Tied for first place with an 8.5% experimental hit rate for the LRRK2 WD40 domain (Parkinson’s disease target)
Binding affinity improvements: Reduced absolute free energy errors from 0.97 to 0.47 kcal/mol
20-fold efficiency gain: Active learning-guided optimization vs. brute-force screening
Industry validation: Compounds advancing to experimental testing at partner organizations

Our drug discovery platform integrates multiple computational approaches:

Neural Network Potentials - Quantum-accurate energy predictions (AIMNet2) for drug-like molecules and reactive intermediates
Generative AI - De novo molecular design with multi-objective optimization for ADMET properties
Reactions & Reactivity - Prediction of metabolic stability, covalent mechanisms, and synthetic accessibility
Experiment Automation - Closed-loop workflows coupling computational predictions with robotic synthesis and testing
Quantum Chemistry - High-accuracy benchmarks for model validation and challenging systems

Open Science

All of our models and key datasets are released as open source to accelerate research globally. Visit our Software page for available tools and documentation.

Funding

National Institutes of Health

Grant: R01GM140467

Machine Learning for Drug Design and Optimization

2022-2027