AI for Drug Discovery

Energy-driven, decision-oriented computational methods integrating machine learning for pharmaceutical applications

Research Focus

Energy-Based Modeling

The AIMNet2 framework provides quantum-chemical accuracy for drug-like molecules, ions, and reactive intermediates, which is essential for accurate property prediction in pharmaceutical contexts. The AIMNet2-NSE extension handles open-shell systems such as quinones and covalent inhibitor intermediates, which are common in medicinal chemistry but difficult for traditional computational methods.

Free Energy Simulations

Hybrid ML/molecular mechanics approaches significantly improve binding affinity predictions. Our methods reduce errors in absolute binding free energy from 0.97 to 0.47 kcal/mol, a critical improvement for rank-ordering compounds in lead optimization campaigns. Active learning-guided lead optimization has achieved 20-fold efficiency gains over brute-force screening.
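To see why a change of this size matters for ranking, it helps to convert a free energy error into an error in the dissociation constant. The sketch below uses only the standard thermodynamic relation ΔG = RT ln Kd, under which an error δ in ΔG corresponds to a multiplicative error of exp(δ/RT) in Kd. The function name and the choice of 298.15 K are assumptions made for illustration and are not part of our workflow.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_error_in_kd(dg_error_kcal, temperature_k=298.15):
    """Multiplicative error in Kd implied by an error in binding free energy.

    Follows from dG = RT ln(Kd): shifting dG by delta scales Kd by exp(delta / RT).
    The temperature default is an illustrative assumption.
    """
    return math.exp(dg_error_kcal / (R_KCAL * temperature_k))


before = fold_error_in_kd(0.97)
after = fold_error_in_kd(0.47)
print(f"0.97 kcal/mol -> {before:.1f}-fold uncertainty in Kd")
print(f"0.47 kcal/mol -> {after:.1f}-fold uncertainty in Kd")
```

At room temperature this works out to roughly a five-fold versus a two-fold uncertainty in Kd, which is the difference between blurring and resolving compounds whose affinities differ by a small factor.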

Reaction Awareness

The AIMNet2-rxn framework evaluates millions of reaction pathways, supporting synthesis-aware discovery by assessing:

  • Synthetic accessibility and retrosynthetic planning
  • Strain effects in proposed molecules
  • Metabolic stability predictions
  • Covalent modification mechanisms

Active Learning Integration

Our models identify high-uncertainty regions to guide experimental validation selectively, aligning computational predictions with real-world resource constraints in drug development. This approach dramatically reduces the number of compounds requiring synthesis and testing.
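The selection step can be illustrated with a minimal sketch. It assumes uncertainty is estimated as the spread of predictions across an ensemble of models, which is one common choice and not necessarily the estimator used in our workflows. The function name, compound IDs, and predicted values are all hypothetical.

```python
import statistics


def select_most_uncertain(ensemble_predictions, k):
    """Return the k compound IDs whose ensemble predictions disagree the most.

    ensemble_predictions maps a compound ID to a list of predictions,
    one per model in the (hypothetical) ensemble.
    """
    spread = {
        compound_id: statistics.pstdev(preds)
        for compound_id, preds in ensemble_predictions.items()
    }
    # Largest standard deviation first
    return sorted(spread, key=spread.get, reverse=True)[:k]


# Hypothetical predicted binding free energies (kcal/mol) from three models
predictions = {
    "cmpd_A": [-8.1, -8.0, -8.2],
    "cmpd_B": [-9.5, -7.0, -8.3],
    "cmpd_C": [-6.4, -6.9, -6.1],
    "cmpd_D": [-10.2, -7.9, -9.0],
}
to_test = select_most_uncertain(predictions, k=2)
print(to_test)
```

Only the compounds where the models disagree are sent for measurement; the new data are then used for retraining, so each round of experiments targets the regions where the model is least reliable.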

Property Prediction

We develop machine learning models to predict key pharmaceutical properties including:

  • ADMET properties: Absorption, Distribution, Metabolism, Excretion, and Toxicity
  • Binding affinity: Predicting how strongly a molecule binds to its target protein
  • Selectivity: Predicting whether a molecule binds its intended target rather than off-targets
  • Synthetic accessibility: Estimating how difficult a molecule is to synthesize
  • pKa prediction: Predicting protein ionization states, which affect binding and solubility
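As a small worked example of why pKa matters, the sketch below applies the Henderson-Hasselbalch relation to convert a pKa into the fraction of a titratable group that is protonated at a given pH. The residue, its pKa values, and the function name are hypothetical and chosen only for illustration.

```python
def protonated_fraction(pka, ph):
    """Fraction of a titratable group in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical residue with a predicted pKa of 6.5 at physiological pH 7.4
f = protonated_fraction(6.5, 7.4)
# The same residue if its pKa were 0.5 units higher
f_shifted = protonated_fraction(7.0, 7.4)
print(f"pKa 6.5: {f:.2f} protonated; pKa 7.0: {f_shifted:.2f} protonated")
```

In this example a shift of half a pKa unit changes the protonated population from about 11% to about 28%, so even modest pKa errors can change the dominant charge state assumed in a binding calculation.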

Notable Achievements

CACHE Challenge Success

In the CACHE Challenge (Critical Assessment of Computational Hit-finding Experiments) targeting the LRRK2 WD40 domain for Parkinson’s disease, our team tied for first place with an 8.5% experimental hit rate, demonstrating the practical impact of our methods on real pharmaceutical targets.

Validated Computational Workflows

Our closed-loop workflows coupling predictive models with experimental validation have successfully identified novel inhibitors across multiple therapeutic targets, with compounds advancing to experimental testing at partner organizations.

Software Tools

Publicly available tools from our drug discovery research:

  • Auto3D: Automatic 3D structure generation from SMILES notation
  • pKa-ANI: Protein pKa prediction achieving mean absolute error under 0.5 pKa units
  • AIMNet2: Neural network potential for accurate energy and property predictions

Collaborative Network

We work closely with:

  • CMU Drug Discovery Platform
  • UPMC Hillman Cancer Center
  • Pharmaceutical industry partners (GSK, Pfizer, Genentech, and others)
  • Academic collaborators worldwide

Impact

Our methods enable:

  • Screening billions of molecules for drug-like properties
  • Identifying promising candidates for experimental validation
  • Reducing lead optimization time from months to weeks
  • Exploring previously inaccessible chemical space
  • Integrating with automated synthesis platforms

Open Science

All of our models and key datasets are released as open source to accelerate research globally. Visit our Software page for available tools and documentation.

Funding

National Institutes of Health

Grant: R01GM140467

Machine Learning for Drug Design and Optimization

2022-2027