Chemistry & Drug Discovery
AlphaFold-lineage protein structure prediction, molecular design, drug discovery. Specialized scientific models.
Setup walkthrough
- Chemistry AI spans two domains: (a) LLM reasoning about chemical concepts, (b) specialized ML for molecular property prediction and drug discovery.
- For LLM-based chemistry reasoning: ollama pull deepseek-r1:32b (~20 GB). Example prompt: "Predict the product of this reaction: CH3COOH + C2H5OH --[H2SO4]--> ? Show the mechanism step by step."
- For molecular property prediction: pip install deepchem (DeepChem, an open-source library for drug discovery ML). Pre-trained models for solubility, toxicity, and binding affinity prediction.
- For protein-ligand docking: pip install diffdock (DiffDock, diffusion-based molecular docking). Feed it a protein structure (PDB file) plus a ligand SMILES, and it predicts the binding pose. First result in 1-5 minutes on GPU.
- For AlphaFold-class protein structure prediction: pip install colabfold (ColabFold, a local AlphaFold2). Feed it an amino acid sequence and get a predicted 3D structure. A 300-residue protein predicts in 30-60 minutes on a 12 GB GPU.
- For molecular generation: SMILES-based LLMs or diffusion models generate novel molecules with desired properties.
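The LLM-reasoning step uses the Ollama CLI, but the same prompt can be sent from a script. Below is a minimal standard-library sketch, assuming Ollama is serving on its default local port 11434 and exposing its /api/generate REST endpoint with streaming disabled. The helper names build_request and ask are hypothetical, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port (11434) and the
# model was already pulled with "ollama pull deepseek-r1:32b".
OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT = (
    "Predict the product of this reaction: "
    "CH3COOH + C2H5OH --[H2SO4]--> ? Show the mechanism step by step."
)


def build_request(prompt: str, model: str = "deepseek-r1:32b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "deepseek-r1:32b", timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text from the JSON "response" field."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Call ask(PROMPT) once the model is pulled. This reaction is a Fischer esterification, so a sound answer should name ethyl acetate and water as the products; checking the reply against known chemistry is worth doing before trusting it.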
The cheap setup
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs ColabFold (protein structure prediction) for proteins up to ~500 residues in 30-90 minutes. DiffDock docks a ligand in 1-3 minutes. DeepChem molecular property models train in minutes. For small-molecule drug discovery workflows, 12 GB handles most single-protein targets. Pair it with a Ryzen 5 5600, 32 GB of DDR4, and a 1 TB NVMe drive (protein databases are large). Total: ~$400-480. $400 gets you into computational chemistry research: the same workflows that required university clusters 5 years ago.
The serious setup
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs ColabFold for large proteins (1,000+ residues) in 60-120 minutes. DiffDock batch-docks 1,000 ligands in 2-4 hours. For virtual screening campaigns (docking 1M compounds against a protein target), that batched rate works out to roughly 250-500 ligands/hour per RTX 3090, so a full campaign takes months on a single card. For AlphaFold 3-style workflows, the 24 GB card should handle the compute as open implementations become available. Total: ~$1,800-2,200. Add a second RTX 3090 for parallel docking or for training custom molecular models; chemistry ML is compute-hungry for large-scale screening.
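To see why large-scale screening is compute-hungry, here is a back-of-envelope sketch built on the batched DiffDock figure for the RTX 3090 (1,000 ligands in 2-4 hours). It assumes throughput scales linearly with ligand count and with GPU count, which real pipelines only approximate; the helper name screening_days is hypothetical.

```python
# Back-of-envelope virtual-screening time from a batched docking rate.
# Assumption: throughput scales linearly with ligand count and with GPU count.


def screening_days(n_ligands: int, ligands_per_batch: int,
                   batch_hours: float, n_gpus: int = 1) -> float:
    """Wall-clock days to dock n_ligands at a fixed per-GPU batch rate."""
    per_gpu_rate = ligands_per_batch / batch_hours  # ligands per hour per GPU
    hours = n_ligands / (per_gpu_rate * n_gpus)
    return hours / 24


# 1M-compound campaign at the quoted 1,000 ligands per 2-4 hours.
for batch_hours in (2.0, 4.0):
    for gpus in (1, 2):
        days = screening_days(1_000_000, 1_000, batch_hours, gpus)
        print(f"{batch_hours:.0f} h per 1,000 ligands, {gpus} GPU(s): ~{days:,.0f} days")
```

At that rate a 1M-compound campaign runs roughly 83-167 days on one card and half that on two, which is the practical case for the dual-GPU build.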
Common beginner mistake
The mistake: Running an LLM for chemical calculations ("What's the molecular weight of aspirin?") without verifying the answer, then using the result in a lab notebook.
Why it fails: LLMs can hallucinate molecular formulas. Asked for aspirin's molecular weight, the model might correctly say C9H8O4 (180.16 g/mol), or it might give you the formula for ibuprofen (C13H18O2) or for a completely fictional compound. The model doesn't compute molecular weight from atoms; it recalls patterns from training data, which may be wrong.
The fix: Use computational chemistry tools for calculations. RDKit (pip install rdkit) computes molecular weight, logP, and rotatable-bond counts deterministically from SMILES: from rdkit import Chem; from rdkit.Chem import Descriptors; mol = Chem.MolFromSmiles("CC(=O)OC1=CC=CC=C1C(=O)O"); print(Descriptors.MolWt(mol)). Use LLMs for conceptual reasoning ("explain the SN2 reaction mechanism") and RDKit/DeepChem for calculations: deterministic tools for numbers, LLMs for concepts.
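RDKit is the recommended tool here. As a dependency-free illustration of the same principle (a number computed from atoms rather than recalled), here is a minimal sketch that sums standard atomic weights over a flat molecular formula. The helpers parse_formula and molecular_weight and the small weight table are hypothetical; the sketch does not parse SMILES and is not a substitute for RDKit's Descriptors.MolWt.

```python
import re

# Conventional standard atomic weights (g/mol) for a few common elements.
# Hypothetical minimal table: extend it for any other element you need.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C9H8O4' into element counts.

    Assumption: no parentheses, hydrates, isotopes, or charges.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: dict[str, int] = {}
    for element, digits in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weights; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


print(f"aspirin   C9H8O4:   {molecular_weight('C9H8O4'):.2f} g/mol")
print(f"ibuprofen C13H18O2: {molecular_weight('C13H18O2'):.2f} g/mol")
```

The two formulas come out clearly different (aspirin at 180.16 g/mol, ibuprofen above 206 g/mol), which is exactly the distinction a recalled answer can blur.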
Reality check
Local AI workloads have real hardware constraints that vary by task type. The VRAM ceiling decides what model fits; memory bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
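The VRAM and bandwidth rules can be turned into a quick check. A minimal sketch, assuming a dense model decoded at batch size 1, where each token streams roughly every weight from VRAM once, so tokens/s is bounded by bandwidth divided by model size. It uses the ~20 GB deepseek-r1:32b download from the walkthrough and spec-sheet memory figures; KV-cache reads and kernel overheads are ignored, so real speeds will be lower.

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound LLM at batch size 1:
# each generated token reads (roughly) every weight from VRAM once, so
# tokens/s <= memory bandwidth / bytes of weights.
# Assumptions: dense model, KV-cache reads and kernel overheads ignored,
# so real throughput will be lower than this ceiling.

GPUS = {
    # name: (VRAM in GB, memory bandwidth in GB/s), spec-sheet values
    "RTX 3060 12 GB": (12.0, 360.0),
    "RTX 3090 24 GB": (24.0, 936.0),
}
MODEL_GB = 20.0  # deepseek-r1:32b download size quoted in the walkthrough


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s when every token streams all weights once."""
    return bandwidth_gb_s / model_gb


for name, (vram_gb, bw_gb_s) in GPUS.items():
    if MODEL_GB > vram_gb:
        # The VRAM ceiling binds first: the model cannot run fully on this GPU.
        print(f"{name}: {MODEL_GB:.0f} GB of weights exceed {vram_gb:.0f} GB of VRAM")
    else:
        print(f"{name}: decode ceiling ~{decode_ceiling_tok_s(bw_gb_s, MODEL_GB):.0f} tok/s")
```

The check runs in that order on purpose: fit is a hard limit, while bandwidth only sets how fast a model that fits can go.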
Common mistakes
- Buying on spec-sheet VRAM without budgeting for KV cache and activation overhead
- Underestimating quantization quality loss below Q4
- Skipping flash-attention support (a real performance gap on long contexts)
- Ignoring sustained-load thermals (many laptops thermal-throttle within 30 minutes)
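Budgeting for KV cache and quantization is simple arithmetic. A minimal sketch, assuming weight memory scales with average bits per weight, the KV cache stores one K and one V tensor per layer in FP16, and a flat 10% margin covers activations and runtime overhead. The model shape (32B parameters, 64 layers, 8 KV heads, head dim 128, ~4.5 bits/weight for a 4-bit quant including scales) is a hypothetical placeholder; take real values from your model's config.

```python
# Rough VRAM budget: quantized weights + KV cache, plus an assumed margin
# for activations, the CUDA context, and allocator fragmentation.
# The model shape below is a hypothetical placeholder; take the real layer
# count, KV-head count, and head dimension from your model's config.


def weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes) at a given average bit width."""
    return n_params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2, batch: int = 1) -> float:
    """KV-cache size in GB; the factor 2 is one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch / 1e9


def vram_needed_gb(weights: float, kv: float, overhead_frac: float = 0.10) -> float:
    """Total estimate with an assumed 10% overhead margin."""
    return (weights + kv) * (1 + overhead_frac)


# Hypothetical 32B model, ~4.5 bits/weight, 64 layers, 8 KV heads, head dim 128.
w = weights_gb(32, 4.5)
for ctx in (8_192, 32_768):
    kv = kv_cache_gb(n_layers=64, n_kv_heads=8, head_dim=128, context_len=ctx)
    total = vram_needed_gb(w, kv)
    verdict = "fits" if total <= 24 else "does not fit"
    print(f"context {ctx:>6}: {w:.1f} GB weights + {kv:.1f} GB KV -> ~{total:.1f} GB ({verdict} in 24 GB)")
```

With these placeholder numbers the 8K-context case comes to about 22 GB and fits a 24 GB card, while 32K context pushes the same weights to about 29 GB. Keeping weights and KV cache as separate terms makes that visible, since context length moves only the second one.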