Biology & Genomics

Genomic sequence analysis, protein design, single-cell analysis. ESM, RoseTTAFold, scGPT.

Setup walkthrough

Biology AI spans: protein structure prediction, genomic sequence analysis, single-cell RNA-seq, and literature reasoning.
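For protein structure: ColabFold (local AlphaFold2). A bare pip install colabfold does not pull in AlphaFold2 or a CUDA build of JAX; the LocalColabFold installer sets up both. Give colabfold_batch a FASTA sequence and it writes the predicted structure as a PDB file. By default it still sends the sequence to the public ColabFold MMseqs2 server to build the MSA, so fully offline runs need local sequence databases. A 400-residue protein takes 30-60 minutes on a 12 GB GPU.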
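For protein design: ProteinMPNN (inverse folding: given a backbone structure, it designs an amino acid sequence that should fold into it). It ships as a GitHub repository (dauparas/ProteinMPNN) rather than an official pip package. Clone it, install PyTorch, and run protein_mpnn_run.py. Runs in seconds on a GPU.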
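For sequence analysis: pip install biopython fair-esm. Biopython handles sequence I/O (FASTA, GenBank); fair-esm provides ESM-2, Meta's protein language model. The package named esm on PyPI is EvolutionaryScale's newer ESM3/ESM C release, not ESM-2. ESM-2 computes protein sequence embeddings used for variant effect prediction, structure prediction (ESMFold is built on ESM-2), and functional annotation.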
For single-cell RNA-seq: pip install scvi-tools (scVI, a deep generative model for scRNA-seq counts). It trains on 100K cells in 30-60 minutes on a GPU and handles batch correction, clustering (on its learned latent space), and differential expression.
For biological literature Q&A: general-purpose LLMs handle PubMed-level biology questions. For specialized genomics knowledge, fine-tune a model or use retrieval-augmented generation (RAG) over domain corpora.
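For the protein structure step, a minimal Python sketch, assuming ColabFold is installed locally so that colabfold_batch is on your PATH. The helper names (format_fasta, colabfold_command, predict) and the paths are hypothetical. The command uses only colabfold_batch's two positional arguments: the input FASTA and the output directory.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def format_fasta(records: dict[str, str], width: int = 60) -> str:
    """Render {name: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines: list[str] = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def colabfold_command(fasta: Path, out_dir: Path) -> list[str]:
    """Basic invocation: colabfold_batch <input.fasta> <output_dir>."""
    return ["colabfold_batch", str(fasta), str(out_dir)]


def predict(records: dict[str, str], fasta: Path, out_dir: Path) -> None:
    """Write the FASTA, then run ColabFold; ranked models and score files go to out_dir."""
    fasta.write_text(format_fasta(records))
    if shutil.which("colabfold_batch") is None:
        raise RuntimeError("colabfold_batch not found on PATH")
    subprocess.run(colabfold_command(fasta, out_dir), check=True)
```

Call predict({"my_protein": your_sequence}, Path("query.fasta"), Path("colabfold_out")) with your own sequence. Each FASTA record becomes a separate prediction job.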
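For the protein design step, a sketch that assembles a ProteinMPNN run from a local clone of the repository. The flag names follow the example scripts shipped with the repo. Confirm them with python protein_mpnn_run.py --help in your checkout. The function name, defaults, and paths are illustrative assumptions.

```python
from __future__ import annotations

import sys
from pathlib import Path


def proteinmpnn_command(
    repo_dir: Path,
    pdb_path: Path,
    out_folder: Path,
    chains: str = "A",
    num_seqs: int = 8,
    temperature: float = 0.1,
    seed: int = 37,
) -> list[str]:
    """Build a ProteinMPNN command that designs sequences for one backbone PDB."""
    return [
        sys.executable, str(repo_dir / "protein_mpnn_run.py"),
        "--pdb_path", str(pdb_path),
        "--pdb_path_chains", chains,            # chains to redesign
        "--out_folder", str(out_folder),        # designed sequences are written here as FASTA
        "--num_seq_per_target", str(num_seqs),
        "--sampling_temp", str(temperature),    # lower = more conservative, less diverse sequences
        "--seed", str(seed),
        "--batch_size", "1",
    ]
```

Pass the list to subprocess.run(..., check=True). Sampling several sequences at a low temperature, then re-predicting them with ColabFold, is a common way to check that the designs actually fold to the target backbone.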
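For the ESM-2 step, one common zero-shot way to score a missense variant is the masked-marginal score introduced with ESM-1v. You mask the mutated position and compare the model's log-probability for the mutant residue against the wild-type residue. Negative values suggest the substitution is disfavored. This sketch assumes pip install fair-esm torch. The helper names are hypothetical. The smallest ESM-2 checkpoint is used only to keep the download small; swap in a larger one such as esm2_t33_650M_UR50D for real use.

```python
from __future__ import annotations

import re

AA = "ACDEFGHIKLMNPQRSTVWY"
VARIANT_RE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


def parse_variant(variant: str, sequence: str) -> tuple[str, int, str]:
    """Parse e.g. 'K2R' (1-based position) and check the wild-type residue matches."""
    m = VARIANT_RE.match(variant)
    if not m:
        raise ValueError(f"bad variant string: {variant!r}")
    wt, pos, mt = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wt:
        raise ValueError(f"{variant} does not match the sequence")
    return wt, pos - 1, mt  # 0-based index


def esm2_masked_marginal(sequence: str, variant: str) -> float:
    """log p(mutant) - log p(wild type) at the masked position (downloads weights on first use)."""
    import esm    # fair-esm; imported lazily so parse_variant works without it
    import torch

    wt, idx, mt = parse_variant(variant, sequence)
    model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
    model.eval()
    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter([("query", sequence)])
    tokens[0, idx + 1] = alphabet.mask_idx  # +1 skips the BOS token the converter prepends
    with torch.no_grad():
        logits = model(tokens)["logits"]
    log_probs = torch.log_softmax(logits[0, idx + 1], dim=-1)
    return (log_probs[alphabet.get_idx(mt)] - log_probs[alphabet.get_idx(wt)]).item()
```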
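For the single-cell step, a sketch of a typical scvi-tools workflow. It assumes pip install scvi-tools scanpy leidenalg and an AnnData (.h5ad) file with raw counts in a layer named counts and a batch column named batch. Both names are assumptions about your data. scVI models raw counts (zero-inflated negative binomial likelihood by default), so the small stdlib check at the top guards against accidentally feeding it log-normalized values.

```python
from __future__ import annotations


def looks_like_raw_counts(values, tol: float = 1e-6) -> bool:
    """True if every value is a non-negative integer (within tol), as raw UMI counts should be."""
    return all(v >= 0 and abs(v - round(v)) < tol for v in values)


def run_scvi(h5ad_path: str, batch_key: str = "batch", counts_layer: str | None = "counts"):
    """Train scVI, cluster on its latent space, and run differential expression."""
    import scanpy as sc  # imported lazily so the check above works without these packages
    import scvi

    adata = sc.read_h5ad(h5ad_path)
    scvi.model.SCVI.setup_anndata(adata, layer=counts_layer, batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()
    adata.obsm["X_scVI"] = model.get_latent_representation()  # batch-corrected latent space
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.leiden(adata)                                         # clusters land in adata.obs["leiden"]
    de = model.differential_expression(groupby="leiden")        # DataFrame of per-cluster DE results
    return adata, model, de
```

For a sparse counts layer, check a sample of its stored values, e.g. looks_like_raw_counts(adata.layers["counts"].data[:10000]), before training.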
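For the literature Q&A step, the retrieval half of RAG can be sketched with the standard library alone. This toy version ranks passages by bag-of-words cosine similarity and builds a grounded prompt. Production systems usually swap in BM25 or embedding search. The function names and prompt wording are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Return the ids of the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, Counter(tokenize(corpus[doc_id]))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, corpus: dict[str, str], k: int = 3) -> str:
    """Prepend the retrieved passages so the LLM answers from them and cites their ids."""
    context = "\n\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(question, corpus, k))
    return (
        "Answer using only the passages below and cite passage ids.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The corpus is a dict of passage id to text, for example abstracts you have exported from PubMed. Send the returned prompt to whichever local LLM you run.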

The cheap setup

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). It runs ColabFold on proteins up to about 500 residues in 30-90 minutes, which is enough for most single-domain proteins. It computes ESM-2 embeddings for 100K sequences in ~30 minutes and trains scVI on 100K cells in 30-60 minutes. Pair it with a Ryzen 5 5600, 32 GB DDR4, and a 1 TB NVMe drive (genomic datasets are large). Total: ~$400-480. For a molecular biology lab automating routine analysis, that budget covers protein structure prediction, variant effect scoring, and single-cell analysis, and it replaces cloud compute for those routine tasks.
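How many ESM-2 embeddings a 12 GB card gets through depends heavily on batching. A common approach, similar in spirit to the toks_per_batch option of fair-esm's extraction script, is to sort sequences by length and cap each batch's padded token count. A stdlib sketch follows. The function name and the 4,096-token default are assumptions to tune against your VRAM and model size.

```python
from __future__ import annotations


def token_budget_batches(seqs: dict[str, str], max_tokens: int = 4096) -> list[list[str]]:
    """Group sequence ids so each batch's padded size (longest length x batch size) <= max_tokens.

    Sorting by length first keeps padding waste low. A single sequence longer than
    max_tokens still gets its own batch.
    """
    order = sorted(seqs, key=lambda s: len(seqs[s]))
    batches: list[list[str]] = []
    current: list[str] = []
    longest = 0
    for seq_id in order:
        length = len(seqs[seq_id]) + 2  # ESM adds BOS and EOS tokens
        new_longest = max(longest, length)
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)      # flush: adding this sequence would exceed the budget
            current, new_longest = [], length
        current.append(seq_id)
        longest = new_longest
    if current:
        batches.append(current)
    return batches
```

Each returned batch can go straight to the ESM batch converter. Raising max_tokens improves GPU utilization until activations run out of VRAM.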

The serious setup

Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). It runs ColabFold on large multi-domain proteins (800+ residues) in 60-120 minutes. The 24 GB also fits ESM-2 3B comfortably. Larger ESM-2 variants can improve downstream accuracy for variant effect prediction, but the largest, ESM-2 15B, needs ~30 GB for its fp16 weights alone, so it does not fit on a single 24 GB card without CPU offloading or quantization. For genomics-scale analysis, scVI trains on 1M+ cells in 2-4 hours. For a computational biology lab, an RTX 3090 handles roughly 80% of routine workloads. The remaining 20% (genome-wide association studies, metagenomics assembly) need CPU clusters, not GPUs. Total: ~$1,800-2,200. For AlphaFold-class structure prediction at scale, add a second RTX 3090 to run predictions for multiple proteins in parallel.
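A weights-only estimate shows which ESM-2 checkpoints fit on a given card: parameters × bytes per parameter (2 for fp16/bf16, 4 for fp32). The 20% headroom reserved for activations, the CUDA context, and batching is an assumption. Long sequences and large batches need more.

```python
from __future__ import annotations

# Approximate parameter counts of the released ESM-2 checkpoints.
ESM2_PARAMS = {
    "esm2_t6_8M_UR50D": 8e6,
    "esm2_t12_35M_UR50D": 35e6,
    "esm2_t30_150M_UR50D": 150e6,
    "esm2_t33_650M_UR50D": 650e6,
    "esm2_t36_3B_UR50D": 3e9,
    "esm2_t48_15B_UR50D": 15e9,
}


def weight_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params: float, vram_gb: float, bytes_per_param: int = 2, headroom: float = 0.8) -> bool:
    """True if the weights fit in the fraction of VRAM left after the assumed headroom."""
    return weight_gb(params, bytes_per_param) <= vram_gb * headroom


if __name__ == "__main__":
    for name, p in ESM2_PARAMS.items():
        print(f"{name:22s} fp16 {weight_gb(p):6.2f} GB  fits 12 GB: {fits(p, 12)}  fits 24 GB: {fits(p, 24)}")
```

The 15B checkpoint comes out at 30 GB in fp16, above a 24 GB card before any activations. The 3B checkpoint needs about 6 GB.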

Common beginner mistake

The mistake: Running ColabFold on a protein sequence, getting a predicted structure, and treating it as experimentally determined ground truth.

Why it fails: AlphaFold/ColabFold predictions are statistical: they predict the most likely structure given the training data. For well-studied protein families, accuracy approaches experiment (~1-2 Å RMSD). For novel proteins or proteins with rare folds, predictions can be wildly wrong (10+ Å RMSD), sometimes with high confidence (high pLDDT scores on wrong structures). Disordered regions usually receive low pLDDT (below 50), which is a warning sign rather than a structure to interpret. pLDDT is also a per-residue, local confidence score, so it says nothing about how domains are placed relative to each other. It correlates with accuracy on average but is unreliable for individual predictions.

The fix: Always check the predicted aligned error (PAE) plot, which shows which regions are reliably positioned relative to each other. Low PAE between domains means high confidence in their relative orientation. High PAE means the domains could be anywhere. For publication-quality structures, validate with experimental methods (X-ray crystallography, cryo-EM) or with multiple orthogonal predictions (AlphaFold2 + ESMFold + RoseTTAFold; agreement between methods increases confidence). Predicted ≠ determined.
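To put the PAE check into numbers, here is a sketch that reads ColabFold's per-model scores JSON and summarizes confidence. It assumes the file holds a per-residue plddt list and an N × N pae matrix in Å. Key names have changed between ColabFold versions, so check your files. The domain boundaries you pass in are your own. The function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_scores(json_path: Path) -> tuple[list[float], list[list[float]]]:
    """Read pLDDT (per residue) and PAE (N x N, in Å) from a ColabFold scores JSON (assumed keys)."""
    data = json.loads(json_path.read_text())
    return data["plddt"], data["pae"]


def mean_inter_domain_pae(pae: list[list[float]], dom_a: range, dom_b: range) -> float:
    """Average PAE over both off-diagonal blocks.

    PAE is not symmetric (rows = residue aligned on, columns = residue scored),
    so both the A->B and B->A blocks are included.
    """
    vals = [pae[i][j] for i in dom_a for j in dom_b] + [pae[j][i] for i in dom_a for j in dom_b]
    return sum(vals) / len(vals)


def low_confidence_fraction(plddt: list[float], cutoff: float = 70.0) -> float:
    """Fraction of residues below the cutoff (70 separates 'confident' from 'low' in AlphaFold's bands)."""
    return sum(p < cutoff for p in plddt) / len(plddt)
```

Compare the inter-domain value with the within-domain blocks. If it is far higher, the relative orientation of the two domains is not trustworthy, even when both domains have high pLDDT on their own.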

Recommended setup for biology & genomics

Recommended hardware

Best GPU for local AI →

All workloads ranked across VRAM tiers.

Recommended runtimes

Browse all tools for runtimes that fit this workload.

Budget build

AI PC under $1,000 →

Reality check

Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
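The bandwidth and compute rules above can be turned into back-of-envelope bounds. Decoding a single stream reads every weight once per token, so memory bandwidth divided by model size caps tokens per second. Prefill costs roughly 2 FLOPs per parameter per token. Both are upper bounds under those assumptions, not benchmarks. The effective TFLOPS you pass in is your own estimate.

```python
from __future__ import annotations


def max_decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling for batch-1 decoding: all weights are read once per generated token."""
    return bandwidth_gb_s / model_gb


def prefill_tokens_per_s(effective_tflops: float, params_billion: float) -> float:
    """Compute-bound estimate for prompt processing at ~2 FLOPs per parameter per token."""
    return effective_tflops * 1e12 / (2 * params_billion * 1e9)
```

With published memory bandwidths of 360 GB/s (RTX 3060 12 GB) and 936 GB/s (RTX 3090), a hypothetical 5 GB quantized model tops out at about 72 and 187 tokens/s respectively. Real decode speed lands below these ceilings.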

Common mistakes

Buying for spec-sheet VRAM without modeling KV cache + activation overhead
Underestimating quantization quality loss below Q4
Skipping flash-attention support (real perf gap on long context)
Ignoring sustained-load thermals (many laptops thermal-throttle within 30 minutes)
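Modeling the KV cache is simple arithmetic. Each layer stores one key vector and one value vector per KV head for every token in context. A sketch follows. The Llama-3-8B-style shape in the note below (32 layers, 8 KV heads, head dimension 128) is only an example, and the 1 GB allowance for activations and the CUDA context is an assumption.

```python
from __future__ import annotations


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2, batch: int = 1) -> int:
    """K and V per layer: 2 * kv_heads * head_dim values per token, per sequence in the batch."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch


def total_vram_gb(weights_gb: float, kv_bytes: int, overhead_gb: float = 1.0) -> float:
    """Weights + KV cache + an assumed fixed allowance for activations and runtime overhead."""
    return weights_gb + kv_bytes / 1e9 + overhead_gb
```

At fp16 with 8,192 tokens of context, that example shape needs exactly 1 GiB of KV cache on top of the weights, and the cache grows linearly with context length and batch size.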
