# MolForge Methods (bioRxiv v0.2 — Full Draft)

## 2. Methods

### 2.1 Compound generation pipeline

#### 2.1.1 Saturn Mamba SSM (primary generator)
- Saturn (Guo et al., Nature MI 2026, arXiv 2405.17066) — Mamba state-space backbone with reinforcement learning oracle
- Configuration: bucket_size=10, budget=1000, augmentation_rounds=15, beam_enumeration=disabled (paper-aligned; avoids RL crashes)
- Reference rotation: 11 unique SMILES per target rotated per cycle for diversity
- Deployment: 5 targets (TYK2/TNIK/CDK4/CDK6/EGFR), running 24/7 in production on an RTX 5090 GPU (sm_120)
- PyTorch 2.12 nightly cu128 + mamba_ssm 2.3.1 cu12torch2.10 wheels (custom build)
- To our knowledge, the first production deployment worldwide (Saturn paper +6 months)
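The per-cycle reference rotation can be sketched as follows. This is a minimal illustration that assumes a round-robin policy; the source only states that 11 SMILES are rotated per cycle. The function name and the placeholder SMILES are hypothetical.

```python
from typing import Sequence


def reference_for_cycle(references: Sequence[str], cycle: int) -> str:
    """Return the reference SMILES for a given generation cycle.

    Round-robin order is an assumption; the actual rotation policy
    is not specified in the text.
    """
    if not references:
        raise ValueError("at least one reference SMILES is required")
    return references[cycle % len(references)]


# Placeholder SMILES (hypothetical, not the real per-target reference set).
refs = ["CCO", "c1ccccc1", "CC(=O)O"]
schedule = [reference_for_cycle(refs, c) for c in range(5)]
```

With 11 references per target, every reference is reused once every 11 cycles under this policy.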

#### 2.1.2 REINVENT4 (secondary)
- AstraZeneca REINVENT 4 (Loeffler et al., J Cheminf 2024)
- Configuration: sigma=128, bucket_size=25 (restored to paper default, 2026-04-28)
- 5-target transfer learning from ChEMBL active sets

### 2.2 Structure-based scoring stack

#### 2.2.1 Boltz-2 cofold ensemble
- Boltz-2 (Wohlwend et al., bioRxiv 2025) with `--diffusion_samples 10 --no_kernels`
- 9 random seeds × 10 candidates = 90 affinity predictions for σ estimation
- Measured ensemble std σ = 0.279 (affinity_pred_value scale), SEM = 0.093
- Paper benchmark σ_single = 1.5 kcal/mol ≈ 1.05 pIC50; our ensemble σ of 0.279 is 3.8× tighter (1.05 / 0.279)
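The arithmetic behind these figures can be checked with a short sketch. It assumes that the `affinity_pred_value` scale is comparable to log units and that the kcal/mol to log-unit conversion uses ΔG = RT ln(10) per log unit at about 310 K; the temperature and helper names are assumptions, not stated in the source.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def sem(sigma: float, n: int) -> float:
    """Standard error of the mean for n independent seeds."""
    return sigma / math.sqrt(n)


def kcal_to_log_units(dg_kcal: float, temperature_k: float = 310.15) -> float:
    """Convert a free-energy spread to log10 affinity units (assumed T)."""
    return dg_kcal / (R_KCAL * temperature_k * math.log(10))


sem_ensemble = sem(0.279, 9)            # reported SEM = 0.093
sigma_single_log = kcal_to_log_units(1.5)  # close to the quoted 1.05
precision_ratio = 1.05 / 0.279          # reported 3.8x
```

At 298 K the same conversion gives roughly 1.10 log units, so the quoted 1.05 is sensitive to the assumed temperature.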

#### 2.2.2 Chai-1 + Protenix co-folding
- Chai-1 v0.7 (Chai Discovery, 2026-02 release)
- Protenix v2.1 (ByteDance, 2026-04 release)
- Consensus iptm averaged with Boltz-2 for 3-way validation

#### 2.2.3 AEV-PLIG v3a
- AEV-PLIG (Warren et al., Nature Comm Chem 2025) — GATv2 GNN with 10-ensemble (`range(10)` model average)
- TYK2 in paper benchmark — directly validated
- 9 spearman 0.89-0.94 (4 targets: TYK2/TNIK/CDK4/CDK6)
- EGFR holo PDB pending (4HJO acquired 2026-05-22)
- v3a formula: 0.3·v2 + 0.7·aev (self-correlation caveat disclosed)
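The v3a blend and the 10-model average can be written out as a minimal sketch. The weights come from the formula above; the function names and the example predictions are made up for illustration.

```python
from statistics import fmean
from typing import Sequence


def ensemble_mean(predictions: Sequence[float]) -> float:
    """Average the predictions of the individual ensemble members."""
    return fmean(predictions)


def v3a_score(v2: float, aev: float, w_v2: float = 0.3, w_aev: float = 0.7) -> float:
    """Weighted blend: 0.3 * v2 + 0.7 * aev."""
    return w_v2 * v2 + w_aev * aev


# Hypothetical outputs of the 10 AEV-PLIG ensemble members for one ligand.
aev_members = [6.8, 7.1, 6.9, 7.0, 7.2, 6.7, 7.0, 6.9, 7.1, 7.3]
aev = ensemble_mean(aev_members)
score = v3a_score(v2=6.0, aev=aev)
```

If v2 already contains an AEV-PLIG-derived term, the blend double-counts that signal, which is the self-correlation caveat noted above.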

#### 2.2.4 GNINA CNN (physics-aware orthogonal signal)
- GNINA 1.3 (McNutt et al., JCIM 2026) docker `gnina/gnina:latest`
- CPU mode (sm_120 GPU mode libtorch incompatibility — disclosed)
- 5 targets × 73 consensus_top_20 candidates = 365 dockings, plus the 1000-compound Saturn pool
- **Orthogonality**: GNINA CNN vs AEV-PLIG Pearson per target:
  - TYK2 0.181 / TNIK -0.152 / CDK4 -0.119 / CDK6 0.435
  - 3/4 targets with |r| < 0.2: GNINA CNN is largely uncorrelated with AEV-PLIG and serves as an orthogonal physics signal
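A minimal sketch of the orthogonality check: a plain-Python Pearson correlation, followed by the count of targets below the 0.2 threshold using the values reported above. The helper name is illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Per-target correlations as reported in the text.
reported = {"TYK2": 0.181, "TNIK": -0.152, "CDK4": -0.119, "CDK6": 0.435}
weakly_correlated = [t for t, r in reported.items() if abs(r) < 0.2]
```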

#### 2.2.5 AutoDock Vina 1.2.7
- pip `vina` 1.2.7 Python binding
- 73 consensus_top_20 dock + score: mean -7.13 kcal/mol, 84% binders < -6 kcal/mol
- Saturn 1000 score_only without docking (limitation disclosed)

### 2.3 QSAR models

#### 2.3.1 Per-target XGBoost+RF ensemble
- Morgan FP radius=2, 2048 bit
- Murcko scaffold split, 5-fold CV
- Per-target Pearson R: TYK2 0.779 / TNIK 0.539 / CDK4 0.512 / CDK6 0.695 / EGFR 0.830
- Mean R = 0.671 (baseline)
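A scaffold-grouped 5-fold split can be sketched as below. The sketch assumes Murcko scaffold strings have already been computed (for example with RDKit) and uses a greedy size-balancing heuristic, which is an assumption rather than the pipeline's confirmed procedure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def scaffold_kfold(scaffolds: Sequence[str], n_folds: int = 5) -> List[List[int]]:
    """Assign molecule indices to folds so that no scaffold spans two folds."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    # Place the largest scaffold groups first, each into the smallest fold.
    for members in sorted(groups.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


# Placeholder scaffold labels, one per molecule (hypothetical).
scaffolds = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "G"]
folds = scaffold_kfold(scaffolds, n_folds=5)
```

Because whole scaffold groups move together, each held-out fold contains only chemotypes unseen during training, which is what makes the reported R values a stricter estimate than a random split.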

#### 2.3.2 MolFormer-XL fine-tuning experiments
- IBM MoLFormer-XL-both-10pct (1.1B parameters)
- Frozen + Morgan concat + XGBoost: R 0.666 (≈ baseline, small dataset OOD)
- LoRA fine-tune (r=16, α=32): R 0.565 (multi-task interference, disclosed)
- Conclusion: per-target single-task baseline superior in this dataset size regime

#### 2.3.3 ChEMBL kinome multi-task (new in v0.2)
- 5 targets × ChEMBL active SMILES (8166 union from API dump 2026-05-22)
- RF n=300 multi-target per-target heads
- **Result**: TYK2 R=**0.845** / CDK4 R=**0.825** / CDK6 R=0.716 / EGFR R=**0.858** / TNIK n=9 (sparse)
- Mean R 0.81 over the 4 targets with sufficient data (TNIK excluded) = **+0.14 vs internal QSAR baseline** (5-target mean 0.671) ⭐
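The averages can be reproduced from the per-target values reported in 2.3.1 and 2.3.3. The sketch also computes a like-for-like delta over the four targets present in both sets, which is an additional view and not a number taken from the source.

```python
from statistics import fmean

# Per-target Pearson R values as reported in the text.
baseline = {"TYK2": 0.779, "TNIK": 0.539, "CDK4": 0.512, "CDK6": 0.695, "EGFR": 0.830}
multitask = {"TYK2": 0.845, "CDK4": 0.825, "CDK6": 0.716, "EGFR": 0.858}

baseline_mean = fmean(baseline.values())    # 5-target baseline mean
multitask_mean = fmean(multitask.values())  # 4-target mean, TNIK excluded
delta = multitask_mean - baseline_mean

# Like-for-like comparison restricted to the four shared targets.
matched_baseline = fmean(baseline[t] for t in multitask)
matched_delta = multitask_mean - matched_baseline
```

On the matched targets the gain is about +0.11, because the 5-target baseline mean is pulled down by TNIK.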

### 2.4 ADMET + safety

#### 2.4.1 ADMET-AI (Swanson et al., MIT 2024)
- 104 properties prediction
- Cutoffs (literature-aligned): hERG < 0.7 (asymmetric loss, Pollard 2010 / Crumb 2016)
- AMES < 0.5 (Bayes-optimal)
- HIA > 0.5, MW < 500, LogP < 5 (Lipinski 4/4)
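The cutoffs above amount to a simple per-compound predicate. The following sketch uses hypothetical property keys (`hERG`, `AMES`, `HIA`, `MW`, `LogP`); these are illustrative labels, not ADMET-AI's actual output column names.

```python
import operator
from typing import Callable, Dict, List, Mapping, Tuple

# (comparison, limit) per axis; a compound passes when compare(value, limit) holds.
CUTOFFS: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "hERG": (operator.lt, 0.7),
    "AMES": (operator.lt, 0.5),
    "HIA": (operator.gt, 0.5),
    "MW": (operator.lt, 500.0),
    "LogP": (operator.lt, 5.0),
}


def failed_axes(props: Mapping[str, float]) -> List[str]:
    """Return the names of the cutoffs a compound violates."""
    return [
        name
        for name, (compare, limit) in CUTOFFS.items()
        if not compare(props[name], limit)
    ]


# Made-up property values for two compounds.
good = {"hERG": 0.35, "AMES": 0.12, "HIA": 0.97, "MW": 412.5, "LogP": 3.1}
bad = {"hERG": 0.81, "AMES": 0.20, "HIA": 0.90, "MW": 512.0, "LogP": 4.2}
```

Returning the list of failed axes, rather than a single boolean, keeps the per-axis failure counts available for later analysis.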

#### 2.4.2 Drug-likeness
- RDKit QED (Bickerton 2012) ≥ 0.4 (drug-like floor)

#### 2.4.3 PAINS
- RDKit FilterCatalogs A+B+C (Baell-Holloway 2010 full)
- PoseBusters mol mode (Buttenschoen et al., Chem Sci 2024)
- **Result**: 73/73 consensus pass + 995/1000 Saturn pool pass (99.5%) ⭐

### 2.5 Uncertainty quantification

#### 2.5.1 Mondrian Conformal Prediction
- MAPIE-compatible Split Conformal (Romano et al. 2019)
- 5-target mean empirical coverage = 0.827 @ α=0.1 (below the nominal 0.90 target)
- Mean interval width 2.555 pIC50
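A minimal sketch of Mondrian (per-group) split conformal prediction with absolute-residual scores is given below. It assumes the Mondrian category is the target and uses the standard finite-sample quantile rank ceil((n+1)(1-α)); neither detail is confirmed by the source, and the sketch does not use MAPIE.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample conformal quantile of absolute calibration residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:  # too few calibration points for this alpha
        return math.inf
    return sorted(residuals)[rank - 1]


def mondrian_quantiles(y_true: Sequence[float], y_pred: Sequence[float],
                       groups: Sequence[str], alpha: float = 0.1) -> Dict[str, float]:
    """Calibrate one quantile per group (here: per target)."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for yt, yp, g in zip(y_true, y_pred, groups):
        by_group[g].append(abs(yt - yp))
    return {g: conformal_quantile(r, alpha) for g, r in by_group.items()}


def prediction_interval(pred: float, group: str, q: Dict[str, float]) -> Tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return pred - q[group], pred + q[group]


def empirical_coverage(y_true: Sequence[float], y_pred: Sequence[float],
                       groups: Sequence[str], q: Dict[str, float]) -> float:
    """Fraction of held-out labels that fall inside their intervals."""
    hits = 0
    for yt, yp, g in zip(y_true, y_pred, groups):
        lo, hi = prediction_interval(yp, g, q)
        hits += lo <= yt <= hi
    return hits / len(y_true)
```

Under exchangeability this construction targets coverage of at least 1 - α per group, so an empirical value well below 0.90 suggests a shift between calibration and test compounds.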

#### 2.5.2 Boltz-2 N=9 ensemble σ
- Direct seed-variance measurement
- σ_paper 1.5 kcal/mol → our σ 0.279 (3.8× precision)
- Discriminability: 1 log unit pIC50 confidently distinguishable

### 2.6 Multi-axis GOLD filter

#### Filter axes (7-axis Tier 1):
1. Consensus 6-way score ≥ 0.85
2. QSAR pIC50 ≥ 6.5 (predicted IC50 ≤ ~316 nM)
3. AEV-PLIG pKi ≥ 5.0
4. AiZynthFinder full retrosynth route solved (USPTO + ZINC stock)
5. GNINA CNN affinity ≥ p70 (top 30%)
6. Vina dock real ≤ -6 kcal/mol
7. RDKit QED ≥ 0.4 + Lipinski (MW<500 + LogP<5)

**Result on consensus_top_20 73**: 4 candidates pass all 7 axes (5.5% pass rate)
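The seven axes can be combined into one predicate, sketched below. The dictionary keys are hypothetical field names, and the 70th percentile uses linear interpolation over the scored pool, which is an assumption about how p70 is computed.

```python
from typing import Dict, Mapping, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def gold_axes(c: Mapping[str, float], gnina_p70: float) -> Dict[str, bool]:
    """Evaluate each Tier 1 axis separately (keys are illustrative)."""
    return {
        "consensus": c["consensus"] >= 0.85,
        "qsar": c["qsar_pic50"] >= 6.5,
        "aev_plig": c["aev_pki"] >= 5.0,
        "retrosynthesis": bool(c["route_solved"]),
        "gnina": c["gnina_cnn_affinity"] >= gnina_p70,
        "vina": c["vina_kcal"] <= -6.0,
        "drug_likeness": c["qed"] >= 0.4 and c["mw"] < 500 and c["logp"] < 5,
    }


def passes_gold(c: Mapping[str, float], gnina_p70: float) -> bool:
    """A candidate is GOLD only if every axis passes."""
    return all(gold_axes(c, gnina_p70).values())


# Made-up candidate and GNINA pool scores for illustration.
pool_scores = [float(v) for v in range(1, 12)]
p70 = percentile(pool_scores, 70)
candidate = {"consensus": 0.9, "qsar_pic50": 7.1, "aev_pki": 6.2, "route_solved": 1,
             "gnina_cnn_affinity": 9.0, "vina_kcal": -7.4, "qed": 0.55,
             "mw": 430.0, "logp": 3.2}
```

Because the axes are conjunctive, the overall pass rate is bounded by the strictest single axis; here the retrosynthesis axis alone admits at most 25 of 73 candidates.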

### 2.7 Synthesizability

#### 2.7.1 AiZynthFinder v4.4.1
- USPTO policy + ZINC stock (Genheden et al. 2020)
- 1000 Saturn pool candidates: 462/1000 solved (46.2%)
- 73 consensus_top_20: 25/73 solved (34.2%)
- Per-target solved rate: TNIK 77% > CDK4 45% > CDK6 20% > TYK2 10% > EGFR 0%

### 2.8 IP novelty analysis

#### 2.8.1 Tanimoto vs ChEMBL active
- Morgan FP r=2, 2048 bit
- RDKit PostgreSQL cartridge v0.73.0 (AWS production)
- 8166 ChEMBL 5-target active SMILES staging
- Saturn pool 749K scored: 95.7% Tanimoto < 0.4 (novel)
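The novelty criterion can be illustrated with fingerprints stored as sets of on-bit indices. The sketch assumes that "Tanimoto < 0.4" refers to the maximum similarity against any ChEMBL active; in production the comparison runs inside the RDKit PostgreSQL cartridge rather than in Python.

```python
from typing import AbstractSet, Iterable


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as on-bit index sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def is_novel(query: AbstractSet[int], references: Iterable[AbstractSet[int]],
             threshold: float = 0.4) -> bool:
    """Novel if the nearest reference is below the similarity threshold."""
    nearest = max((tanimoto(query, ref) for ref in references), default=0.0)
    return nearest < threshold


# Toy on-bit sets (hypothetical, far smaller than a 2048-bit Morgan FP).
query_fp = {1, 2, 3, 10, 11, 12, 13}
chembl_fps = [{1, 2, 99}, {200, 201}]
```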

#### 2.8.2 Murcko scaffold + Generic CSK
- Bemis-Murcko (Bemis 1996)
- Generic CSK (heteroatom replacement + bond unification)
- 73 consensus: 40 Butina clusters (Tc=0.4), 31 Generic CSKs = chemotype diversity secured

#### 2.8.3 Insilico Rentosertib comparison
- Rentosertib (PubChem CID 164938183, Insilico INS018_055)
- Phase IIa Idiopathic Pulmonary Fibrosis positive readout (Nature Medicine 2025-06)
- 13 MolForge TNIK consensus_top_20 vs Rentosertib:
  - **Mean Tanimoto 0.153** (max 0.238, min 0.108)
  - **Same Murcko scaffold: 0/13**
- **Conclusion**: validated target + IP-independent chemotype = licensing dual-advantage

### 2.9 Cross-target selectivity

#### 2.9.1 5-target self multi-task RF
- Union 7669 SMILES from clustered seeds
- Per-target Pearson R: TYK2 0.599 / TNIK 0.554 / CDK4 0.588 / CDK6 0.613 / EGFR 0.891
- 73 consensus selectivity diff (on-target ≥ off-target +1.0 log): 20/73 selective
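The selectivity criterion can be sketched as a margin between the on-target prediction and the off-target predictions. The sketch assumes the margin is measured against the strongest off-target; the source does not say whether the maximum or the mean off-target value is used.

```python
from typing import Mapping


def selectivity_margin(pred_pic50: Mapping[str, float], on_target: str) -> float:
    """On-target predicted pIC50 minus the strongest off-target prediction."""
    off_target = [v for t, v in pred_pic50.items() if t != on_target]
    if not off_target:
        raise ValueError("at least one off-target prediction is required")
    return pred_pic50[on_target] - max(off_target)


def is_selective(pred_pic50: Mapping[str, float], on_target: str,
                 min_margin: float = 1.0) -> bool:
    """Selective if the on-target lead is at least min_margin log units."""
    return selectivity_margin(pred_pic50, on_target) >= min_margin


# Made-up multi-task predictions for one compound designed against TNIK.
preds = {"TYK2": 5.5, "TNIK": 7.5, "CDK4": 6.0, "CDK6": 5.0, "EGFR": 4.5}
```

A negative median margin, as reported for the Saturn pool in 2.9.2, means the typical compound is predicted to bind some off-target more strongly than its intended target.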

#### 2.9.2 Limitation
- Saturn pool 1000 selectivity median -0.28 = OOD from ChEMBL training set
- Honest disclosure: wet-lab selectivity anchor required

### 2.10 Hardware + reproducibility

#### 2.10.1 Compute
- GPU: NVIDIA RTX 5090 32GB (sm_120) on WSL Ubuntu 24.04
- AWS EC2 (ap-northeast-2 Seoul, t3.xlarge equivalent)
- 24/7 production daemon (since Apr 13 2026; 4.5 weeks of uptime by submission)
- Power cap 460W (80% safe), thermal threshold 80°C with auto-kill watchdog
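The thermal watchdog logic can be sketched as below. This is an assumed design, not the production daemon: it polls `nvidia-smi` for temperature and power draw and flags a kill once the 80°C threshold is reached. The function names are hypothetical, and the action taken on a kill decision is left out.

```python
import subprocess
from typing import Tuple

THERMAL_LIMIT_C = 80.0  # threshold stated in the text

NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(line: str) -> Tuple[float, float]:
    """Parse one 'temperature, power' CSV line from nvidia-smi."""
    temp_str, power_str = (field.strip() for field in line.split(","))
    return float(temp_str), float(power_str)


def should_kill(temperature_c: float, limit_c: float = THERMAL_LIMIT_C) -> bool:
    """True once the GPU temperature reaches the threshold."""
    return temperature_c >= limit_c


def read_gpu_status() -> Tuple[float, float]:
    """Query the first GPU; requires nvidia-smi on PATH (not called here)."""
    result = subprocess.run(NVIDIA_SMI_QUERY, capture_output=True,
                            text=True, check=True)
    return parse_gpu_status(result.stdout.splitlines()[0])
```

Keeping the parsing and the decision separate from the subprocess call makes the kill rule testable on machines without a GPU.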

#### 2.10.2 Software stack
- conda env "molforge" (PyTorch 2.5.1+cu121, transformers, RDKit 2026.3.1, scikit-learn)
- conda env "saturn" (PyTorch 2.12 nightly cu128, mamba_ssm 2.3.1)
- conda env "openfe_real" (OpenFE 1.7.0, gufe, openmm)
- Reproducibility: git tag `baseline-2026-05-21` immutable + 22+ subsequent commits
- All code + data: https://www.molforgeai.com/data

#### 2.10.3 Database
- PostgreSQL 16 + RDKit cartridge v0.73.0 (extension)
- 977K compounds with morgan_fp + ADMET + novelty (cumulative, 2026-04-15 to 2026-05-22)
- Automated backup: nightly pg_dump + 14-day rotation + GFS (grandfather-father-son) retention

### 2.11 Failure-boundary mapping (ROBOGATE)

- KIPO 10-2026-0057732 (patent, 18-month confidentiality until 2027-09)
- 5 targets × failure axes (hERG / AMES / HIA / MW / LogP / PAINS) heatmap
- Adaptive sampling based on failure-boundary feedback
- Unique to MolForge, to our knowledge (no competitor has disclosed similar IP)