# MolForge bioRxiv v0.2 — Discussion + Conclusion (Full Draft)

## 4. Discussion

### 4.1 Multi-axis filter — hit-rate improvement hypothesis validated in silico

The CACHE Challenge #1 benchmark (LRRK2 WDR domain, JCIM 2024) reports a 3.7% average wet-lab hit rate across 23 teams. This sets a lower bound for kinase virtual screening. Our 7-axis Tier 1 GOLD filter applied to consensus_top_20 (n=73) yields 4 candidates passing all 7 axes (5.5% pass rate). Under naive 1:1 mapping, these 4 candidates have substantially elevated in silico evidence compared to CACHE-baseline.

However, naive mapping ignores OOD generalization gap (Murcko scaffold split inflation of 0.10-0.20, arXiv 2406.00873). Our calibrated realistic hit-rate range is 15-35% (central 22%) — derived from external Pearson R = 0.21 (PLINDER scaffold-split) × internal R = 0.5 → 0.42x calibration heuristic, now replaced by formal Mondrian-style Split Conformal (coverage 0.827).

Critical insight: **GNINA CNN vs AEV-PLIG orthogonality** (Pearson |ρ|<0.2 in 3/4 targets, Section 3.3) provides a genuine independent physics signal. Most deep-learning consensus methods (Boltz-2 + Chai-1 + Protenix) share architecture family and exhibit correlated failure modes. The GNINA + AEV orthogonal pairing breaks this correlation and is a primary novelty contribution.

### 4.2 Multi-task OOD limitation — honest disclosure

Section 2.9 multi-task self-selectivity model achieves Pearson R 0.55-0.89 across 5 targets internally, but applied to Saturn pool 1000 candidates the median selectivity_diff is -0.28. This indicates Saturn-generated chemotypes are out-of-distribution from the ChEMBL active set used to train the selectivity model.

Two interpretations:
1. **Model limitation**: training data does not cover Saturn's novel chemotype space → false off-target prediction
2. **True off-target**: Saturn compounds are genuinely promiscuous

We cannot distinguish these without wet-lab data. The honest disclosure: **R2 CRO portfolio must include selectivity panel (≥20 kinases) for at least the top 3 candidates** to anchor this signal externally.

### 4.3 MolFormer-XL — when frozen embedding fails

In Section 2.3.2 we report two negative results:
- **Frozen MolFormer + Morgan FP concat + XGBoost** on small train (487): R = 0.40 (single-target)
- **MolFormer LoRA r=16 multi-task** (5 targets joint training): R = 0.565 (mean)

Both underperform per-target Morgan+RF/XGB baseline (R 0.67) in this dataset size regime. The interpretation: foundation models require dataset volume that exceeds our per-target ChEMBL active counts (n = 800-2300). For n > 2000 (CDK4, TNIK), frozen MolFormer + Morgan concat closes the gap (R 0.666 ≈ baseline 0.671) — consistent with prior literature on MolFormer needing ~10K+ tokens per task for advantage.

**ChEMBL kinome multi-task RF** (Section 2.3.3) achieves R 0.81 mean — substantially higher (+0.14 vs internal baseline 0.671). This validates that **classical feature engineering (Morgan FP) with task-aware RF on larger union data outperforms foundation model frozen embedding** at our scale. Future work: MolFormer fine-tune full backbone (not LoRA) on union 7669 SMILES.

### 4.4 IP novelty — Insilico Rentosertib comparison

The TNIK kinase target has been clinically validated by Insilico Medicine's Rentosertib (INS018_055) Phase IIa positive readout (Nature Medicine 2025-06). This is a unique licensing advantage: the target is no longer pre-clinical only — biomarker, mechanism, and patient population are externally validated.

Our 13 TNIK consensus_top_20 candidates show:
- Mean Tanimoto vs Rentosertib: 0.153 (max 0.238, min 0.108)
- Same Murcko scaffold: 0/13 (complete scaffold independence)

This combination — **clinically validated target + structurally independent chemotype** — is the strongest possible licensing narrative pre-wet-lab. Insilico's wet data establishes the path; our compounds are demonstrably novel-IP (low Tanimoto) and not derivative.

### 4.5 7-axis GOLD filter validation

The 4 Tier 1 candidates passing all 7 axes are:
- **MF-TNIK-6f036c**: consensus 0.969, QSAR pIC50 7.83, AEV pKi 5.69, AiZynth solved, GNINA 5.21, Vina -6.72 kcal/mol, QED 0.85
- **MF-CDK4-99a012**: consensus 0.928, QSAR 6.99, AEV 5.24, AiZynth solved, GNINA 5.36, Vina -7.02, QED 0.88
- **MF-TNIK-576faa**: consensus 0.901, QSAR 7.79, AEV 5.33, AiZynth solved, GNINA 5.19, Vina -6.87, QED 0.85
- **MF-TNIK-066333**: consensus 0.879, QSAR 7.19, AEV 5.50, AiZynth solved, GNINA 5.46, Vina -7.48, QED 0.90

All four have Vina dock < -6 kcal/mol (corresponding to KD < 50 μM in standard mapping), QED > 0.85 (top-decile drug-likeness), and full retrosynthesis route to commercial building blocks. Three of four are TNIK candidates, suggesting TNIK Saturn pool quality is strongest.

### 4.6 Limitations — honest disclosure

1. **In silico only**: All 9-axis evidence is computational. R1 wet-lab anchor (CRO biochemical KD, 2026-04-25 dispatch) is pending (expected 5/15 – 6/15). All hit-rate calibration is provisional until wet anchor arrives.

2. **TYK2 ABFE broken** (memory hit-rate findings #4): TYK2 RBFE 5 versions × 0 normal ΔΔG output. Boltz-2 cofold pose produces NaN-divergent free energy. Vina + GNINA fill this gap but ABFE-grade absolute free energy is not available for TYK2. Workaround: X-ray PDB 4GIH + UniDock + OpenFE 1.7 pipeline (in progress, protein prep stage).

3. **EGFR AEV-PLIG pending**: EGFR holo PDB 4HJO acquired but AEV-PLIG retraining requires cofold output (Boltz EGFR cycle 1-2 weeks). Current EGFR ranking uses qed_only_pending_aev label.

4. **PoseBusters mode**: We applied mol-mode only (chemical validity, 99.5% Saturn pass). Full dock-mode (pose-protein clash, ligand-bond plausibility) requires cofold PDB and was not run pool-wide.

5. **Saturn pool selectivity OOD**: Multi-task model gives unreliable selectivity for Saturn-novel chemotypes (median diff -0.28). Mitigation: CRO selectivity panel for R2 top 3.

6. **Vina docking limitation**: 73 consensus_top_20 docked successfully (mean -7.13 kcal/mol). Saturn pool 1000 Vina was score_only without docking (computational cost). Full dock would take ~10 days GPU.

### 4.7 Next-round CRO portfolio

Budget constraint: KRW 100M / 5 candidates × KRW 20M per CRO biochemical assay.

**Recommended q=5 (target diversity + 7-axis evidence)**:

1. **MF-TNIK-6f036c** (TNIK, 7-axis GOLD Tier 1, Insilico-validated target)
2. **MF-CDK4-99a012** (CDK4, 7-axis GOLD Tier 1, Vina -7.02)
3. **MF-EGFR-saturn-36ff3a** (EGFR, 6-axis Saturn GOLD CNN top 5.76, FDA-precedent target)
4. **MF-CDK6-saturn-c54835** (CDK6, 6-axis Saturn GOLD)
5. **TYK2 placeholder** (Tier 1/2 both fail 7-axis pass; defer to R1 wet anchor)

Final lock decision: **after R1 wet results arrive** for retraining + recalibration.

## 5. Conclusion

We present MolForge — a production-scale AI drug discovery platform deployed 24/7 on commodity GPU hardware (RTX 5090) generating ~977,000 in silico candidates across 5 kinase targets with 95% novelty (Tanimoto < 0.4 vs ChEMBL active). The platform integrates Saturn Mamba SSM (first production deployment of this 2026 Nature MI architecture), Boltz-2 N=9 ensemble (ensemble σ = 0.279 pIC50, 3.8× paper precision), AEV-PLIG v3a (Spearman 0.89-0.94), GNINA CNN (3/4 targets demonstrated orthogonal to AEV), and a novel 7-axis multi-criteria GOLD filter.

Applied to consensus_top_20 (n=73), the 7-axis filter identifies 4 candidates passing all stringent thresholds. One — **MF-TNIK-6f036c** — represents the strongest single in silico hit per available metrics. Its TNIK target was clinically validated (Insilico Rentosertib Phase IIa, Nature Medicine 2025-06), while structural independence (Tanimoto 0.153, scaffold 0/13 match) preserves IP novelty.

Key architectural contributions:
1. **First production Saturn Mamba SSM deployment** (5 targets × 24/7)
2. **GNINA-AEV orthogonality empirically demonstrated** (3/4 targets |ρ|<0.2)
3. **ROBOGATE failure-boundary heatmap** (KIPO patent 10-2026-0057732, 18-month confidentiality)
4. **ChEMBL kinome multi-task QSAR** R 0.81 mean (+0.14 vs internal baseline)

R1 CRO biochemical assay results (6 candidates dispatched 2026-04-25) are pending in the 5/15 – 6/15 window. These data will anchor Mondrian Conformal Prediction recalibration and select the final R2 portfolio (q=10 candidates, KRW 200M budget).

Beyond the empirical results, we contribute an **honest disclosure framework**: every internal R, every coverage metric, every novelty count is reported with explicit limitation context (OOD risk, Vina partial coverage, EGFR AEV pending, MolFormer dataset size dependency). We believe this transparency is essential for AI-driven drug discovery to mature beyond cherry-picked benchmark claims.

## 6. Data and code availability

- **Live evidence**: https://www.molforgeai.com (R2 candidates, Insilico comparison, pipeline pages)
- **JSON evidence**: https://www.molforgeai.com/data/_phase_*.json (15+ files)
- **Source repository**: github.com/liveplex-cpu/molforge-web (commits `83705899` through `d4a24d67`)
- **Immutable git tag**: `r2-candidates-locked-2026-05-22-final`
- **All figures**: docs/figures/ (5 PNG)
- **Molecular structures**: docs/decks/ (PPT 9-slide) + public/data/svgs/ (6 SVG)

## 7. Author contributions

- Heonjeong Cho (Co-CEO, conceptualization, scientific direction)
- Jewoo Yom (CTO, computational infrastructure, model development)
- AgentAI Co., Ltd. (Seoul, 117-86-03600)

## 8. Funding

- AgentAI Co., Ltd. internal funding (founder capital)
- KRW 80M committed to R1 CRO (KIST + Eurofins, 2026-04-25)
- KRW 100M committed to R2 CRO (R1 results pending)

## 9. Conflict of interest

Authors are co-founders and equity holders in AgentAI Co., Ltd. MF-TNIK-6f036c is subject to provisional patent filings (KIPO 10-2026-0057732 series).

## 10. Acknowledgments

- Saturn (Nature MI 2026) authors for open-source release
- Boltz-2 (Wohlwend et al.) for cofold model
- AEV-PLIG (Warren et al., Nature Comm Chem 2025) for affinity GNN
- GNINA (McNutt et al., JCIM 2026) for physics-aware CNN
- Insilico Medicine for publishing Rentosertib trial data
- NVIDIA CUDA team for sm_120 / RTX 5090 support
- AWS Korea (ap-northeast-2) for compute hosting
