The Science Behind MolForge

Four layers of differentiation that competitors cannot replicate overnight.

Layer 1 — Uncertainty Quantification

Every Prediction Comes With a Confidence Interval

Most AI drug discovery platforms give you only a point estimate, such as hERG: 0.3. But is that 0.3 reliable? The true value could be 0.1 or 0.7. MolForge applies Conformal Prediction to every ADMET prediction and reports a 90% confidence interval alongside it, so you can see how much to trust each number: a narrow interval signals high confidence, a wide one signals low confidence.

hERG: 0.30 [90% CI: 0.15 – 0.45] HIGH CONFIDENCE
hERG: 0.65 [90% CI: 0.20 – 0.90] LOW CONFIDENCE

Conformal Prediction · Distribution-Free · Calibrated
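
As a rough illustration of how a distribution-free interval can be derived, here is a minimal sketch of plain split conformal prediction for a regression-style score. It is not MolForge's implementation: the function names, the toy calibration data, and the clipping to [0, 1] are all assumptions made for the example. Plain split conformal yields the same half-width for every prediction, so the varying widths shown above suggest an adaptive variant that this sketch does not cover.

```python
import math

def conformal_halfwidth(cal_true, cal_pred, alpha=0.10):
    """Half-width q so that [pred - q, pred + q] has marginal coverage
    of at least 1 - alpha, assuming calibration and new data are exchangeable."""
    # Nonconformity score: absolute residual on held-out calibration data.
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the quantile to use.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[k - 1]

def predict_interval(point, q, lo=0.0, hi=1.0):
    """Interval around a point estimate, clipped to the score's valid range."""
    return max(lo, point - q), min(hi, point + q)

# Toy calibration set (hypothetical values, for illustration only).
cal_pred = [0.0] * 19
cal_true = [i / 100 for i in range(1, 20)]

q = conformal_halfwidth(cal_true, cal_pred)
low, high = predict_interval(0.30, q)
print(f"hERG: 0.30 [90% CI: {low:.2f} - {high:.2f}]")
```

The guarantee is marginal (averaged over molecules), not per-molecule, and it depends on the calibration set resembling the compounds being scored.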

Layer 2 — Pareto Failure Boundary Mapping

We Map Where Molecules Go Wrong

Rather than only predicting good molecules, MolForge systematically maps the parameter space where candidates fail. Using ROBOGATE adaptive sampling combined with Multi-Objective Pareto optimization, we generate a Failure Boundary Heatmap for every target — showing exactly which structural modifications lead to cardiac toxicity, mutagenicity, or poor bioavailability.

Top Failure Cause: hERG cardiac toxicity (36.3%)
Pareto Rank 1: 626 compounds
Search Space: 6,631 variants in 108s

ROBOGATE · Multi-Objective Pareto · Patent Pending
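
To make "Pareto Rank 1" concrete, the sketch below selects the non-dominated set from a list of candidates scored on several objectives. It shows only the generic Pareto-front idea, not ROBOGATE sampling or MolForge's actual objectives; the two objectives (a toxicity risk and one minus a drug-likeness score, both to be minimized) and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (lower is better for all objectives)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_rank_1(points):
    """Indices of points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Hypothetical candidates: (toxicity risk, 1 - drug-likeness).
candidates = [(0.2, 0.5), (0.3, 0.4), (0.4, 0.6), (0.1, 0.9)]
front = pareto_rank_1(candidates)
print(front)
```

A compound in the front cannot be improved on one objective without getting worse on another, which is why rank 1 compounds are the natural shortlist. This pairwise check is quadratic in the number of candidates, which is acceptable for a few thousand variants.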

Layer 3 — Certified Compound Tiers

Not All Predictions Are Equal

MolForge assigns every compound a certification tier based on the level of evidence supporting it. Pharmaceutical R&D teams can immediately understand what level of validation they're working with.

Tier   | Level                    | Criteria
Tier 1 | AI Screened              | ADMET pass + QSAR pIC50 > 6.0 + QED > 0.5
Tier 2 | Structure Verified       | Tier 1 + Boltz-2 ligand_ptm ≥ 0.85
Tier 3 | Experimentally Validated | Tier 2 + External CRO IC50 assay confirmed

Current TYK2 pipeline: 5 Tier 2 compounds confirmed (ligand_ptm avg 0.883)
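
The tier criteria above can be expressed as a short decision function. This is a sketch under assumptions: the `Compound` record and its field names are hypothetical, and returning 0 for a compound that fails Tier 1 is a convention chosen for the example. The thresholds are the ones stated in the table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Compound:
    admet_pass: bool                    # passed the ADMET filter
    pic50: float                        # QSAR-predicted pIC50
    qed: float                          # drug-likeness score
    ligand_ptm: Optional[float] = None  # Boltz-2 ligand_ptm, if computed
    cro_confirmed: bool = False         # external CRO IC50 assay confirmed

def certification_tier(c: Compound) -> int:
    """Return 1, 2, or 3 per the tier table, or 0 if Tier 1 is not met."""
    if not (c.admet_pass and c.pic50 > 6.0 and c.qed > 0.5):
        return 0
    if c.ligand_ptm is None or c.ligand_ptm < 0.85:
        return 1
    return 3 if c.cro_confirmed else 2

print(certification_tier(Compound(True, 6.8, 0.62, ligand_ptm=0.883)))
```

Because each tier requires the previous one, a compound with a confirmed assay but a ligand_ptm below 0.85 stays at Tier 1 in this reading of the table.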

Layer 4 — Continuous Feedback Loop

The Model Gets Smarter With Every Experiment

Every CRO assay result — whether a hit or a miss — feeds back into MolForge's models. Negative data (failed experiments) are treated as first-class assets, continuously refining our Conformal Prediction calibration and Pareto boundary maps. This is the moat that takes time to build.

OpenADMET Integration · CRO Feedback · Negative Data as Asset
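
One way such a feedback loop could refine calibration is to add every new assay result, hit or miss, to the pool of residuals from which interval widths are derived. The sketch below assumes a simple absolute-residual pool and a 90% target; the class, its method names, and the numbers are hypothetical and are not taken from MolForge.

```python
import bisect
import math

class CalibrationPool:
    """Sorted pool of |measured - predicted| residuals from assay results."""

    def __init__(self, alpha=0.10):
        self.alpha = alpha
        self.scores = []

    def add_assay(self, predicted, measured):
        # Hits and misses are stored alike; a large miss is a large residual.
        bisect.insort(self.scores, abs(measured - predicted))

    def halfwidth(self):
        """Current interval half-width for coverage of at least 1 - alpha."""
        n = len(self.scores)
        k = math.ceil((n + 1) * (1 - self.alpha))
        return self.scores[k - 1] if k <= n else math.inf

pool = CalibrationPool()
for _ in range(9):
    pool.add_assay(predicted=0.0, measured=0.1)  # nine accurate predictions
before = pool.halfwidth()
pool.add_assay(predicted=0.0, measured=0.5)      # one failed prediction
after = pool.halfwidth()
print(before, after)
```

In this toy run the single miss widens the interval, which is the point: failed experiments tell the model where its stated confidence was too optimistic.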

Built on Open Science

We stand on the shoulders of giants — and add what was missing.

Boltz-2 (MIT License): Structure + affinity prediction. FEP-level accuracy, 1000x faster.

ADMET-AI: 41 ADMET properties. TDC Leaderboard #1.

RDKit (BSD): Cheminformatics toolkit. Industry standard.

QSAR Ensemble: XGBoost + Random Forest. Pearson R=0.562 on TYK2 scaffold split.

Conformal Prediction: Distribution-free uncertainty quantification.

Llama 3 (Local): Dossier generation. Zero API cost.

From Target to Dossier in Hours, Not Months

End-to-end automated pipeline with no human intervention required.

1. Target Selection (Auto)
2. Seed Collection (~30 min)
3. Structure Generation (~2 h)
4. ADMET Filtering (~15 min)
5. QSAR Scoring (~10 min)
6. Failure Boundary (~2 min)
7. Compound Dossier (~30 min)
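
As a quick check on the "hours, not months" claim, the approximate stage times listed above can be summed. This assumes the stages run one after another and that the automatic Target Selection step adds negligible time; the durations are the rounded figures given for each stage.

```python
# Approximate stage durations in minutes, as listed for the pipeline.
STAGE_MINUTES = {
    "Seed Collection": 30,
    "Structure Generation": 120,
    "ADMET Filtering": 15,
    "QSAR Scoring": 10,
    "Failure Boundary": 2,
    "Compound Dossier": 30,
}

total_minutes = sum(STAGE_MINUTES.values())
slowest = max(STAGE_MINUTES, key=STAGE_MINUTES.get)
print(f"Total: {total_minutes} min ({total_minutes / 60:.2f} h)")
print(f"Slowest stage: {slowest}")
```

Under these assumptions a full run takes roughly three and a half hours, with Structure Generation accounting for more than half of it.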

VALIDATED RESULTS

Validated on TYK2 — A Phase III Clinical Target

Reported metrics: Variants Generated, ADMET Passed (24.0%), Benchmark Score (R), Pareto Rank 1.