Reproducibility Kit
Verify Our Numbers. Run It Yourself.
Open-sourced model weights, training code, evaluation scripts, and a 10K sample compound set. Built on the principle that scientific claims must be independently reproducible. Big Tech does not do this.
Why Open This
Trust requires verification
In drug discovery, an unverifiable benchmark number is worth less than nothing — it costs partners time and erodes trust. We expose our model so anyone can run it.
What Stays Proprietary
Target-fine-tuned weights + ROBOGATE
Pre-trained weights are open. Target-specific fine-tuned weights (TYK2, CDK4, CDK6, TNIK, EGFR) stay private because they are the product of compute investment and curated data. ROBOGATE adaptive sampling is patent-protected.
License
Apache 2.0 + CC-BY-NC
Code: Apache 2.0 (commercial use OK). Sample compound dataset: CC-BY-NC 4.0 (academic use; commercial use requires partner agreement).
Kit Components
5 Deliverables
MolForge-Gen v1 (small)
Model weights
Reduced 25.4M-parameter MolForge-Gen weights (pre-trained only, no target fine-tuning). Sufficient for reproducing general molecular generation. Fine-tuned target-specific weights remain proprietary.
→ PyTorch state_dict (.pt) + config.json + tokenizer vocab
Apache 2.0
~95 MB
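As a quick sanity check on the stated download size, the sketch below estimates the raw tensor payload from the parameter count. It assumes dense float32 storage with no compression and ignores file metadata; the helper name is hypothetical and not part of the kit.

```python
# Rough on-disk size of a checkpoint from its parameter count.
# Assumption: dense storage, no compression, metadata overhead ignored.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_bytes(num_params: int, dtype: str = "float32") -> int:
    """Return the raw tensor payload size in bytes (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype]


num_params = 25_400_000  # 25.4M parameters, as stated for MolForge-Gen v1 (small)
size = checkpoint_size_bytes(num_params)
print(f"{size / 1e6:.1f} MB (decimal), {size / 2**20:.1f} MiB (binary)")
# prints: 101.6 MB (decimal), 96.9 MiB (binary)
```

Under these assumptions the float32 payload is about 96.9 MiB, which is in line with the ~95 MB figure if that figure is quoted in binary units.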
Training & Evaluation Code
Source code
Full training pipeline (tokenizer, model, training loop, evaluation scripts). Excludes ROBOGATE adaptive sampling, a method protected under KIPO 10-2026-0057732.
→ GitHub repo: liveplex-cpu/molforge-gen-public
Apache 2.0
~2 MB
Sample Compound Set (10K anonymized)
Dataset
10,000 anonymized compound IDs with full property predictions (pIC50, hERG, AMES, QED, novelty). SMILES omitted — properties only. Sufficient for benchmarking against your own pipeline.
→ CSV + JSON + property statistics report
CC-BY-NC 4.0
~3 MB CSV
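To benchmark against your own pipeline, one plausible first step is to summarize each property column and compare the summary with your own predictions. The sketch below uses only the Python standard library. The column names and the three rows are synthetic placeholders, since the kit's actual CSV schema is not specified here.

```python
import csv
import io
import statistics

# Synthetic placeholder rows; column names are assumed, not the kit's real schema.
SAMPLE_CSV = """compound_id,pIC50,QED
CMPD-0001,6.0,0.50
CMPD-0002,7.0,0.70
CMPD-0003,8.0,0.60
"""


def summarize(handle, columns):
    """Return {column: (mean, sample stdev)} for the requested numeric columns."""
    rows = list(csv.DictReader(handle))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = (statistics.mean(values), statistics.stdev(values))
    return summary


stats = summarize(io.StringIO(SAMPLE_CSV), ["pIC50", "QED"])
for col, (mean, sd) in stats.items():
    print(f"{col}: mean={mean:.2f} sd={sd:.2f}")
```

For the real file, replace the in-memory string with `open(path, newline="")` and pass the property columns you want to compare.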
Benchmark Suite Runner
Source code
Scripts to reproduce our TDC ADMET, GuacaMol, and TYK2 prospective screening benchmark numbers using our model. Verifies the External Benchmarks table on our /methodology page.
→ Python scripts + Docker compose for environment
Apache 2.0
~500 KB
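A reproduction run is only useful if there is a stated rule for when a reproduced number counts as matching the reported one. The sketch below shows one possible rule, a relative tolerance per metric. The metric names, values, and the 2% tolerance are placeholders chosen for illustration; they are not our benchmark results and not the runner's actual interface.

```python
def compare(reported, reproduced, rel_tol=0.02):
    """Return {metric: True/False} for whether each reproduced value
    falls within rel_tol (relative) of the reported value.
    A metric missing from `reproduced` counts as a failure."""
    results = {}
    for name, ref in reported.items():
        got = reproduced.get(name)
        results[name] = got is not None and abs(got - ref) <= rel_tol * abs(ref)
    return results


# Placeholder numbers only, used to exercise the function.
reported = {"metric_a": 0.80, "metric_b": 0.50}
reproduced = {"metric_a": 0.81, "metric_b": 0.45}

for name, ok in compare(reported, reproduced).items():
    print(f"{name}: {'match' if ok else 'MISMATCH'}")
```

A relative tolerance is only one choice; for metrics with known run-to-run variance, a tolerance derived from repeated seeds would be more defensible.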
Conformal Prediction Module
Source code
Standalone CP calibration module — drop-in for any sklearn / PyTorch regressor. The exact module that produces our 86.5% empirical coverage on TYK2.
→ pip-installable: pip install molforge-conformal
Apache 2.0
~200 KB
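For readers unfamiliar with conformal calibration, the sketch below shows standard split conformal prediction for a regressor: absolute residuals on a held-out calibration set give a quantile that is added to and subtracted from each new prediction. This is a generic textbook version written for illustration. It is not the `molforge-conformal` API, and all function names and data values here are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank in the sorted residuals
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def calibrate(y_true, y_pred, alpha=0.1):
    """Compute the interval half-width from a held-out calibration set."""
    residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return conformal_quantile(residuals, alpha)


def interval(y_pred, q):
    """Symmetric prediction interval around a point prediction."""
    return (y_pred - q, y_pred + q)


def empirical_coverage(y_true, y_pred, q):
    """Fraction of true values that fall inside their intervals."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if p - q <= t <= p + q)
    return hits / len(y_true)


# Synthetic calibration data (placeholder pIC50-like values).
cal_true = [6.0, 6.5, 7.0, 7.5, 8.0]
cal_pred = [6.25, 6.0, 7.75, 7.5, 7.0]
q = calibrate(cal_true, cal_pred, alpha=0.2)

print(interval(7.0, q))
print(empirical_coverage([6.0, 9.0], [6.5, 7.0], q))
```

When calibration and test points are exchangeable, intervals built this way cover the true value with probability at least 1 - alpha on average, which is why empirical coverage on a held-out set is the natural number to report.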
Release Schedule
Phased Rollout
2026 Q2
GitHub repo + tokenizer + sample compound dataset
In Progress
2026 Q3
Pre-train weights (small) + training code + benchmark runner
Planned
2026 Q4
Conformal Prediction pip package + Docker image
Planned
2027 Q1
Public TDC leaderboard submission with reproducible scripts
Planned
Get Notified on Release
Be First When the Repo Goes Public
Email contact@molforgeai.com with subject "repro-kit" to get an early-access notification when each component is released.