Reproducibility Kit

Verify Our Numbers. Run It Yourself.

We are open-sourcing model weights, training code, evaluation scripts, and a 10K-compound sample set, on the principle that scientific claims must be independently reproducible. Big Tech rarely does this.

Why Open This

Trust requires verification

In drug discovery, an unverifiable benchmark number is worth less than nothing: it costs partners time and erodes trust. We expose our model so anyone can run it.

What Stays Proprietary

Target-fine-tuned weights + ROBOGATE

Pre-trained weights are open. Target-specific fine-tuned weights (TYK2, CDK4, CDK6, TNIK, EGFR) stay private because they are the product of our compute investment and curated data. ROBOGATE adaptive sampling is patent-protected.

License

Apache 2.0 + CC-BY-NC

Code: Apache 2.0 (commercial use permitted). Sample compound dataset: CC-BY-NC 4.0 (non-commercial use, including academic research; commercial use requires a partner agreement).

Kit Components

5 Deliverables

MolForge-Gen v1 (small)

Model weights

Reduced 25.4M-parameter MolForge-Gen weights (pre-training only, no target fine-tuning). Sufficient to reproduce general molecular generation results. Fine-tuned target-specific weights remain proprietary.

PyTorch state_dict (.pt) + config.json + tokenizer vocab

PREPARING

Apache 2.0

~95 MB
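Once the weights are published, one quick sanity check is to count the entries in the state_dict and compare the total with the stated 25.4M parameters. The sketch below is a minimal illustration, not the project's own tooling: `count_parameters` and `fp32_size_mb` are hypothetical helper names, the 4-bytes-per-parameter (fp32) storage assumption is ours, and the toy dictionary only stands in for a real checkpoint.

```python
import math
from types import SimpleNamespace


def count_parameters(state_dict):
    """Sum element counts over every tensor-like value (anything with a .shape)."""
    return sum(math.prod(tensor.shape) for tensor in state_dict.values())


def fp32_size_mb(n_params):
    """Approximate raw storage in MB, ASSUMING 4 bytes per parameter (fp32)."""
    return n_params * 4 / 1e6


# Toy stand-in for a checkpoint: only the shapes matter for counting.
toy_state_dict = {
    "embed.weight": SimpleNamespace(shape=(100, 16)),  # 1600 elements
    "head.bias": SimpleNamespace(shape=(100,)),        # 100 elements
}

toy_count = count_parameters(toy_state_dict)
print(toy_count, "elements,", fp32_size_mb(toy_count), "MB at fp32")
```

With PyTorch installed, the same function should accept a real checkpoint loaded with `torch.load(path, map_location="cpu")`, where `path` is wherever you saved the .pt file. A state_dict can also hold non-trainable buffers, so the total may slightly exceed the trainable parameter count.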

Training & Evaluation Code

Source code

Full training pipeline (tokenizer, model, training loop, evaluation scripts). Excludes ROBOGATE adaptive sampling, a method protected under KIPO 10-2026-0057732.

GitHub repo: liveplex-cpu/molforge-gen-public

PREPARING

Apache 2.0

~2 MB

Sample Compound Set (10K anonymized)

Dataset

10,000 anonymized compound IDs with full property predictions (pIC50, hERG, AMES, QED, novelty). SMILES are omitted; only properties are included. Sufficient for benchmarking against your own pipeline.

CSV + JSON + property statistics report

PREPARING

CC-BY-NC 4.0

~3 MB CSV
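Because the dataset ships properties without SMILES, the natural comparison is at the level of property distributions. The following sketch shows one way to summarize each numeric column of such a CSV using only the Python standard library. The column names (`compound_id`, `pIC50`, `QED`) and the three toy rows are assumptions made for illustration; the released file's actual schema and values may differ.

```python
import csv
import io
import statistics


def summarize_properties(csv_file, id_column="compound_id"):
    """Return {column: {"n", "mean", "stdev"}} for every non-ID column."""
    columns = {}
    for row in csv.DictReader(csv_file):
        for name, value in row.items():
            if name == id_column:
                continue  # skip the anonymized identifier
            columns.setdefault(name, []).append(float(value))
    return {
        name: {
            "n": len(values),
            "mean": statistics.fmean(values),
            # Sample standard deviation needs at least two values.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in columns.items()
    }


# Illustrative rows only, NOT real data from the kit.
TOY_CSV = """compound_id,pIC50,QED
C0001,6.0,0.5
C0002,7.0,0.75
C0003,8.0,0.25
"""

summary = summarize_properties(io.StringIO(TOY_CSV))
for name, stats in summary.items():
    print(name, stats)
```

Running the same function over your own pipeline's predictions gives two summaries that can be compared column by column.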

Benchmark Suite Runner

Source code

Scripts to reproduce our TDC ADMET, GuacaMol, and TYK2 prospective screening benchmark numbers using our model. Verifies the External Benchmarks table on our /methodology page.

Python scripts + Docker Compose for environment

PREPARING

Apache 2.0

~500 KB

Conformal Prediction Module

Source code

Standalone conformal prediction (CP) calibration module, a drop-in for any scikit-learn or PyTorch regressor. This is the exact module that produces our 86.5% empirical coverage on TYK2.

pip-installable: pip install molforge-conformal

PREPARING

Apache 2.0

~200 KB
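For readers unfamiliar with how an empirical coverage figure is obtained, here is a minimal sketch of standard split conformal prediction for regression. It is not the `molforge-conformal` API, which this page does not describe; the function names are hypothetical, the numbers are toy values, and the 75% target level is chosen only to keep the arithmetic exact.

```python
import math


def conformal_quantile(cal_residuals, alpha):
    """Finite-sample quantile of absolute calibration residuals.

    Takes the ceil((n + 1) * (1 - alpha))-th smallest residual; if that rank
    exceeds n, the interval is unbounded.
    """
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_residuals)[rank - 1]


def prediction_interval(y_pred, q):
    """Symmetric interval around a point prediction."""
    return (y_pred - q, y_pred + q)


def empirical_coverage(y_true, y_pred, q):
    """Fraction of held-out points whose true value falls inside the interval."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= q)
    return hits / len(y_true)


# Toy calibration residuals |y - y_hat| from a held-out calibration split.
cal_residuals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
q = conformal_quantile(cal_residuals, alpha=0.25)  # rank 6 of 7

# Toy test set: true values and model predictions.
test_true = [6.0, 7.0, 8.0, 5.0]
test_pred = [6.25, 7.5, 9.0, 5.0]
coverage = empirical_coverage(test_true, test_pred, q)

print("interval for 6.25:", prediction_interval(6.25, q))
print("empirical coverage:", coverage)
```

The regressor itself never appears in the calibration step: only its residuals on held-out data are needed, which is why this style of calibration can wrap any model that produces point predictions.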

Release Schedule

Phased Rollout

2026 Q2

GitHub repo + tokenizer + sample compound dataset

In Progress

2026 Q3

Pre-trained weights (small) + training code + benchmark runner

Planned

2026 Q4

Conformal Prediction pip package + Docker image

Planned

2027 Q1

Public TDC leaderboard submission with reproducible scripts

Planned

Get Notified on Release

Be First When the Repo Goes Public

Email contact@molforgeai.com with subject "repro-kit" to get an early-access notification when each component is released.

Request Access →