Autoformalization: Comparative Overview
Side-by-side comparison of autoformalization papers across agents, tools, datasets, and evaluation methods.
Input & Output
| Paper | Input | Output |
|---|---|---|
| Minimal_Agent | Formal Statement | Formal Proof |
| Agentic Proof Automation | Formal Statement | Formal Proof |
| ATF | NLP Statement | Formal Statement |
| Textbook Formalizer | NLP Document | Formal Statement + Formal Proof |
| Ax-Prover | Formal Statement | Formal Proof |
| Designing Agentic Tool (proposal) | NLP Statement | Formal Statement + Formal Proof |
| Lifelong Learner | Formal Statement | Formal Proof |
| M2F | NLP Document | Formal Statement + Formal Proof |
| MathCode | NLP Statement | Formal Statement + Formal Proof |
| MerLean | NLP Document | Formal Statement + Formal Proof |
| Numina-Lean-Agent | Formal Statement | Formal Proof |
| Process-Driven Autoformalization | NLP Statement + NLP Proof | Formal Statement + Formal Proof |
| ProofBridge | NLP Statement + NLP Proof | Formal Statement + Formal Proof |
Agents Used
| Paper | Orchestrator | Formal Prover | Formalize Statement | Formalizer (proof+statement) | NLP=FL Semantic Check | Break to Lemma | Statement Extractor | Nlp Prover | Nlp Proof Reviewer | Code Generation | Rank Formalizations | Autoinformalizer | Memory Manager | Engineering Reviewer | Maintainer | Progress | Scanner | Triage | Planner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minimal_Agent | · | ✓ | · | · | · | · | · | · | · | · | · | · | ✓ | · | · | · | · | · | · |
| Agentic Proof Automation | · | ✓ | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · |
| ATF | · | · | · | ✓ | ✓ | · | · | · | · | · | · | · | · | · | · | · | · | · | · |
| Textbook Formalizer | · | ✓ | ✓ | · | ✓ | · | · | · | · | · | · | · | · | ✓ | ✓ | ✓ | ✓ | ✓ | · |
| Ax-Prover | ✓ | ✓ | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · | · |
| Designing Agentic Tool (proposal) | ✓ | · | · | ✓ | · | · | · | ✓ | ✓ | ✓ | · | · | · | · | · | · | · | · | · |
| M2F | · | ✓ | ✓ | · | · | ✓ | ✓ | ✓ | · | · | · | · | · | · | · | · | · | · | · |
| MathCode | · | ✓ | ✓ | · | · | · | · | ✓ | · | · | · | · | · | · | · | · | · | · | ✓ |
| MerLean | · | · | · | ✓ | ✓ | · | ✓ | · | · | · | · | ✓ | · | · | · | · | · | · | · |
| Numina-Lean-Agent | ✓ | ✓ | · | · | · | ✓ | · | ✓ | ✓ | · | · | · | · | · | · | · | · | · | · |
| Process-Driven Autoformalization | · | ✓ | · | · | · | · | · | · | · | · | ✓ | · | · | · | · | · | · | · | · |
| ProofBridge | · | ✓ | · | · | ✓ | · | · | · | · | · | · | · | · | · | · | · | · | · | · |
Tools & MCPs Available
| Paper | Lean 4 MCP | Theorem Search | Document Search | Code Execution | File System | Blueprint Tool | Bash Tool | Git Tools | Issue Tracker | Anti Trivialization Guard |
|---|---|---|---|---|---|---|---|---|---|---|
| Minimal_Agent | ✓ | ✓ | ✓ | · | · | · | · | · | · | · |
| Agentic Proof Automation | ✓ | · | · | · | ✓ | · | · | · | · | · |
| ATF | ✓ | · | · | · | · | · | · | · | · | · |
| Textbook Formalizer | ✓ | ✓ | · | · | ✓ | · | ✓ | ✓ | ✓ | · |
| Ax-Prover | ✓ | ✓ | · | · | ✓ | · | · | · | · | · |
| Designing Agentic Tool (proposal) | ✓ | · | ✓ | ✓ | · | · | · | · | · | · |
| M2F | ✓ | · | · | · | · | · | · | · | · | · |
| MathCode | ✓ | · | ✓ | · | ✓ | · | · | · | · | ✓ |
| MerLean | ✓ | ✓ | · | · | ✓ | · | · | · | · | · |
| Numina-Lean-Agent | ✓ | ✓ | · | · | · | ✓ | · | · | · | · |
| Process-Driven Autoformalization | ✓ | · | · | · | · | · | · | · | · | · |
| ProofBridge | ✓ | ✓ | · | · | · | · | · | · | · | · |
Datasets Used
| Paper | MiniF2F | PutnamBench | FATE | LeanCat | Mathlib Theorems | MATH | GSM8K | Research Papers | Textbook | Combibench | Formalmath Lite | Proverbench |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minimal_Agent | ✓ | ✓ | ✓ | ✓ | · | · | · | · | · | · | · | · |
| ATF | · | · | · | · | · | · | · | · | · | ✓ | ✓ | ✓ |
| Textbook Formalizer | · | · | · | · | · | · | · | · | ✓ | · | · | · |
| M2F | · | · | ✓ | · | · | · | · | ✓ | ✓ | · | · | · |
| MerLean | · | · | · | · | · | · | · | ✓ | · | · | · | · |
| Process-Driven Autoformalization | · | · | · | · | ✓ | ✓ | ✓ | · | · | · | · | · |
| ProofBridge | ✓ | · | · | · | · | · | · | · | · | · | · | · |
Statement Formalization Evaluation
| Paper | Evaluation Method |
|---|---|
| ATF | For the human evaluation, three domain experts independently review 100 randomly sampled instances from each benchmark, with the majority opinion serving as the final performance judgment. |
| Textbook Formalizer | Post-run, an interactive blueprint website enables human community verification side-by-side. No quantitative metric for statement accuracy is reported. |
| M2F | Human domain experts compare the generated Lean declarations directly against the original, source-linked text spans. |
| MathCode | Two-layer check: (1) hardcoded anti-trivialization guard rejects statements reduced to True/False, (2) LLM-based semantic evaluator grades fidelity A-D against original NLP problem on 5-point checklist (objects, quantifiers, hypotheses, conclusions, multi-part coverage). Optional strict A+ recheck and double-check mode with cross-model agreement. |
| MerLean | Manual review to ensure all new definitions and axioms are mathematically accurate and rigorously constructed, using Autoinformalizer. |
| Process-Driven Autoformalization | Human Lean 4 experts manually evaluated the outputs. |
| ProofBridge | Prove equivilance in Lean4 restricted tactic set with Gemini-2.5-Pro. |