Autoformalization: Comparative Overview
Side-by-side comparison of autoformalization papers across agents, tools, datasets, and evaluation methods.
Input & Output
| Paper | Input | Output |
|---|---|---|
| APOLLO | Formal Statement | Formal Proof |
| Minimal_Agent | Formal Statement | Formal Proof |
| Agentic Proof Automation | Formal Statement | Formal Proof |
| Aristotle | Formal Statement | Formal Proof |
| ATF | NLP Statement | Formal Statement |
| Textbook Formalizer | NLP Document | Formal Statement + Formal Proof |
| Ax-Prover | Formal Statement | Formal Proof |
| CriticLean | NLP Statement | Formal Statement |
| Designing Agentic Tool (proposal) | NLP Statement | Formal Statement + Formal Proof |
| HILBERT | Formal Statement | Formal Proof |
| Lifelong Learner | Formal Statement | Formal Proof |
| APRIL | Formal Proof | Formal Proof |
| M2F | NLP Document | Formal Statement + Formal Proof |
| MathCode | NLP Statement | Formal Statement + Formal Proof |
| MerLean | NLP Document | Formal Statement + Formal Proof |
| Numina-Lean-Agent | Formal Statement | Formal Proof |
| Process-Driven Autoformalization | NLP Statement + NLP Proof | Formal Statement + Formal Proof |
| ProofBridge | NLP Statement + NLP Proof | Formal Statement + Formal Proof |
| Seed-Prover | Formal Statement | Formal Proof |
| Delta Prover | Formal Statement + NLP Statement | Formal Proof |
| TheoremLlama | Formal Statement + NLP Statement + NLP Proof | Formal Proof |
Agents Used
| Paper | Formal Prover | Nlp Prover | Formalize Statement | NLP=FL Semantic Check | Break to Lemma | Orchestrator | Formalizer (proof+statement) | Other |
|---|---|---|---|---|---|---|---|---|
| APOLLO | ✓ | · | · | · | · | · | · | Proof Assembler |
| Minimal_Agent | ✓ | · | · | · | · | · | · | Memory Manager |
| Agentic Proof Automation | ✓ | · | · | · | · | · | · | |
| Aristotle | · | ✓ | ✓ | · | · | ✓ | · | Geometry Solver Proof Search Algorithm |
| ATF | · | · | · | ✓ | · | · | ✓ | |
| Textbook Formalizer | ✓ | · | ✓ | ✓ | · | · | · | Engineering Reviewer Maintainer Progress Scanner Triage |
| Ax-Prover | ✓ | · | · | · | · | ✓ | · | |
| CriticLean | · | · | ✓ | ✓ | · | · | · | |
| Designing Agentic Tool (proposal) | · | ✓ | · | · | · | ✓ | ✓ | Nlp Proof Reviewer Code Generation |
| HILBERT | ✓ | ✓ | · | · | ✓ | · | · | Proof Assembler Subgoal Extractor |
| APRIL | ✓ | · | · | · | · | · | · | |
| M2F | ✓ | ✓ | ✓ | · | ✓ | · | · | Statement Extractor |
| MathCode | ✓ | ✓ | ✓ | · | · | · | · | Planner |
| MerLean | · | · | · | ✓ | · | · | ✓ | Statement Extractor Autoinformalizer |
| Numina-Lean-Agent | ✓ | ✓ | · | · | ✓ | ✓ | · | Nlp Proof Reviewer |
| Process-Driven Autoformalization | ✓ | · | · | · | · | · | · | Rank Formalizations |
| ProofBridge | ✓ | · | · | ✓ | · | · | · | |
| Seed-Prover | ✓ | ✓ | · | · | ✓ | · | · | |
| Delta Prover | ✓ | ✓ | · | · | ✓ | · | · | |
| TheoremLlama | ✓ | · | · | · | · | · | · | Autoinformalizer |
Tools & MCPs Available
| Paper | Lean 4 MCP | Lean Search | File System | Code Execution | Other |
|---|---|---|---|---|---|
| APOLLO | ✓ | · | · | · | Auto Solver Sorrifier Syntax Refiner |
| Minimal_Agent | ✓ | ✓ | · | · | |
| Agentic Proof Automation | ✓ | · | ✓ | · | |
| Aristotle | ✓ | · | · | · | Yuclid |
| ATF | ✓ | · | · | · | |
| Textbook Formalizer | ✓ | ✓ | ✓ | · | Bash Tool Git Tools Issue Tracker |
| Ax-Prover | ✓ | ✓ | ✓ | · | |
| CriticLean | ✓ | · | · | · | |
| Designing Agentic Tool (proposal) | ✓ | · | · | ✓ | Web Search |
| HILBERT | ✓ | ✓ | · | · | |
| APRIL | ✓ | · | · | · | |
| M2F | ✓ | · | · | · | |
| MathCode | ✓ | ✓ | ✓ | · | Anti Trivialization Guard |
| MerLean | ✓ | ✓ | ✓ | · | |
| Numina-Lean-Agent | ✓ | ✓ | · | · | |
| Process-Driven Autoformalization | ✓ | · | · | · | |
| ProofBridge | ✓ | ✓ | · | · | |
| Seed-Prover | ✓ | ✓ | · | ✓ | |
| Delta Prover | ✓ | ✓ | · | · | Custom Lean4 Dsl |
| TheoremLlama | ✓ | ✓ | · | · |
Datasets Used
| Paper | MiniF2F | PutnamBench | FATE | Textbook | Research Papers | Combibench | Imo | Other |
|---|---|---|---|---|---|---|---|---|
| APOLLO | ✓ | ✓ | · | · | · | · | · | Proofnet |
| Minimal_Agent | ✓ | ✓ | ✓ | · | · | · | · | LeanCat |
| Aristotle | · | · | · | ✓ | · | · | ✓ | Ag 30 |
| ATF | · | · | · | · | · | ✓ | · | Formalmath Lite Proverbench |
| Textbook Formalizer | · | · | · | ✓ | · | · | · | |
| CriticLean | · | · | · | · | · | · | · | Criticleanbench Omni Math |
| HILBERT | ✓ | ✓ | · | · | · | · | · | |
| APRIL | · | · | · | · | · | · | · | APRIL |
| M2F | · | · | ✓ | ✓ | ✓ | · | · | |
| MerLean | · | · | · | · | ✓ | · | · | |
| Process-Driven Autoformalization | · | · | · | · | · | · | · | Mathlib Theorems MATH GSM8K |
| ProofBridge | ✓ | · | · | · | · | · | · | |
| Seed-Prover | · | ✓ | ✓ | · | · | ✓ | ✓ | Erdős |
| Delta Prover | ✓ | · | · | · | · | · | · | |
| TheoremLlama | ✓ | · | · | · | · | · | · |
Statement Formalization Evaluation
| Paper | Evaluation Method |
|---|---|
| ATF | For the human evaluation, three domain experts independently review 100 randomly sampled instances from each benchmark, with the majority opinion serving as the final performance judgment. |
| Textbook Formalizer | Post-run, an interactive blueprint website enables human community verification side-by-side. No quantitative metric for statement accuracy is reported. |
| CriticLean | Two-step gated verification: Lean 4 compiler checks syntactic validity, then CriticLeanGPT performs deep semantic check. Human evaluation on 50 Omni-MATH problems showed the critic feedback loop boosted accuracy from 38.0% (single pass) to 84.0%. |
| M2F | Human domain experts compare the generated Lean declarations directly against the original, source-linked text spans. |
| MathCode | Two-layer check: (1) hardcoded anti-trivialization guard rejects statements reduced to True/False, (2) LLM-based semantic evaluator grades fidelity A-D against original NLP problem on 5-point checklist (objects, quantifiers, hypotheses, conclusions, multi-part coverage). Optional strict A+ recheck and double-check mode with cross-model agreement. |
| MerLean | Manual review to ensure all new definitions and axioms are mathematically accurate and rigorously constructed, using Autoinformalizer. |
| Process-Driven Autoformalization | Human Lean 4 experts manually evaluated the outputs. |
| ProofBridge | Prove equivilance in Lean4 restricted tactic set with Gemini-2.5-Pro. |