Autoformalization: Comparative Overview

Paper	Input	Output
APOLLO	Formal Statement	Formal Proof
Minimal_Agent	Formal Statement	Formal Proof
Agentic Proof Automation	Formal Statement	Formal Proof
Aristotle	Formal Statement	Formal Proof
ATF	NLP Statement	Formal Statement
Textbook Formalizer	NLP Document	Formal Statement + Formal Proof
Ax-Prover	Formal Statement	Formal Proof
CriticLean	NLP Statement	Formal Statement
Designing Agentic Tool (proposal)	NLP Statement	Formal Statement + Formal Proof
HILBERT	Formal Statement	Formal Proof
Lifelong Learner	Formal Statement	Formal Proof
APRIL	Formal Proof	Formal Proof
M2F	NLP Document	Formal Statement + Formal Proof
MathCode	NLP Statement	Formal Statement + Formal Proof
MerLean	NLP Document	Formal Statement + Formal Proof
Numina-Lean-Agent	Formal Statement	Formal Proof
Process-Driven Autoformalization	NLP Statement + NLP Proof	Formal Statement + Formal Proof
ProofBridge	NLP Statement + NLP Proof	Formal Statement + Formal Proof
Seed-Prover	Formal Statement	Formal Proof
Delta Prover	Formal Statement + NLP Statement	Formal Proof
TheoremLlama	Formal Statement + NLP Statement + NLP Proof	Formal Proof

Paper	Formal Prover	Nlp Prover	Formalize Statement	NLP=FL Semantic Check	Break to Lemma	Orchestrator	Formalizer (proof+statement)	Other
APOLLO	✓	·	·	·	·	·	·	Proof Assembler
Minimal_Agent	✓	·	·	·	·	·	·	Memory Manager
Agentic Proof Automation	✓	·	·	·	·	·	·
Aristotle	·	✓	✓	·	·	✓	·	Geometry Solver Proof Search Algorithm
ATF	·	·	·	✓	·	·	✓
Textbook Formalizer	✓	·	✓	✓	·	·	·	Engineering Reviewer Maintainer Progress Scanner Triage
Ax-Prover	✓	·	·	·	·	✓	·
CriticLean	·	·	✓	✓	·	·	·
Designing Agentic Tool (proposal)	·	✓	·	·	·	✓	✓	Nlp Proof Reviewer Code Generation
HILBERT	✓	✓	·	·	✓	·	·	Proof Assembler Subgoal Extractor
APRIL	✓	·	·	·	·	·	·
M2F	✓	✓	✓	·	✓	·	·	Statement Extractor
MathCode	✓	✓	✓	·	·	·	·	Planner
MerLean	·	·	·	✓	·	·	✓	Statement Extractor Autoinformalizer
Numina-Lean-Agent	✓	✓	·	·	✓	✓	·	Nlp Proof Reviewer
Process-Driven Autoformalization	✓	·	·	·	·	·	·	Rank Formalizations
ProofBridge	✓	·	·	✓	·	·	·
Seed-Prover	✓	✓	·	·	✓	·	·
Delta Prover	✓	✓	·	·	✓	·	·
TheoremLlama	✓	·	·	·	·	·	·	Autoinformalizer

Paper	Lean 4 MCP	Lean Search	File System	Code Execution	Other
APOLLO	✓	·	·	·	Auto Solver Sorrifier Syntax Refiner
Minimal_Agent	✓	✓	·	·
Agentic Proof Automation	✓	·	✓	·
Aristotle	✓	·	·	·	Yuclid
ATF	✓	·	·	·
Textbook Formalizer	✓	✓	✓	·	Bash Tool Git Tools Issue Tracker
Ax-Prover	✓	✓	✓	·
CriticLean	✓	·	·	·
Designing Agentic Tool (proposal)	✓	·	·	✓	Web Search
HILBERT	✓	✓	·	·
APRIL	✓	·	·	·
M2F	✓	·	·	·
MathCode	✓	✓	✓	·	Anti Trivialization Guard
MerLean	✓	✓	✓	·
Numina-Lean-Agent	✓	✓	·	·
Process-Driven Autoformalization	✓	·	·	·
ProofBridge	✓	✓	·	·
Seed-Prover	✓	✓	·	✓
Delta Prover	✓	✓	·	·	Custom Lean4 Dsl
TheoremLlama	✓	✓	·	·

Paper	MiniF2F	PutnamBench	FATE	Textbook	Research Papers	Combibench	Imo	Other
APOLLO	✓	✓	·	·	·	·	·	Proofnet
Minimal_Agent	✓	✓	✓	·	·	·	·	LeanCat
Aristotle	·	·	·	✓	·	·	✓	Ag 30
ATF	·	·	·	·	·	✓	·	Formalmath Lite Proverbench
Textbook Formalizer	·	·	·	✓	·	·	·
CriticLean	·	·	·	·	·	·	·	Criticleanbench Omni Math
HILBERT	✓	✓	·	·	·	·	·
APRIL	·	·	·	·	·	·	·	APRIL
M2F	·	·	✓	✓	✓	·	·
MerLean	·	·	·	·	✓	·	·
Process-Driven Autoformalization	·	·	·	·	·	·	·	Mathlib Theorems MATH GSM8K
ProofBridge	✓	·	·	·	·	·	·
Seed-Prover	·	✓	✓	·	·	✓	✓	Erdős
Delta Prover	✓	·	·	·	·	·	·
TheoremLlama	✓	·	·	·	·	·	·

Paper	Evaluation Method
ATF	For the human evaluation, three domain experts independently review 100 randomly sampled instances from each benchmark, with the majority opinion serving as the final performance judgment.
Textbook Formalizer	Post-run, an interactive blueprint website enables human community verification side-by-side. No quantitative metric for statement accuracy is reported.
CriticLean	Two-step gated verification: Lean 4 compiler checks syntactic validity, then CriticLeanGPT performs deep semantic check. Human evaluation on 50 Omni-MATH problems showed the critic feedback loop boosted accuracy from 38.0% (single pass) to 84.0%.
M2F	Human domain experts compare the generated Lean declarations directly against the original, source-linked text spans.
MathCode	Two-layer check: (1) hardcoded anti-trivialization guard rejects statements reduced to True/False, (2) LLM-based semantic evaluator grades fidelity A-D against original NLP problem on 5-point checklist (objects, quantifiers, hypotheses, conclusions, multi-part coverage). Optional strict A+ recheck and double-check mode with cross-model agreement.
MerLean	Manual review to ensure all new definitions and axioms are mathematically accurate and rigorously constructed, using Autoinformalizer.
Process-Driven Autoformalization	Human Lean 4 experts manually evaluated the outputs.
ProofBridge	Prove equivilance in Lean4 restricted tactic set with Gemini-2.5-Pro.