Process-Driven Autoformalization in Lean 4

Dataset

They introduce FormL4. They translate back MathLib 4 theorems/proofs to Natrual language and manually check their correctness. They did not check the correctness of NLP proofs.

How they translate formals to informal?

Use Gemini-Pro-1.5 for translation. Filter out theorems that needs hard lemmas. The process is as follows:

Translate the Statement: Convert the formal theorem statement into a natural-language math problem without mentioning any Lean 4 functions.
Explain the Proof (Reasoning Buffer): Explain the meaning of each line/tactic in the formal proof based on the definitions of the lemmas used. This acts as a “scratchpad” or reasoning buffer for the model to understand the logic.
Construct the Final Proof: Write a fluent, step-by-step mathematical proof in natural language, entirely independent of the formal syntax and without verbatim mentions of Lean 4 code.
Manual check by Lean4 experts.

Training

They trained a model using their dataset.

Inference

The model is given theorem statement and proof in NL. Goal: translate statement and proof to Lean4.

The system takes the prompt and generates multiple candidate formalizations. Each candidate includes both the formal Lean 4 theorem statement and the formal Lean 4 proof tactics using the fine-tuned Mistral-v0.3-7B model.
Then they extract the Lean4 from output
Instead of compiling all of candidates, the system evaluates the generated candidates to predict which one is most likely to compile successfullyusing Process-Supervised Verifier (PSV).
The best candidate according to PSV is send to lean4 compiler; If pass they conculde the autoformalization is correct.

Validation/results

They take 60 random autoformalization samples and manually check the correctness. They find that 25% of the things that pass their system are still wrong.