## Autoformalization: Comparative Overview

Side-by-side comparison of autoformalization papers across agents, tools, datasets, and evaluation methods.

- **Automatic Textbook Formalization**: A multi-agent framework utilizing code-capable language models to translate and prove graduate-level mathematical textbooks in the Lean theorem prover.
- **MathCode / AUTOLEAN**: A terminal-based AI coding assistant with a built-in math formalization engine that converts natural language math problems into Lean 4 formal theorem statements and proofs.
- **APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning**: A modular framework that coordinates large language models, the Lean compiler, and automated solvers to repair and verify formal mathematical proofs.
- **A Minimal Agent for Automated Theorem Proving**: An evaluation of a modular agent architecture for automated theorem proving in Lean 4 featuring iterative refinement, context management, and search tools.
- **Agentic Proof Automation: A Case Study**: A case study evaluating the use of large language model agents for mechanizing the semantic type soundness of a formal system in Lean 4.
- **Aristotle** (2510.01346v2.pdf): An automated theorem proving system that integrates a Lean proof search algorithm, an informal reasoning pipeline, and a dedicated geometry solver.
- **Autoformalizer with Tool Feedback**: A framework that improves autoformalization by integrating syntactic compiler feedback and semantic consistency checks during the generation process.
- **Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics**: A multi-agent framework utilizing the Model Context Protocol to equip general-purpose LLMs with Lean tools for automated theorem proving.
- **CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization**: A reinforcement learning framework that explicitly trains a critic model to evaluate and improve the semantic fidelity of natural language to Lean 4 mathematical formalizations.
- **Designing an Agentic Tool for AI-Assisted Mathematical Proofs**: A five-layer orchestration architecture for AI-assisted mathematical proof development incorporating computational experimentation, adversarial review, and formal verification.
- **Hilbert: Recursively Building Formal Proofs with Informal Reasoning**: A multi-agent framework that orchestrates general-purpose reasoning models and specialized prover models to recursively decompose and solve formal mathematical theorems.
- **Learning to Repair Lean Proofs from Compiler Feedback**: A supervised learning approach and dataset for repairing erroneous Lean proofs using compiler feedback and structured natural-language diagnoses.
- **M2F: Automated Formalization of Mathematical Literature at Scale**: A two-stage framework for automated, project-scale formalization of mathematical literature into buildable Lean projects using verifier-certified refinement.
- **MerLean: An Agentic Framework for Autoformalization in Quantum Computation**: A bidirectional framework that extracts mathematical statements from LaTeX, formalizes them into Lean 4 code, and translates the verified code back into natural language.
- **Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics**: A formal mathematical reasoning system that combines a general coding agent with specialized tools via the Model Context Protocol to interact with the Lean theorem prover.
- **Process-Driven Autoformalization in Lean 4**: Introduces the Process-Driven Autoformalization (PDA) framework, which leverages process-level feedback from the Lean 4 compiler to iteratively improve the autoformalization of natural language mathematics.
- **ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings**: A framework that auto-formalizes natural language proofs into Lean 4 using cross-modal retrieval via joint embeddings, retrieval-augmented fine-tuning, and iterative verifier-guided repair.
- **Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience**: Documentation of the training methodologies and experimental scaling results of the Seed-Prover 1.5 formal theorem proving model.
- **Solving Formal Math Problems by Decomposition and Iterative Reflection**: An agent-based framework that orchestrates interaction between a general-purpose LLM and the Lean 4 environment via reflective decomposition and iterative proof repair.
- **Construction-Verification: A Benchmark for Applied Mathematics in Lean 4**: A Lean 4 evaluation framework and benchmark requiring explicit solution construction prior to logical verification.
- **LeanAgent: Lifelong Learning for Formal Theorem Proving**: A lifelong learning framework for formal theorem proving that combines curriculum learning and progressive training to maintain and extend performance across multiple Lean repositories.
- **TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts**: An end-to-end framework and dataset-generation pipeline to train large language models for formal mathematical theorem proving in Lean 4.
- **Towards a Common Framework for Autoformalization**: A unified conceptual framework defining autoformalization across interactive theorem proving, logic programming, planning, and knowledge representation.

## Reasoning

- **Chain of Images for Intuitively Reasoning**: Proposes a Chain-of-Images (CoI) method for multimodal models to solve reasoning problems by generating a series of images as intermediate representations, using a Symbolic Multimodal Large Language Model (SyMLLM).
- **Dream 7B: Diffusion Large Language Models**: Dream 7B is a 7-billion-parameter diffusion language model that refines text in parallel, initialized from an autoregressive model and trained using context-adaptive noise scheduling.
- **Imagine while Reasoning in Space: Multimodal Visualization-of-Thought**: Multimodal Visualization-of-Thought (MVoT) enables Multimodal Large Language Models (MLLMs) to generate interleaved verbal and visual reasoning traces for spatial reasoning tasks.
- **LLM Latent Reasoning as Chain of Superposition**: Trains an encoder that summarizes reasoning chunks, then trains a latent reasoning model on the summaries that encoder produces from CoT data.
- **LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking**: Vanilla soft thinking internally collapses to greedy token sampling: the model usually continues to work only with the most probable next token. To mitigate this, they add noise to the logits and obtain better performance.
- **Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer**: Investigates the internal mechanics of the Huginn depth-recurrent Transformer on arithmetic tasks to find evidence for latent chain-of-thought reasoning.
- **Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs**: A framework that equips Multimodal Large Language Models with an internal visual scratchpad to generate visual latents during autoregressive reasoning, which can be translated into interpretable sketches.
- **Multiplex Thinking: Reasoning via Token-wise Branch and Merge**: Makes soft thinking slightly random, then trains with GRPO.
- **Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space**: Training-free latent reasoning: optimizes latent reasoning tokens to maximize model confidence, which correlates with accuracy.
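Several of the soft-thinking entries here share one primitive: instead of committing to the argmax token, the model is fed a probability-weighted mixture of token embeddings, optionally perturbed with noise so it does not collapse onto the single most probable token. A minimal numpy sketch of that primitive, not taken from any of the papers (all names, sizes, and the noise placement are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def soft_token(logits, embedding_table, noise_std=0.0):
    """Probability-weighted mixture of token embeddings ("soft thinking").

    Adding Gaussian noise to the logits (as suggested in the summaries
    above) keeps the mixture from collapsing onto the argmax token.
    """
    noisy = logits + rng.normal(0.0, noise_std, size=logits.shape)
    probs = softmax(noisy)
    return probs @ embedding_table  # (d_model,) mixture embedding

# Toy vocabulary of 5 tokens with 3-d embeddings (invented sizes).
E = rng.normal(size=(5, 3))
logits = np.array([4.0, 1.0, 0.5, 0.1, -2.0])

hard = E[np.argmax(logits)]                       # greedy: one token's embedding
soft = soft_token(logits, E)                      # deterministic mixture
perturbed = soft_token(logits, E, noise_std=1.0)  # noisy mixture
```

The mixture is dominated by the highest-logit token here, which is exactly the collapse the noise variants are meant to counteract.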
- **Reinforcement Learning via Self-Distillation**: Self-Distillation Policy Optimization (SDPO) leverages a model's in-context learning capabilities to convert tokenized environment feedback into dense, logit-level reinforcement learning signals.
- **Scaling Latent Reasoning via Looped Language Models**: Trains a recurrent reasoning model by looping the same model over and over again.
- **Soft Tokens, Hard Truths**: Adds Gaussian noise to the soft-thinking embeddings, then trains with RL using RLOO.
- **Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning**: A large-scale dataset with 182,384 interleaved text-image reasoning traces across 18 domains for training multimodal models.

## Hallucination

- **DASH: Detection and Assessment of Systematic Hallucinations of VLMs**: Builds a dataset of images on which VLMs systematically hallucinate, wrongly believing things exist in the images.

## Interpretability

- **A Mathematical Framework for Transformer Circuits**: In Transformers, the residual stream is the central object; layers read from and write to it.
- **An Overview of Early Vision in InceptionV1**: Catalogs the feature maps of InceptionV1's early layers.
- **CLIP-Dissect: Automatic Description of Neuron Representations**: Finds the concepts that activate a neuron using an image dataset.
- **Deep Learning is Not So Mysterious or Different**: ??
- **Physics of Language Models**: Understanding LLMs by training smaller LMs in controlled environments.
- **Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet**: Scales sparse autoencoders (SAEs) to Claude 3 Sonnet.
- **Towards Monosemanticity: Decomposing Language Models With Dictionary Learning**: How SAEs work.
- **Can Large Language Models Explain Their Internal Mechanisms?**: Summary of the paper.
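The monosemanticity entries above decompose model activations with sparse autoencoders (SAEs): an activation vector is encoded into a larger, sparse, non-negative set of feature activations and reconstructed from a learned dictionary of feature directions. A toy numpy sketch of that architecture (no training loop; all names and sizes are invented, not the papers' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseAutoencoder:
    """Dictionary-learning-style SAE sketch:
    x ≈ b_dec + W_dec @ f(x), where the feature activations
    f(x) = ReLU(W_enc @ (x - b_dec) + b_enc) would be kept sparse
    during training by an L1 penalty on f(x).
    """

    def __init__(self, d_model, d_features):
        # d_features > d_model: an overcomplete dictionary of directions.
        self.W_enc = rng.normal(0, 0.1, (d_features, d_model))
        self.b_enc = np.zeros(d_features)
        self.W_dec = rng.normal(0, 0.1, (d_model, d_features))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        return np.maximum(0.0, self.W_enc @ (x - self.b_dec) + self.b_enc)

    def decode(self, f):
        return self.W_dec @ f + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        # Reconstruction error plus sparsity penalty on the features.
        return np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f))

sae = SparseAutoencoder(d_model=8, d_features=32)
x = rng.normal(size=8)
features = sae.encode(x)  # non-negative feature activations
```

Interpretability work then inspects which inputs make each coordinate of `features` fire, treating each dictionary column as a candidate "feature".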
- **Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task**: Summary of the paper.
- **Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)**: Summary of the paper.

## Active Learning

- **Every decision tree has an influential variable**: The title is self-explanatory.
- **Learning decision trees from random examples**: Decision-tree learning by finding consistent decision trees.
- **Leveraged volume sampling for linear regression**: Active learning in linear regression with multiplicative error-rate bounds.
- **Properly learning decision trees in almost polynomial time**: Learns a decision tree under the uniform data distribution in time O(s^(log log s)).
- **Top-down induction of decision trees: rigorous guarantees and inherent limitations**: Greedily learns a decision tree by splitting each leaf on its most influential variable.
- **Active Learning Survey**
- **Active Learning for Agnostic Classification**
- **The True Sample Complexity of Active Learning**: A different definition of active-learning label complexity.

## Other Summaries

- **Labeling Neural Representations with Inverse Recognition**: Summary of the paper.
- **Progress measures for grokking via mechanistic interpretability**: Summary of the paper.
- **What do we learn from inverting CLIP models?**: Summary of the paper.
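The top-down induction entry above greedily splits each node on its most influential variable. A toy numpy sketch of that loop on Boolean data, using empirical correlation with the label as a crude stand-in for Fourier influence (all names and the target function are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def influence(X, y):
    """Per-variable |correlation| with the label over the sample,
    a stand-in for the influence measure used by top-down heuristics.
    """
    signs = 2 * y - 1  # labels {0,1} -> {-1,+1}
    return np.abs((2 * X - 1).T @ signs) / len(y)

def build_tree(X, y, depth):
    """Greedy top-down induction: split every node on its most
    influential variable; leaves predict the majority label.
    """
    if depth == 0 or len(set(y)) == 1:
        return int(np.round(y.mean()))  # leaf
    i = int(np.argmax(influence(X, y)))
    left, right = X[:, i] == 0, X[:, i] == 1
    if left.all() or right.all():
        return int(np.round(y.mean()))
    return (i,
            build_tree(X[left], y[left], depth - 1),
            build_tree(X[right], y[right], depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        i, lo, hi = tree
        tree = hi if x[i] else lo
    return tree

# Toy target on uniform Boolean inputs: y = x0 AND x1 (two relevant variables).
X = rng.integers(0, 2, size=(512, 4))
y = X[:, 0] & X[:, 1]
tree = build_tree(X, y, depth=3)
acc = np.mean([predict(tree, x) == t for x, t in zip(X, y)])
```

On this target the two relevant variables have the highest empirical influence, so the greedy splits recover the AND structure.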