## Active Learning

- **Every decision tree has an influential variable**: the title is self-explanatory.
- **Learning decision trees from random examples**: decision tree learning by finding consistent decision trees.
- **Leveraged volume sampling for linear regression**: active learning for linear regression with multiplicative error bounds (see the sampling sketch below).
- **Properly learning decision trees in almost polynomial time**: learns a decision tree under the uniform data distribution in s^(O(log log s)) time.
- **Top-down induction of decision trees: rigorous guarantees and inherent limitations**: greedily learns a decision tree by splitting every leaf on its most influential variable (see the greedy-split sketch below).
- **Active Learning Survey**
- **Active Learning for Agnostic classification**
- **The True Sample Complexity of Active Learning**: a different definition of active-learning label complexity.

## Interpretability

- **A Mathematical Framework for Transformer Circuits**: in Transformers the residual stream is the central object; layers read from and write to it (see the residual-stream sketch below).
- **An Overview of Early Vision in InceptionV1**: feature maps of different InceptionV1 layers.
- **CLIP-Dissect: Automatic Description of Neuron Representations**: finds the concepts that activate a neuron using an image dataset.
- **Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet**: scales sparse autoencoders (SAEs) up to Claude 3 Sonnet.
- **Towards Monosemanticity: Decomposing Language Models With Dictionary Learning**: how SAEs work (see the SAE sketch below).
- **Zoom In: An Introduction to Circuits**: investigates vision circuits by studying the connections between neurons.
- **Can Large Language Models Explain Their Internal Mechanisms?**: summary notes.
- **Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task**: summary notes.
- **Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)**: summary notes (see the TCAV sketch below).

### Other Summaries

- **Labeling Neural Representations with Inverse Recognition**
- **Progress measures for grokking via mechanistic interpretability**
- **What do we learn from inverting CLIP models?**
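## Code sketches

The note on top-down induction refers to the greedy heuristic analyzed in that line of work: repeatedly split each leaf on the variable with the largest influence. Below is a toy sketch, assuming boolean inputs and estimating the influence Inf_i(f) = Pr[f(x) != f(x with bit i flipped)] by uniform sampling; the function names, sample sizes, and fixed-depth stopping rule are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def influence(f, n, i, samples=2000):
    """Estimate Inf_i(f) = Pr[f(x) != f(x with bit i flipped)] over uniform x."""
    x = rng.integers(0, 2, size=(samples, n))
    x_flip = x.copy()
    x_flip[:, i] ^= 1
    return (f(x) != f(x_flip)).mean()

def restrict(f, i, b):
    """Return f with input bit i fixed to the constant b."""
    def g(x):
        x = x.copy()
        x[:, i] = b
        return f(x)
    return g

def build_tree(f, n, depth):
    """Greedily split each node on the currently most influential variable."""
    if depth == 0:
        x = rng.integers(0, 2, size=(2000, n))
        return int(f(x).mean() >= 0.5)            # leaf: majority label
    i_star = int(np.argmax([influence(f, n, i) for i in range(n)]))
    return (i_star,
            build_tree(restrict(f, i_star, 0), n, depth - 1),
            build_tree(restrict(f, i_star, 1), n, depth - 1))

# Toy target: majority of the first three bits, plus three irrelevant bits.
maj3 = lambda x: (x[:, :3].sum(axis=1) >= 2).astype(int)
print(build_tree(maj3, n=6, depth=3))
```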
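The leveraged volume sampling paper studies a joint (volume) sampling distribution; exact volume sampling is more involved, so the sketch below uses its simpler i.i.d. relative, leverage-score sampling, to show the active-learning shape of the idea: query labels only for rows sampled proportionally to their statistical leverage, then solve an importance-weighted least-squares problem. Pool size, label budget, and noise level are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 10, 60                       # unlabeled pool, dimension, label budget
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)   # labels exist, but we query only k of them

U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = (U ** 2).sum(axis=1)                  # leverage scores; they sum to d
p = lev / lev.sum()
idx = rng.choice(n, size=k, replace=True, p=p)
w = 1.0 / np.sqrt(k * p[idx])               # importance weights undo the sampling bias
w_hat, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
print(np.linalg.norm(w_hat - w_true))       # close to the full-data estimator's error
```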
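The residual-stream picture from A Mathematical Framework for Transformer Circuits, in a few lines: every layer reads the current stream and adds its output back, so the stream acts as the shared communication channel between layers. A minimal sketch with attention omitted and illustrative dimensions:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Hypothetical layer: reads the stream, writes back an additive update."""
    def __init__(self, d_model):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, stream):
        return stream + self.mlp(stream)    # layers never overwrite, only add

d_model = 64
stream = torch.randn(1, 16, d_model)        # (batch, tokens, d_model)
for block in [Block(d_model) for _ in range(4)]:
    stream = block(stream)                  # all inter-layer communication passes
                                            # through this one shared vector space
```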
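For "how SAEs work": a sparse autoencoder maps activations into an overcomplete dictionary with a ReLU encoder, penalizes the feature activations with an L1 term, and reconstructs the input with a linear decoder. A minimal sketch; the dictionary size and L1 coefficient here are assumed, not the paper's values:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_dict=4096):     # overcomplete: d_dict >> d_model
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))           # sparse, non-negative features
        return self.decoder(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(32, 512)                           # stand-in model activations
recon, feats = sae(acts)
l1_coeff = 1e-3                                       # sparsity penalty (assumed value)
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().sum(-1).mean()
loss.backward()                                       # train to reconstruct sparsely
```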
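The TCAV recipe, as I read it (a sketch, not the authors' code): fit a linear classifier between a layer's activations on concept examples and on random examples, take its normal vector as the concept activation vector (CAV), and score a class by the fraction of its examples whose logit has a positive directional derivative along the CAV. The synthetic arrays below stand in for real activations and gradients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """CAV = normal of a linear classifier separating concept vs. random activations."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    v = LogisticRegression().fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(logit_grads, cav):
    """Fraction of examples whose class logit increases along the CAV direction."""
    return float((logit_grads @ cav > 0).mean())

rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (50, 128)),    # activations on concept images
                  rng.normal(0.0, 1.0, (50, 128)))    # activations on random images
grads = rng.normal(0.2, 1.0, (200, 128))              # d(logit)/d(activations), per example
print(tcav_score(grads, cav))
```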