This is a collection of interesting papers (and books) that I have read so far or want to read. Note that the list is not up to date.
- General Deep Learning
- Conformal Prediction
- Differential Geometry in Deep Learning
- Dimensionality Reduction
- Thompson Sampling
- Deep Reinforcement Learning
- Reinforcement Learning
- Bandit Algorithms
- Optimization
- Statistics
- Probability Modeling & Inference
- Uncertainty Estimation
- Statistical Learning
- Lecture Notes, Books and Courses
- Blogs
- Schools
- 2023: Your diffusion model secretly knows the dimension of the data manifold
- 2022: Regularising Inverse Problems with Generative Machine Learning Models
- 2021: Score-Based Generative Modeling through Stochastic Differential Equations
- 2021: Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck
- 2021: Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks
- 2021: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- 2021: Why flatness correlates with generalization for Deep NN
- 2021: The Modern Mathematics of Deep Learning
- 2021: The Principles of Deep Learning Theory
- 2020: Neural tangent kernel
- 2018: Lipschitz regularity of deep neural networks: analysis and efficient estimation
- 2015: Weight Uncertainty in Neural Networks
- 1998: Efficient BackProp
- 2022: Conformal Prediction: a Unified Review of Theory and New Challenges
- 2022: Conformal Off-Policy Prediction in Contextual Bandits
- 2020: Conformal Prediction Under Covariate Shift
- 2019: Conformalized Quantile Regression
- 2005: Algorithmic Learning in a Random World
- 2020: Neural Ordinary Differential Equations on Manifolds
- 2019: Diffeomorphic Learning
- 2019: Deep ReLU network approximation of functions on a manifold
- 2019: Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
- 2016: Deep nets for local manifold learning
- 2024: On Sparsity and Sub-Gaussianity in the Johnson-Lindenstrauss Lemma
- 2020: Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey
- 2015: Parametric nonlinear dimensionality reduction using kernel t-SNE
- 2009: Learning a Parametric Embedding by Preserving Local Structure
- 2020: A Tutorial on Thompson Sampling
- 2020: Neural Thompson Sampling
- 2018: Deep Contextual Multi-armed Bandits
- 2022: CICERO
- List of algorithms
- 2018: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- 2017: Reinforcement Learning with Deep Energy-Based Policies
- 2020: A Theoretic Analysis of DQN
- 2020: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
- 2019: Towards Characterizing Divergence in Deep Q-Learning
- 2022: Exploring through Random Curiosity with General Value Functions
- 2020: Planning to Explore via Self-Supervised World Models
- 2020: Hypermodels for Exploration
- 2018: DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
- 2017: #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- 2016: Unifying Count-Based Exploration and Intrinsic Motivation
- 2021: Enforcing Robust Control Guarantees with Neural Network Policies
- 2018: Control-Theoretic Analysis of Smoothness for Stability-Certified Reinforcement Learning
- 2021: Mastering Atari with Discrete World Models
- 2020: Dream to Control
- 2020: Planning to Explore via Self-Supervised World Models
- 2019: Learning Latent Dynamics for Planning from Pixels
- 2021: GMAC: A Distributional Perspective on Actor-Critic Framework
- 2020: Sample-Based Distributional Policy Gradient
- 2019: Statistics and Samples in Distributional Reinforcement Learning
- 2018: Distributed Distributional Deterministic Policy Gradients
- 2024: Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
- 2023: Backstepping Temporal Difference Learning
- 2023: Empirical Design in Reinforcement Learning
- 2023: An Analysis of Quantile Temporal-Difference Learning
- 2022: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
- 2022: Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
- 2021: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
- 2021: Adaptive Sampling for Best Policy Identification in MDPs
- 2021: Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint
- 2020: Fast active learning for pure exploration in reinforcement learning
- 2020: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
- 2019: Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
- 2019: Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
- 2019: Provably Efficient Reinforcement Learning with Linear Function Approximation
- 2018: Deep Reinforcement Learning that Matters
- 2018: Is Q-learning Provably Efficient?
- 2018: Adaptive Sampling for Policy Identification
- 2020: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
- 2016: Learning the Variance of the Reward-To-Go
- 2012: Policy Gradients with Variance Related Risk Criteria
- 2009: An Analysis of Reinforcement Learning with Function Approximation
- 2008: An Analysis of Model-Based Interval Estimation for Markov Decision Processes
- 2006: PAC Model-Free Reinforcement Learning
- 2004: Bias and Variance in Value Function Estimation
- 2001: Convergence of Optimistic and Incremental Q-Learning
- 2001: TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
- 2000: Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
- 1993: Convergence of Stochastic Iterative Dynamic Programming Algorithms
- 1992: Reinforcement Learning Applied to Linear Quadratic Regulation
- 1982: The Variance of Discounted Markov Decision Processes
- 2023: Does Zero-Shot Reinforcement Learning Exist?
- 2021: Learning One Representation to Optimize All Rewards
- 2017: Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
- 2024: Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
- 2022: Safety-constrained Reinforcement Learning with a Distributional Safety Critic
- 2022: Constrained Variational Policy Optimization for Safe Reinforcement Learning
- 2022: TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning
- 2022: Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk
- 2022: SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
- 2019: Benchmarking Safe Exploration in Deep Reinforcement Learning
- 2017: Constrained Policy Optimization
- 2017: Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- 2015: Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
- 2015: A Comprehensive Survey on Safe Reinforcement Learning
- 2015: Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
- 2022: A Review of Off-Policy Evaluation in Reinforcement Learning
- 2022: Conformal Off-Policy Prediction in Contextual Bandits
- 2020: CoinDICE: Off-Policy Confidence Interval Estimation
- 2018: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
- 2015: High Confidence Policy Improvement
- 2015: High Confidence Off-Policy Evaluation
- 2000: Eligibility Traces for Off-Policy Policy Evaluation
- 2023: Quantile Bandits for Best Arms Identification
- 2020: Neural Contextual Bandits with Deep Representation and Shallow Exploration
- 2020: Neural Contextual Bandits with UCB-based Exploration
- 2016: Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks
- 2016: Optimal Best Arm Identification with Fixed Confidence
- 2016: Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
- 2011: Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
- 2002: Finite-time Analysis of the Multiarmed Bandit Problem
- 2002: Using Confidence Bounds for Exploitation-Exploration Trade-offs
- 2002: The Nonstochastic Multiarmed Bandit Problem
- 2021: A Mean-Field Analysis of Two-Player Zero-Sum Games
- 2021: The Limits of Min-Max Optimization Algorithms: Convergence to Spurious Non-Critical Sets
- 2020: On the Convergence of Single-Call Stochastic Extra-Gradient Methods
- 2020: Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances
- 2020: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- 2020: Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
- 2018: Finding Mixed Nash Equilibria of Generative Adversarial Networks
- 2009: Subgradient Methods for Saddle-Point Problems
- 2024: Information Lower Bounds for Robust Mean Estimation
- 2022: A Short Note on an Inequality between KL and TV
- 2020: A Tutorial on Quantile Estimation via Monte Carlo
- 2019: Safe Testing
- 2012: Concentration Inequalities for Order Statistics
- 1996: Importance Sampling for Monte Carlo Estimation of Quantiles
- 1987: Better Bootstrap Confidence Intervals
- 1982: Some Methods for Testing the Homogeneity of Rainfall Records
- 2024: The Bayesian Learning Rule
- 2023: Semi-Implicit Variational Inference via Score Matching
- 2021: Normalizing Flows for Probabilistic Modeling and Inference
- 2020: Improved Techniques for Training Score-Based Generative Models
- 2019: Variational Approximations using Fisher Divergence
- 2018: Semi-Implicit Variational Inference
- 2018: Variational Inference: A Review for Statisticians
- 2017: Variational Hamiltonian Monte Carlo via Score Matching
- 2013: Stochastic Variational Inference
- 2013: Auto-Encoding Variational Bayes
- 2005: Estimation of Non-Normalized Statistical Models by Score Matching
- 2025: Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization
- 2023: Epistemic Neural Networks
- 2022: Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping
- 2024: Hypothesis Testing with E-values (Book)
- 2021: Regularization in RL, Google
- CS 6789: Foundations of Reinforcement Learning
- RL Book Theory
- Reinforcement Learning: An Introduction
- Bandit Algorithms
- 2021: Lecture Notes for Statistics 311/Electrical Engineering 377
- 2015: Rademacher Complexities and VC Dimension
- 2013: An Introduction to Stochastic Approximation
- 2006: System Identification and the Limits of Learning from Data
- Deep Learning, Goodfellow et al., 2016
- The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, 2009
- Machine Learning: A Probabilistic Perspective, Murphy, 2012
- Probability Theory: The Logic of Science, E. T. Jaynes, 2003
- CS285 at UC Berkeley, Deep Reinforcement Learning
- CS234 at Stanford University, Reinforcement Learning
- 15.097 at MIT, Prediction: Machine Learning and Statistics
- 2008: Graphical Models, Exponential Families, and Variational Inference, Wainwright and Jordan
- Deep Reinforcement Learning Doesn't Work Yet
- Distill's Publication on Feature Visualization
- Lil'Log: Blog on Machine Learning