- Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
  arXiv 2025
- Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
  arXiv 2025 | slides
- Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
  arXiv 2025
- How Does Critical Batch Size Scale in Pre-training?
  ICLR 2025
- Context-Scaling versus Task-Scaling in In-Context Learning
  arXiv 2024
- Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
  NeurIPS 2024
- Scaling Laws in Linear Regression: Compute, Parameters, and Data
  NeurIPS 2024
- In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
  NeurIPS 2024
- Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
  COLT 2024 | slides | poster
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
  ICLR 2024 (spotlight) | slides | poster
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
  ICLR 2024
- Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
  NeurIPS 2023 (spotlight) | slides | poster
- Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
  NeurIPS 2023 | slides | poster
- Fixed Design Analysis of Regularization-Based Continual Learning
  CoLLAs 2023
- Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
  ICML 2023 | slides | poster
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
  NeurIPS 2022 | slides | poster
- Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
  NeurIPS 2022
- Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
  ICML 2022 (long presentation) | slides | poster
- Gap-Dependent Unsupervised Exploration for Reinforcement Learning
  AISTATS 2022 | slides | poster
- The Benefits of Implicit Regularization from SGD in Least Squares Problems
  NeurIPS 2021
- Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning
  NeurIPS 2021 | slides | poster
- Lifelong Learning with Sketched Structural Regularization
  ACML 2021
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression
  COLT 2021 (journal version in JMLR 2023) | slides
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
  ICLR 2021 | slides | poster
- Obtaining Adjustable Regularization for Free via Iterate Averaging
  ICML 2020 | slides | code
- On the Noisy Gradient Descent that Generalizes as SGD
  ICML 2020 | slides | code
- Tangent-Normal Adversarial Regularization for Semi-supervised Learning
  CVPR 2019 (oral) | slides | poster | code
- The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Minima and Regularization Effects
  ICML 2019 | slides | poster | code