- Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
JW, Peter Bartlett†, Matus Telgarsky†, Bin Yu†
arXiv 2025 | slides
- Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
Yuhang Cai*, Kangjie Zhou*, JW, Song Mei, Michael Lindsey, Peter Bartlett
arXiv 2025
- How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, JW, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade
ICLR 2025
- Context-Scaling versus Task-Scaling in In-Context Learning
Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, JW, Mikhail Belkin
arXiv 2024
- Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai, JW, Song Mei, Michael Lindsey, Peter Bartlett
NeurIPS 2024
- Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin, JW, Sham Kakade, Peter Bartlett, Jason Lee
NeurIPS 2024
- In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, JW, Peter Bartlett
NeurIPS 2024
- Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
JW, Peter Bartlett†, Matus Telgarsky†, Bin Yu†
COLT 2024 | slides | poster
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
JW, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter Bartlett
ICLR 2024 (spotlight) | slides | poster
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Xuheng Li, Yihe Deng, JW, Dongruo Zhou, Quanquan Gu
ICLR 2024
- Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
JW, Vladimir Braverman, Jason Lee
NeurIPS 2023 (spotlight) | slides | poster
- Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
JW, Wennan Zhu, Peter Kairouz, Vladimir Braverman
NeurIPS 2023 | slides | poster
- Fixed Design Analysis of Regularization-Based Continual Learning
Haoran Li*, JW*, Vladimir Braverman
CoLLAs 2023
- Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
JW*, Difan Zou*, Zixiang Chen*, Vladimir Braverman, Quanquan Gu, Sham Kakade
ICML 2023 | slides | poster
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
JW*, Difan Zou*, Vladimir Braverman, Quanquan Gu, Sham Kakade
NeurIPS 2022 | slides | poster
- Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Difan Zou*, JW*, Vladimir Braverman, Quanquan Gu, Sham Kakade
NeurIPS 2022
- Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
JW*, Difan Zou*, Vladimir Braverman, Quanquan Gu, Sham Kakade
ICML 2022 (long presentation) | slides | poster
- Gap-Dependent Unsupervised Exploration for Reinforcement Learning
JW, Vladimir Braverman, Lin Yang
AISTATS 2022 | slides | poster
- The Benefits of Implicit Regularization from SGD in Least Squares Problems
Difan Zou*, JW*, Vladimir Braverman, Quanquan Gu, Dean Foster, Sham Kakade
NeurIPS 2021
- Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning
JW, Vladimir Braverman, Lin Yang
NeurIPS 2021 | slides | poster
- Lifelong Learning with Sketched Structural Regularization
Haoran Li, Aditya Krishnan, JW, Soheil Kolouri, Praveen Pilly, Vladimir Braverman
ACML 2021
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Difan Zou*, JW*, Vladimir Braverman, Quanquan Gu, Sham Kakade
COLT 2021 (journal version in JMLR 2023) | slides
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
JW, Difan Zou, Vladimir Braverman, Quanquan Gu
ICLR 2021 | slides | poster
- Obtaining Adjustable Regularization for Free via Iterate Averaging
JW, Vladimir Braverman, Lin Yang
ICML 2020 | slides | code
- On the Noisy Gradient Descent that Generalizes as SGD
JW, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
ICML 2020 | slides | code
- Tangent-Normal Adversarial Regularization for Semi-supervised Learning
Bing Yu*, JW*, Jinwen Ma, Zhanxing Zhu
CVPR 2019 (oral) | slides | poster | code
- The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Minima and Regularization Effects
Zhanxing Zhu*, JW*, Bing Yu, Lei Wu, Jinwen Ma
ICML 2019 | slides | poster | code