
> This post covers one appealing way to constrain the weight matrices of a neural network—by keeping the tensors constrained to submanifolds at each layer. This opens the door to re-thinking optimization, as we can co-design optimization algorithms with these manifold constraints. As an example, we propose a manifold version of the Muon optimizer whose weights are constrained to the Stiefel manifold: the manifold of matrices with unit condition number. We conclude the post by defining the idea of a modular manifold, which is a composable manifold that attempts to make it easier to scale up and train large networks.
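
The "unit condition number" characterization is quick to check numerically: a point on the Stiefel manifold has orthonormal columns, so every singular value equals 1. A minimal sketch in NumPy (my own illustration, not code from the post):

    import numpy as np

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((128, 64)))  # Q has orthonormal columns

    s = np.linalg.svd(Q, compute_uv=False)
    print(s.min(), s.max())   # both ~1.0: all singular values are 1
    print(np.linalg.cond(Q))  # 2-norm condition number ~1.0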

Very good presentation. Projected gradient methods were popular during the convex optimization craze two decades ago. The ideas advanced here have precedent and seem sensible to me. My concern is whether it helps much. The test accuracy in figure 6b shows a marginal increase, and a gentler transition to the overfitting regime, suggesting the regularization is working. The higher LR did not translate to a speed-up: "Manifold Muon increased the wall clock time per step compared to AdamW..."
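
For concreteness, the projected-gradient recipe being referred to: take an ordinary gradient step, then map the iterate back onto the constraint set. A minimal sketch with the Stiefel manifold as the constraint and a toy trace objective (my own illustration; not the post's manifold Muon update):

    import numpy as np

    def retract(W):
        # Nearest matrix with orthonormal columns: keep the singular vectors
        # and set every singular value to 1 (the polar factor of W).
        U, _, Vt = np.linalg.svd(W, full_matrices=False)
        return U @ Vt

    rng = np.random.default_rng(0)
    M = rng.standard_normal((64, 64))
    A = M.T @ M                                # symmetric PSD
    W = retract(rng.standard_normal((64, 8)))  # start on the manifold

    lr = 1e-3
    for _ in range(500):
        grad = 2 * A @ W                       # gradient of trace(W.T @ A @ W)
        W = retract(W + lr * grad)             # ascent step, then retraction

    print(np.trace(W.T @ A @ W))               # approaches the sum of the top 8
    print(np.linalg.eigvalsh(A)[-8:].sum())    # eigenvalues of A from below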

More fundamentally, I am a bit skeptical that low test error is the right goal for LLMs, because statistical learning theory does not adequately model the macro-behavior of very large models.



> statistical learning theory does not adequately model the macro-behavior of very large models.

Might you please elaborate on this? I recognize that "artificial neural networks are lossy de/compression algorithms" does not enumerate the nuances of these structures, but am curious whether anything in particular is both interesting and missing from SLT.


SLT typically starts from empirical risk minimization, which leads to the bias-variance decomposition and a single optimum as the monotonically decreasing bias supposedly trades off against the monotonically increasing variance. We now know this does not accurately model overparameterized models, which exhibit double descent and other phenomena such as grokking. To explain them you have to look past classical statistics to statistical mechanics.
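
For reference, the decomposition being invoked, for squared error at a fixed input x, with true function f, a model \hat f fit on a random training set, and noise variance \sigma^2:

    \mathbb{E}\big[(y - \hat f(x))^2\big]
      = \underbrace{\big(f(x) - \mathbb{E}[\hat f(x)]\big)^2}_{\text{bias}^2}
      + \underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{irreducible noise}}

Double descent is the observation that test error can fall again as capacity grows past the interpolation threshold, which the single U-shaped curve implied by this trade-off does not predict.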


> The test accuracy in figure 6b shows a marginal increase, and a gentler transition to the overfitting regime, suggesting the regularization is working.

Sounds like it might help for online RL training regimes, as those are naturally quite vulnerable to overfitting.


> The test accuracy in figure 6b shows a marginal increase, and a gentler transition to the overfitting regime, suggesting the regularization is working.

Higher LR does not mean there’s overfitting.



