
### Licence

For maximum flexibility, foreXiv by C. Jess Riedel is multi-licensed separately under CC BY-SA 4.0, CC BY-NC-SA 4.0, and GFDL 1.3.
## Tishby on physics and deep learning

Having heard Geoffrey Hinton’s somewhat dismissive account of the contribution by physicists to machine learning in his online MOOC, it was interesting to listen to one of those physicists, Naftali Tishby, here at PI:

> **The Information Theory of Deep Neural Networks: The statistical physics aspects**
> Naftali Tishby
>
> Abstract: The surprising success of learning with deep neural networks poses two fundamental challenges: understanding why these networks work so well and what this success tells us about the nature of intelligence and our biological brain. Our recent Information Theory of Deep Learning shows that large deep networks achieve the optimal tradeoff between training size and accuracy, and that this optimality is achieved through the noise in the learning process.
>
> In this talk, I will focus on the statistical physics aspects of our theory and the interaction between the stochastic dynamics of the training algorithm (Stochastic Gradient Descent) and the phase structure of the Information Bottleneck problem. Specifically, I will describe the connections between the phase transition and the final location and representation of the hidden layers, and the role of these phase transitions in determining the weights of the network.
>
> Based partly on joint works with Ravid Shwartz-Ziv, Noga Zaslavsky, and Shlomi Agmon.

(See also Steve Hsu’s discussion of a similar talk Tishby gave in Berlin, plus other notes on history.)

I was familiar with the general concept of over-fitting, but I hadn't realized you could talk about it quantitatively by looking at the mutual information between the output of a network and all the information in the training data that *isn't* the target label.

One often hears the refrain that a lot of ML techniques were known for decades but only became useful when big computational power and huge datasets arrived relatively recently. The unreasonable effectiveness of data is often described as a surprise, but Tishby claims that (part of?) this was predicted by the physicists based on large-N limits of statistical mechanics models, and that the prediction was ignored by the computer scientists. I don't know nearly enough about this topic to assess.

He clearly has a chip on his shoulder — which naturally makes me like him. His “information bottleneck” paper with Pereira and Bialek was posted to the arXiv in 2000 and apparently rejected by the major CS conferences, but has since accumulated fourteen hundred citations.