## Statistical Mechanics of Learning

Learning comes naturally to humans, yet understanding how it works has always been a challenge. Today that challenge has a further dimension, as we try to build machines that can learn and carry out tasks such as data mining, image processing and pattern recognition. Artificial neural networks provide a simple framework in which learning from examples can be described and understood. The subject of this book is the contribution made to that framework over the last decade by researchers applying the techniques of statistical mechanics. The authors give a coherent account of important concepts and techniques that are currently found only scattered across papers, supplement this with background material in mathematics and physics, and include many examples and exercises, making a book suitable for use with courses, for self-teaching, or as a handy reference.
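The "learning from examples" setting studied throughout the book can be illustrated with a minimal teacher-student perceptron. This is only a sketch: the dimensions, number of examples, random seed and update scaling below are illustrative choices, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20   # input dimension
P = 60   # number of training examples

# A random "teacher" perceptron supplies the labels the student must learn.
teacher = rng.standard_normal(N)
X = rng.standard_normal((P, N))
y = np.sign(X @ teacher)

# Student trained with the classic perceptron rule:
# update the coupling vector J only on misclassified examples.
J = np.zeros(N)
for _ in range(100):  # sweeps over the training set
    errors = 0
    for x, label in zip(X, y):
        if np.sign(x @ J) != label:
            J += label * x / np.sqrt(N)
            errors += 1
    if errors == 0:   # all examples classified correctly
        break

train_acc = np.mean(np.sign(X @ J) == y)
print(train_acc)
```

Because the labels are generated by a teacher, the training set is linearly separable, and the perceptron convergence theorem guarantees that the rule finds a separating vector in a finite number of updates.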


### Contents

| Section | Page |
|---|---|
| 1 Getting Started | 3 |
| 1.2 A simple example | 6 |
| 1.3 General setup | 10 |
| 1.4 Problems | 15 |
| 2 Perceptron Learning Basics | 16 |
| 2.2 The annealed approximation | 20 |
| 2.3 The Gardner analysis | 24 |
| 2.4 Summary | 29 |
| 2.5 Problems | 31 |
| 3 A Choice of Learning Rules | 35 |
| 3.2 The perceptron rule | 38 |
| 3.3 The pseudo-inverse rule | 39 |
| 3.4 The adaline rule | 41 |
| 3.5 Maximal stability | 42 |
| 3.6 The Bayes rule | 44 |
| 3.7 Summary | 48 |
| 4 Augmented Statistical Mechanics Formulation | 51 |
| 4.2 Gibbs learning at non-zero temperature | 54 |
| 4.3 General statistical mechanics formulation | 58 |
| 4.4 Learning rules revisited | 61 |
| 4.5 The optimal potential | 65 |
| 4.6 Summary | 66 |
| 4.7 Problems | 67 |
| 5 Noisy Teachers | 71 |
| 5.2 Trying perfect learning | 74 |
| 5.3 Learning with errors | 80 |
| 5.4 Refinements | 82 |
| 5.5 Summary | 84 |
| 5.6 Problems | 85 |
| 6 The Storage Problem | 87 |
| 6.2 The Cover analysis | 91 |
| 6.3 The Ising perceptron | 95 |
| 6.4 The distribution of stabilities | 100 |
| 6.5 Beyond the storage capacity | 104 |
| 6.6 Problems | 106 |
| 7 Discontinuous Learning | 111 |
| 7.2 The Ising perceptron | 113 |
| 7.3 The reversed wedge perceptron | 116 |
| 7.4 The dynamics of discontinuous learning | 120 |
| 7.5 Summary | 123 |
| 7.6 Problems | 124 |
| 8 Unsupervised Learning | 127 |
| 8.2 The deceptions of randomness | 131 |
| 8.3 Learning a symmetry-breaking direction | 135 |
| 8.4 Clustering through competitive learning | 139 |
| 8.5 Clustering by tuning the temperature | 144 |
| 8.7 Problems | 149 |
| 9 Online Learning | 151 |
| 9.2 Specific examples | 154 |
| 9.3 Optimal online learning | 157 |
| 9.4 Perceptron with a smooth transfer function | 161 |
| 9.5 Queries | 162 |
| 9.6 Unsupervised online learning | 167 |
| 9.7 The natural gradient | 171 |
| 9.8 Discussion | 172 |
| 9.9 Problems | 173 |
| 10 Making Contact with Statistics | 178 |
| 10.2 Sauer's lemma | 180 |
| 10.3 The Vapnik-Chervonenkis theorem | 182 |
| 10.4 Comparison with statistical mechanics | 184 |
| 10.5 The Cramér-Rao inequality | 188 |
| 10.6 Discussion | 191 |
| 10.7 Problems | 192 |
| 11 A Bird's Eye View: Multifractals | 195 |
| 11.2 The multifractal spectrum of the perceptron | 197 |
| 11.3 The multifractal organization of internal representations | 205 |
| 11.4 Discussion | 209 |
| 12 Multilayer Networks | 211 |
| 12.1 Basic architectures | 212 |
| 12.2 Bounds | 216 |
| 12.3 The storage problem | 220 |
| 12.4 Generalization with a parity tree | 224 |
| 12.5 Generalization with a committee tree | 227 |
| 12.6 The fully connected committee machine | 230 |
| 12.7 Summary | 232 |
| 12.8 Problems | 234 |
| 13 Online Learning in Multilayer Networks | 239 |
| 13.2 The parity tree | 245 |
| 13.3 Soft committee machine | 248 |
| 13.4 Backpropagation | 253 |
| 13.5 Bayesian online learning | 255 |
| 13.6 Discussion | 257 |
| 13.7 Problems | 258 |
| 14 What Else? | 261 |
| 14.2 Complex optimization | 265 |
| 14.3 Error-correcting codes | 268 |
| 14.4 Game theory | 272 |
| Appendices | 277 |
| A2 The Gardner Analysis | 284 |
| A3 Convergence of the Perceptron Rule | 291 |
| A4 Stability of the Replica Symmetric Saddle Point | 293 |
| A5 One-step Replica Symmetry Breaking | 302 |
| A6 The Cavity Approach | 306 |
| A7 The VC Theorem | 312 |



### Common terms and phrases

adatron algorithm analysis annealed annealed approximation ansatz architecture asymptotic behaviour average Bayes Boolean function bound calculation cells chapter characterized classification cluster committee machine Consider convergence corresponding cost function coupling vector entropy error-correcting codes exponentially free energy Gaussian Gibbs learning given Hebb rule hence hidden units input integral internal representations introduced Ising perceptron learning from examples learning rules matrix maximal minimal multifractal spectrum multilayer networks N-sphere neural networks neurons obtained off-line on-line learning optimal order parameters output noise overlap parity machine parity tree perceptron learning perceptron rule performance Phys probability distribution quenched entropy random variables realize replica symmetry breaking replica trick result reversed wedge perceptron saddle point equations self-averaging Show simple stability statistical mechanics storage capacity storage problem student vector teacher-student thermodynamic limit training error training set typical unsupervised unsupervised learning VC dimension version space zero
