## Neural Networks: A Systematic Introduction

Neural networks are a computing paradigm that is attracting increasing attention among computer scientists. In this book, theoretical laws and models previously scattered in the literature are brought together into a general theory of artificial neural nets. Always with a view to biology, and starting with the simplest nets, it is shown how the properties of the models change as more general computing elements and net topologies are introduced. Each chapter contains examples, numerous illustrations, and a bibliography. The book is aimed at readers who seek an overview of the field or who wish to deepen their knowledge. It is suitable as a basis for university courses in neurocomputing.

The book can be downloaded from the author's website at: http://page.mi.fu-berlin.de/rojas/neural/

### Contents

1 The Biological Paradigm | 3 |

1.1.2 Models of computation | 5 |

1.1.3 Elements of a computing model | 9 |

1.2.2 Transmission of information | 11 |

1.2.3 Information processing at the neurons and synapses | 18 |

1.2.4 Storage of information: learning | 20 |

1.2.5 The neuron: a self-organizing system | 21 |

1.3 Artificial neural networks | 23 |

1.3.2 Approximation of functions | 24 |

1.3.3 Caveat | 26 |

2 Threshold Logic | 29 |

2.1.2 The computing units | 31 |

2.2 Synthesis of Boolean functions | 33 |

2.2.2 Geometric interpretation | 34 |

2.2.3 Constructive synthesis | 36 |

2.3 Equivalent networks | 38 |

2.3.1 Weighted and unweighted networks | 39 |

2.3.2 Absolute and relative inhibition | 40 |

2.3.3 Binary signals and pulse coding | 41 |

2.4 Recurrent networks | 42 |

2.4.2 Finite automata | 43 |

2.4.3 Finite automata and recurrent networks | 44 |

2.4.4 A first classification of neural networks | 46 |

2.5 Harmonic analysis of logical functions | 47 |

2.5.2 The Hadamard-Walsh transform | 49 |

2.5.3 Applications of threshold logic | 50 |

2.6 Historical and bibliographical remarks | 52 |

3 Weighted Networks: The Perceptron | 55 |

3.1.2 Computational limits of the perceptron model | 57 |

3.2 Implementation of logical functions | 60 |

3.2.2 The XOR problem | 62 |

3.3 Linearly separable functions | 63 |

3.3.2 Duality of input space and weight space | 64 |

3.3.3 The error function in weight space | 65 |

3.4 Applications and biological analogy | 66 |

3.4.2 The structure of the retina | 68 |

3.4.3 Pyramidal networks and the neocognitron | 69 |

3.4.4 The silicon retina | 74 |

3.5 Historical and bibliographical remarks | 75 |

4 Perceptron Learning | 77 |

4.1.1 Classes of learning algorithms | 78 |

4.1.2 Vector notation | 79 |

4.1.3 Absolute linear separability | 80 |

4.1.4 The error surface and the search method | 81 |

4.2 Algorithmic learning | 84 |

4.2.1 Geometric visualization | 85 |

4.2.2 Convergence of the algorithm | 87 |

4.2.3 Accelerating convergence | 89 |

4.2.4 The pocket algorithm | 90 |

4.2.5 Complexity of perceptron learning | 91 |

4.3 Linear programming | 92 |

4.3.2 Linear separability as linear optimization | 94 |

4.3.3 Karmarkar's algorithm | 95 |

4.4 Historical and bibliographical remarks | 97 |

5 Unsupervised Learning and Clustering Algorithms | 99 |

5.1.2 Unsupervised learning through competition | 101 |

5.2 Convergence analysis | 103 |

5.2.2 Multidimensional case: the classical methods | 106 |

5.2.3 Unsupervised learning as minimization problem | 108 |

5.2.4 Stability of the solutions | 110 |

5.3 Principal component analysis | 112 |

5.3.2 Convergence of the learning algorithm | 115 |

5.3.3 Multiple principal components | 117 |

5.4.1 Pattern recognition | 118 |

5.5 Historical and bibliographical remarks | 120 |

6 One and Two Layered Networks | 123 |

6.1.2 The XOR problem revisited | 124 |

6.1.3 Geometric visualization | 127 |

6.2 Counting regions in input and weight space | 129 |

6.2.2 Bipolar vectors | 131 |

6.2.3 Projection of the solution regions | 132 |

6.2.4 Geometric interpretation | 135 |

6.3 Regions for two layered networks | 138 |

6.3.2 Number of regions in general | 139 |

6.3.3 Consequences | 142 |

6.3.5 The problem of local minima | 145 |

6.4 Historical and bibliographical remarks | 147 |

7 The Backpropagation Algorithm | 149 |

7.1.2 Regions in input space | 151 |

7.1.3 Local minima of the error function | 152 |

7.2 General feed-forward networks | 153 |

7.2.2 Derivatives of network functions | 155 |

7.2.3 Steps of the backpropagation algorithm | 159 |

7.2.4 Learning with backpropagation | 161 |

7.3 The case of layered networks | 162 |

7.3.2 Steps of the algorithm | 164 |

7.3.3 Backpropagation in matrix form | 168 |

7.3.4 The locality of backpropagation | 169 |

7.3.5 Error during training | 171 |

7.4.1 Backpropagation through time | 172 |

7.4.2 Hidden Markov Models | 175 |

7.4.3 Variational problems | 178 |

7.5 Historical and bibliographical remarks | 180 |

8 Fast Learning Algorithms | 183 |

8.1.1 Backpropagation with momentum | 184 |

8.1.2 The fractal geometry of backpropagation | 190 |

8.2 Some simple improvements to backpropagation | 197 |

8.2.2 Clipped derivatives and offset term | 199 |

8.2.3 Reducing the number of floating-point operations | 200 |

8.2.4 Data decorrelation | 202 |

8.3 Adaptive step algorithms | 204 |

8.3.1 Silva and Almeida's algorithm | 205 |

8.3.2 Delta-bar-delta | 207 |

8.3.3 Rprop | 208 |

8.3.4 The Dynamic Adaption algorithm | 209 |

8.4 Second-order algorithms | 210 |

8.4.1 Quickprop | 211 |

8.4.2 QRprop | 212 |

8.4.3 Second-order backpropagation | 214 |

8.5 Relaxation methods | 221 |

8.5.2 Symmetric and asymmetric relaxation | 222 |

8.5.3 A final thought on taxonomy | 223 |

8.6 Historical and bibliographical remarks | 224 |

9 Statistics and Neural Networks | 227 |

9.1 Linear and nonlinear regression | 227 |

9.1.2 Linear regression | 229 |

9.1.3 Nonlinear units | 231 |

9.1.4 Computing the prediction error | 233 |

9.1.5 The jackknife and cross-validation | 236 |

9.1.6 Committees of networks | 237 |

9.2 Multiple regression | 240 |

9.2.2 Linear equations and the pseudoinverse | 242 |

9.2.3 The hidden layer | 243 |

9.2.4 Computation of the pseudoinverse | 244 |

9.3 Classification networks | 245 |

9.3.1 NETtalk | 246 |

9.3.2 The Bayes property of classifier networks | 247 |

9.3.3 Connectionist speech recognition | 250 |

9.3.4 Autoregressive models for time series analysis | 258 |

9.4 Historical and bibliographical remarks | 259 |

10 The Complexity of Learning | 263 |

10.1.3 Kolmogorov's theorem | 265 |

10.2 Function approximation | 267 |

10.2.2 The multidimensional case | 269 |

10.3 Complexity of learning problems | 271 |

10.3.1 Complexity classes | 272 |

10.3.2 NP-complete learning problems | 275 |

10.3.3 Complexity of learning with AND-OR networks | 277 |

10.3.4 Simplifications of the network architecture | 280 |

10.3.5 Learning with hints | 281 |

10.4 Historical and bibliographical remarks | 284 |

11 Fuzzy Logic | 287 |

11.1.2 The fuzzy set concept | 288 |

11.1.3 Geometric representation of fuzzy sets | 290 |

11.1.4 Fuzzy set theory, logic operators and geometry | 294 |

11.1.5 Families of fuzzy operators | 295 |

11.2 Fuzzy inferences | 299 |

11.2.2 Fuzzy numbers and inverse operation | 300 |

11.3 Control with fuzzy logic | 302 |

11.3.2 Fuzzy networks | 303 |

11.3.3 Function approximation with fuzzy methods | 305 |

11.3.4 The eye as a fuzzy system: color vision | 306 |

11.4 Historical and bibliographical remarks | 307 |

12 Associative Networks | 309 |

12.1 Associative pattern recognition | 309 |

12.1.2 Structure of an associative memory | 311 |

12.1.3 The eigenvector automaton | 312 |

12.2 Associative learning | 314 |

12.2.2 Geometric interpretation of Hebbian learning | 317 |

12.2.3 Networks as dynamical systems: some experiments | 318 |

12.2.4 Another visualization | 322 |

12.3 The capacity problem | 323 |

12.4 The pseudoinverse | 324 |

12.4.1 Definition and properties of the pseudoinverse | 325 |

12.4.2 Orthogonal projections | 327 |

12.4.3 Holographic memories | 330 |

12.4.4 Translation invariant pattern recognition | 331 |

12.5 Historical and bibliographical remarks | 333 |

13 The Hopfield Model | 335 |

13.1.2 The bidirectional associative memory | 336 |

13.1.3 The energy function | 338 |

13.2 Definition of Hopfield networks | 339 |

13.2.2 Examples of the model | 341 |

13.2.3 Isomorphism between the Hopfield and Ising models | 346 |

13.3 Convergence to stable states | 347 |

13.3.2 Convergence proof | 348 |

13.3.3 Hebbian learning | 352 |

13.4 Equivalence of Hopfield and perceptron learning | 354 |

13.4.2 Complexity of learning in Hopfield models | 356 |

13.5 Parallel combinatorics | 357 |

13.5.3 The eight rooks problem | 359 |

13.5.4 The eight queens problem | 360 |

13.5.5 The traveling salesman | 361 |

13.5.6 The limits of Hopfield networks | 363 |

13.6 Implementation of Hopfield networks | 365 |

13.6.2 Optical implementation | 366 |

13.7 Historical and bibliographical remarks | 368 |

14 Stochastic Networks | 371 |

14.1.1 The continuous model | 372 |

14.2 Stochastic systems | 373 |

14.2.1 Simulated annealing | 374 |

14.2.2 Stochastic neural networks | 375 |

14.2.3 Markov chains | 376 |

14.2.4 The Boltzmann distribution | 379 |

14.2.5 Physical meaning of the Boltzmann distribution | 382 |

14.3 Learning algorithms and applications | 383 |

14.3.2 Combinatorial optimization | 385 |

14.4 Historical and bibliographical remarks | 386 |

15 Kohonen Networks | 389 |

15.1.2 Topology preserving maps in the brain | 390 |

15.2 Kohonen's model | 393 |

15.2.2 Mapping high-dimensional spaces | 397 |

15.3 Analysis of convergence | 399 |

15.3.2 The two-dimensional case | 401 |

15.3.3 Effect of a unit's neighborhood | 402 |

15.3.4 Metastable states | 403 |

15.3.5 What dimension for Kohonen networks? | 405 |

15.4 Applications | 406 |

15.4.2 Inverse kinematics | 407 |

15.5 Historical and bibliographical remarks | 409 |

16 Modular Neural Networks | 411 |

16.1.1 Cascade correlation | 412 |

16.1.2 Optimal modules and mixtures of experts | 413 |

16.2 Hybrid networks | 414 |

16.2.2 Maximum entropy | 417 |

16.2.3 Counterpropagation networks | 418 |

16.2.4 Spline networks | 420 |

16.2.5 Radial basis functions | 422 |

16.3 Historical and bibliographical remarks | 424 |

17 Genetic Algorithms | 427 |

17.1.2 Methods of stochastic optimization | 428 |

17.1.3 Genetic coding | 431 |

17.1.4 Information exchange with genetic operators | 432 |

17.2 Properties of genetic algorithms | 433 |

17.2.2 Deceptive problems | 437 |

17.2.3 Genetic drift | 438 |

17.2.4 Gradient methods versus genetic algorithms | 440 |

17.3 Neural networks and genetic algorithms | 441 |

17.3.2 A numerical experiment | 443 |

17.3.3 Other applications of GAs | 444 |

17.4 Historical and bibliographical remarks | 446 |

18 Hardware for Neural Networks | 449 |

18.1.2 Types of neurocomputers | 451 |

18.2 Analog neural networks | 452 |

18.2.1 Coding | 453 |

18.2.2 VLSI transistor circuits | 454 |

18.2.3 Transistors with stored charge | 456 |

18.2.4 CCD components | 457 |

18.3 Digital networks | 459 |

18.3.2 Vector and signal processors | 460 |

18.3.3 Systolic arrays | 461 |

18.3.4 One-dimensional structures | 463 |

18.4 Innovative computer architectures | 466 |

18.4.2 Optical computers | 469 |

18.4.3 Pulse coded networks | 472 |

18.5 Historical and bibliographical remarks | 474 |

