## Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions

Conventional applications of neural networks usually predict a single value as a function of given inputs. In forecasting, for example, a standard objective is to predict the future value of some entity of interest on the basis of a time series of past measurements or observations. Typical training schemes minimise the sum of squared deviations between predicted and actual values (the 'targets'), so that, ideally, the network learns the conditional mean of the target given the input. If the underlying conditional distribution is Gaussian, or at least unimodal, this may be a satisfactory approach. For a multimodal distribution, however, the conditional mean does not capture the relevant features of the system: it can fall between the modes, in a region where the target is unlikely ever to occur, and the prediction performance will, in general, be very poor. This calls for a more powerful model that can learn the entire conditional probability distribution.

Chapter 1 demonstrates that even for a deterministic system with 'benign' Gaussian observational noise, the distribution of a future observation, conditional on a set of past observations, can become strongly skewed and multimodal. In Chapter 2, a general neural network structure for modelling conditional probability densities is derived, and it is shown that a universal approximator for this extended task requires at least two hidden layers. A training scheme is developed from a maximum likelihood approach in Chapter 3, and the performance of this method is demonstrated on three stochastic time series in Chapters 4 and 5.
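The description gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes that the network's outputs for a given input parameterise a Gaussian mixture (prior weights through a softmax, kernel widths through an exponential), a common choice for conditional density models but not necessarily the exact architecture derived in the book. The function names are hypothetical. The sketch contrasts the conditional mean, which a squared-error fit converges to, with the maximum likelihood objective for the full conditional density.

```python
import math


def mixture_params_from_outputs(raw_priors, means, raw_log_sigmas):
    """Hypothetical output layer: map unconstrained network outputs to valid
    mixture parameters (softmax for the priors, exp for the kernel widths)."""
    m = max(raw_priors)  # subtract the max for numerical stability
    e = [math.exp(r - m) for r in raw_priors]
    total = sum(e)
    priors = [v / total for v in e]
    sigmas = [math.exp(s) for s in raw_log_sigmas]
    return priors, list(means), sigmas


def gaussian_pdf(y, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, priors, means, sigmas):
    """Conditional density p(y|x) as a prior-weighted sum of Gaussian kernels.
    priors must be non-negative and sum to one."""
    return sum(a * gaussian_pdf(y, mu, s) for a, mu, s in zip(priors, means, sigmas))


def mixture_mean(priors, means):
    """Conditional mean E[y|x] implied by the mixture: the quantity that a
    sum-of-squares training scheme ideally learns."""
    return sum(a * mu for a, mu in zip(priors, means))


def negative_log_likelihood(targets, params):
    """Maximum likelihood objective: -sum_t log p(y_t | x_t), where params[t]
    holds the (priors, means, sigmas) predicted for input x_t."""
    return -sum(math.log(mixture_pdf(y, *p)) for y, p in zip(targets, params))


if __name__ == "__main__":
    # Hypothetical network output for one input x: two equally likely modes at -1 and +1.
    priors, means, sigmas = mixture_params_from_outputs(
        [0.0, 0.0], [-1.0, 1.0], [math.log(0.1)] * 2
    )
    y_mean = mixture_mean(priors, means)
    print("conditional mean:", y_mean)
    print("density at the mean:", mixture_pdf(y_mean, priors, means, sigmas))
    print("density at a mode:", mixture_pdf(1.0, priors, means, sigmas))
```

With two equally weighted, narrow modes at -1 and +1, the conditional mean lies at 0, where the mixture density is practically zero, so a point prediction there is almost never close to an actual target. The negative log-likelihood, by contrast, is minimised by placing probability mass where the targets actually occur.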

### Contents

The Bayesian Evidence Scheme for Regularisation

A Universal Approximator Network for Predicting Condi…

A Maximum Likelihood Training Scheme