Discriminative Learning for Speech Recognition: Theory and Practice
In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model (HMM). A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error (MCE), and minimum phone/word error (MPE/MWE). The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth-transformation (or extended Baum-Welch) optimization framework in discriminative learning of model parameters. In addition to the necessary background and tutorial material on the subject, we include technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete HMMs, and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues of practical significance are provided to enable practitioners to translate the theory in the earlier part of the book directly into engineering practice.
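As an illustration of the rational-function form the description refers to, consider the MMI criterion; the notation below is a common convention in the discriminative-training literature, not necessarily the book's own. Over R training utterances with acoustic observation sequences X_r, reference label sequences s_r, and model parameters Λ:

```latex
O_{\mathrm{MMI}}(\Lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_\Lambda(X_r \mid s_r)\, P(s_r)}
         {\sum_{s} p_\Lambda(X_r \mid s)\, P(s)}
\qquad\Longrightarrow\qquad
\exp\!\bigl(O_{\mathrm{MMI}}(\Lambda)\bigr)
  = \frac{G(\Lambda)}{H(\Lambda)},
```

where G(Λ) is the product over r of the numerator terms and H(Λ) the product of the denominator sums. Exponentiating the log-sum turns the objective into a ratio of two positive functions of the parameters, the rational-function form to which growth-transformation (extended Baum-Welch) updates apply; the book's unification shows MCE and MPE/MWE criteria can be cast into the same form.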