Acoustical and Environmental Robustness in Automatic Speech Recognition

Springer Science & Business Media, Nov 30, 1992 - Technology & Engineering - 186 pages
The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems find their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close-talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation due to surface reflections in a room, or from spectral shaping by microphones or the vocal tracts of individual speakers. Speech recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors, demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.
 


Contents

Introduction
1.1 Acoustical Environmental Variability and its Consequences
1.1.2 Additive Noise
1.1.4 Physiological Differences
1.2 Previous Research in Signal Processing for Robust Speech Recognition
1.2.2 Techniques Based on Manipulation of Distortion Measures
1.2.3 The Use of Auditory Models
1.2.4 Techniques Based on Short-Time Spectral Amplitude Estimation
1.2.5 Techniques Based on Mixture Densities
1.2.6 Other Techniques
1.2.7 Discussion
1.3 Towards Environment-Independent Recognition
A Unified View
1.3.3 Measuring Performance Evaluation
Experimental Procedure
2.1.2 Vector Quantization
2.1.3 Hidden Markov Models
2.2 The Census Database
2.2.2 Database Contents
2.2.4 The Environment
2.3 Objective Measurements
SEGSNR and MAXSNR
Average Speech and Noise Spectra
2.3.5 Discussion of SNR Measures
2.4 Baseline Recognition Accuracy
2.5 Other Databases
2.5.1 Sennheiser HMD224 - Crown PCC160
2.5.3 Sennheiser HMD224 - Sennheiser 518
2.5.4 Sennheiser HMD224 - Sennheiser ME80
Frequency Domain Processing
3.2 Channel Equalization
3.3 Noise Suppression by Spectral Subtraction
3.3.2 Noise Subtraction in Speech Recognition
3.3.3 Spectral Subtraction in the Logarithm Domain
3.4 Experiments with Sphinx
3.4.1 EQUAL Algorithm
3.4.2 PSUB Algorithm
3.4.3 MSUB Algorithm
3.4.4 MMSE1 Algorithm
3.4.5 Cascade of EQUAL and MSUB
3.5 Summary
The SDCN Algorithm
The MMSEN Algorithm
The SDCN Algorithm
4.4 Summary
The CDCN Algorithm
5.1 Introduction to the CDCN Algorithm
5.1.1 ML Estimation Based on Acoustic Information
5.2 MMSE Estimator of the Cepstral Vector
5.3 ML Estimation of Noise and Spectral Tilt
5.4 Implementation Details
5.5 Summary of the CDCN Algorithm
5.6 Evaluation Results
5.7 Summary
Other Algorithms
6.2 The BSDCN Algorithm
6.3 The FCDCN Algorithm
6.3.1 ML Estimation of the Correction Vectors
6.4 Environmental Adaptation in Real Time
6.5 Summary
Frequency Normalization
7.2 Improved Frequency Resolution
7.3 Variable Frequency Warping
7.4 Summary
Summary of Results
Conclusions
9.2 Suggestions for Future Work
Glossary
I.4 Indices
Signal Processing in Sphinx
The Bilinear Transform
III.1 Cascade of Warping Stages
Spectral Estimation Issues
MMSE Estimation in the CDCN Algorithm
V.2 Gaussian Decomposition in CDCN
V.3 MMSE Estimate in CDCN
Maximum Likelihood via the EM Algorithm
VI.2 The EM Algorithm
ML Estimation of Noise and Spectral Tilt
Vocabulary and Pronunciation Dictionary
References
Index
Copyright

Popular passages

Page 181 - JE Porter and SF Boll. Optimal Estimators for Spectral Restoration of Noisy Speech.
Page 179 - All-pole modeling of degraded speech,
Page 180 - Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, New York, NY, pages 215-218, 1988. [7] SF Boll. "Suppression of Acoustic Noise in Speech Using Spectral Subtraction."
Page 174 - D. Van Compernolle. Noise Adaptation in a Hidden Markov Model Speech Recognition System.
