Computer Speech: Recognition, Compression, Synthesis

Springer Science & Business Media, Jun 15, 2004 - Science - 379 pages
With the first edition sold out, I have a welcome opportunity to augment this volume with some recent applications of speech research. A new chapter, by Holger Quast, treats speech dialogue systems and natural language processing. Dictation programs for word processors, voice dialing for mobile phones, and dialogue systems for air travel reservations, automated banking, and translation over the telephone are at the forefront of human-machine interfaces. Spoken language dialogue systems are also invaluable for the physically handicapped. For researchers new to the field, the new chapter (pp. 67-106) provides an overview of fundamental linguistic concepts from phonetics, morphology, syntax, semantics and pragmatics, grammars and knowledge representation. Symbolic methodology, such as Noam Chomsky's traditional hierarchy of formal languages, is laid out, as are statistical approaches to analyzing text. Proven tools of language processing are covered in detail, including finite state automata, Zipf's law, trees and parsers. The second part of the new chapter introduces the building blocks of state-of-the-art dialogue systems.
 


Contents

1 Introduction (p. 1)
1.2 Voice Coders (p. 2)
1.3 Voiceprints for Combat and for Fighting Crime (p. 4)
1.4 The Electronic Secretary (p. 7)
1.5 The Human Voice as a Key (p. 8)
1.6 Clipped Speech (p. 9)
1.7 Frequency Division (p. 11)
Speech in the Soviet Union (p. 12)
1.9 Linking Fast Trains to the Telephone Network (p. 13)
1.10 Digital Decapitation (p. 14)
1.11 Man into Woman and Back (p. 16)
1.14 Spectral Compression for the Hard-of-Hearing (p. 17)
1.17 Slow Speed for Better Comprehension (p. 19)
1.19 Improving Public Address Systems (p. 20)
1.21 Conclusion (p. 21)
2 A Brief History of Speech (p. 23)
2.2 Wolfgang Ritter von Kempelen (p. 24)
2.3 From Kratzenstein to Helmholtz (p. 26)
2.4 Helmholtz and Rayleigh (p. 27)
Alexander Melville and Alexander Graham Bell (p. 28)
2.6 Modern Times (p. 29)
2.7 The Vocal Tract (p. 30)
2.8 Articulatory Dynamics (p. 31)
2.9 The Vocoder and Some of Its Progeny (p. 34)
2.10 Formant Vocoders (p. 35)
2.11 Correlation Vocoders (p. 36)
2.13 Center Clipping for Spectrum Flattening (p. 37)
2.14 Linear Prediction (p. 38)
2.16 Neural Networks (p. 39)
2.18 Conclusion (p. 40)
3 Speech Recognition and Speaker Identification (p. 41)
3.1 Speech Recognition (p. 42)
3.2 Dialogue Systems (p. 44)
3.3 Speaker Identification (p. 45)
3.4 Word Spotting (p. 46)
3.5 Pinpointing Disasters by Speaker Identification (p. 47)
3.6 Speaker Identification for Forensic Purposes (p. 48)
3.7 Dynamic Programming (p. 49)
3.9 Shannon's Outguessing Machine: A Markov Model Analyzer (p. 50)
3.10 Hidden Markov Models in Speech Recognition (p. 51)
3.10.1 The Model and Algorithms (p. 52)
3.11 Neural Networks (p. 55)
3.11.1 The Perceptron (p. 56)
3.11.4 Kohonen Self-Organizing Maps (p. 57)
3.11.5 Hopfield Nets and Associative Memory (p. 58)
3.12 Whole Word Recognition (p. 59)
3.14 The Modulation Transfer Function (p. 60)
4 Speech Dialogue Systems and Natural Language Processing (p. 67)
Levels of Language Analysis and Knowledge Representation (p. 68)
4.1.2 Grammars (p. 71)
4.1.3 Symbolic Processing (p. 73)
4.1.4 Statistical Processing (p. 77)
4.2 Speech Dialogue Systems (p. 86)
4.2.1 Demands of a Dialogue System (p. 87)
4.2.2 Architecture and Components (p. 89)
4.2.4 Natural Language Processing (p. 92)
4.2.5 Discourse Engine (p. 95)
4.2.6 Response Generation (p. 101)
4.2.7 Speech Synthesis (p. 103)
4.2.8 Summary (p. 105)
5 Speech Compression (p. 107)
5.1 Vocoders (p. 108)
5.2 Digital Simulation (p. 109)
5.3 Linear Prediction (p. 110)
5.3.1 Linear Prediction and Resonances (p. 111)
5.3.2 The Innovation Sequence (p. 115)
5.3.3 Single Pulse Excitation (p. 116)
5.3.4 Multipulse Excitation (p. 118)
5.3.6 Masking of Quantizing Noise (p. 119)
5.3.7 Instantaneous Quantizing Versus Block Coding (p. 120)
5.3.8 Delays (p. 122)
5.3.9 Code Excited Linear Prediction (CELP) (p. 123)
5.3.11 Efficient Coding of Parameters (p. 124)
5.5 Transform Coding (p. 125)
5.6 Audio Compression (p. 126)
6 Speech Synthesis (p. 129)
6.1 Model-Based Speech Synthesis (p. 131)
6.2 Synthesis by Concatenation (p. 132)
6.3 Prosody (p. 133)
7 Speech Production (p. 135)
7.1 Sources and Filters (p. 136)
7.3 The Vocal Tract (p. 139)
7.3.1 Radiation from the Lips (p. 140)
7.4 The Acoustic Tube Model of the Vocal Tract (p. 142)
7.5 Discrete Time Description (p. 146)
8 The Speech Signal (p. 149)
8.1 Spectral Envelope and Fine Structure (p. 150)
8.4 The Formant Frequencies (p. 151)
9 Hearing (p. 153)
9.1 Historical Antecedents (p. 155)
9.2 Thomas Seebeck and Georg Simon Ohm (p. 157)
9.4 Hermann von Helmholtz and Georg von Békésy (p. 158)
9.4.2 Pulsation Threshold and Continuity Effect (p. 159)
9.5 Anatomy and Basic Capabilities of the Ear (p. 160)
9.8 The Inner Ear (p. 162)
9.9 Mechanical to Neural Transduction (p. 169)
9.10 Some Astounding Monaural Phase Effects (p. 171)
9.11 Masking (p. 174)
9.13 Scaling in Psychology (p. 175)
9.14 Pitch Perception and Uncertainty (p. 177)
10 Binaural Hearing: Listening with Both Ears (p. 179)
10.2 Precedence and Haas Effects (p. 181)
10.3 Vertical Localization (p. 183)
10.4 Virtual Sound Sources and Quasi-Stereophony (p. 185)
10.5 Binaural Release from Masking (p. 188)
10.6 Binaural Beats and Pitch (p. 189)
10.7 Direction and Pitch Confused (p. 190)
10.8 Pseudo-Stereophony (p. 194)
10.9 Virtual Sound Images (p. 196)
10.10 Philharmonic Hall, New York (p. 197)
10.11 The Proper Reproduction of Spatial Sound Fields (p. 198)
10.12 The Importance of Lateral Sound (p. 200)
10.13 How to Increase Lateral Sounds in Real Halls (p. 202)
10.14 Summary (p. 205)
11 Basic Signal Concepts (p. 207)
11.2 Fourier Transforms (p. 208)
11.3 The Autocorrelation Function (p. 211)
11.4 The Convolution Integral and the Delta Function (p. 213)
11.5 The Cross-Correlation Function and the Cross-Spectrum (p. 215)
11.5.1 A Bit of Number Theory (p. 217)
11.6 The Hilbert Transform and the Analytic Signal (p. 218)
11.7 Hilbert Envelope and Instantaneous Frequency (p. 220)
11.8 Causality and the Kramers-Kronig Relations (p. 224)
11.8.1 Anticausal Functions (p. 225)
11.8.2 Minimum-Phase Systems and Complex Frequencies (p. 226)
11.8.3 Allpass Systems (p. 227)
11.8.4 Dereverberation (p. 228)
11.9 Matched Filtering (p. 229)
11.10 Phase and Group Delay (p. 230)
11.11 Heisenberg Uncertainty and the Fourier Transform (p. 232)
11.11.1 Prolate Spheroidal Wave Functions and Uncertainty (p. 234)
11.12 Time and Frequency Windows (p. 238)
11.13 The Wigner-Ville Distribution (p. 239)
Measurement of Fundamental Frequency (p. 241)
11.15 Line Spectral Frequencies (p. 244)
A Acoustic Theory and Modeling of the Vocal Tract (p. 247)
A.2 Acoustics of a Hard-Walled Lossless Tube (p. 248)
A.2.2 Time-Invariant Case (p. 252)
A.2.3 Formants as Eigenvalues (p. 253)
A.2.4 Losses and Nonrigid Walls (p. 255)
A.3 Discrete Modeling of a Tube (p. 257)
A.3.2 Frequency-Domain Modeling: Two-Port Theory (p. 260)
A.3.3 Tube Models and Linear Prediction (p. 263)
A.4 Notes on the Inverse Problem (p. 265)
A.4.2 Empirical Methods (p. 268)
B Direct Relations Between Cepstrum and Predictor Coefficients (p. 269)
B.2 Direct Computation of Predictor Coefficients from the Cepstrum (p. 271)
B.3 A Simple Check (p. 272)
B.5 Connection with Statistical Moments and Cumulants (p. 274)
B.7 An Application of Root-Power Sums to Pitch Detection (p. 275)
References (p. 279)
General Reading (p. 297)
Selected Journals and Research Reports (p. 307)
A Sampling of Societies and Major Meetings (p. 308)
Glossary of Speech and Computer Terms (p. 309)
Name Index (p. 339)
Subject Index (p. 349)
The Author (p. 377)

About the author (2004)

Manfred Schroeder is a pioneer in exploring the artistic potential of computer graphics, a world-renowned expert in concert hall acoustics, and holder of over 45 patents. He divides his time between Berkeley Heights, New Jersey, and Göttingen, Germany.
