Automatic Speech Recognition On Mobile Devices And Over Communication Networks

Springer, Mar 3, 2008 - 401 pages
The remarkable advances in computing and networking have sparked an enormous interest in deploying automatic speech recognition on mobile devices and over communication networks. This trend is accelerating. This book brings together leading academic researchers and industrial practitioners to address the issues in this emerging realm and presents the reader with a comprehensive introduction to the subject of speech recognition in devices and networks. It covers network, distributed and embedded speech recognition systems, which are expected to co-exist in the future. It offers a wide-ranging, unified approach to the topic and its latest development, also covering the most up-to-date standards and several off-the-shelf systems.

Key features:
• Provides an in-depth review of network speech recognition, distributed speech recognition, embedded speech recognition, systems and applications
• Begins with a comprehensive overview of the subject, discussing the pros and cons of the presented approaches, and guiding the reader through the following chapters
• Includes platforms like mobile phones, PDAs and automobiles
• Presents state-of-the-art methods, advanced systems, and the latest standards
• Offers working knowledge needed for both research and practice
• References supplemental material at associated complementary website at: http://asr.es.aau.dk

This all-inclusive text/reference is an essential read for graduate students, scientists and engineers working or researching in the field of speech recognition and processing. It offers a self-contained approach to this hot research topic.
  

Contents

1.4.3 Channel Coding and Packetisation
1.4.4 Error Concealment
1.4.6 A Configurable DSR System
1.5.1 ESR Scenario
1.5.3 Fixed-Point Arithmetic
1.5.4 Optimisation
1.5.5 Robustness
1.6 Discussion
References
Speech Coding and Packet Loss Effects on Speech and Speaker Recognition
2.2 Sources of Degradation in Network Speech Recognition
2.2.2 Packet Loss
2.3 Effects on the Automatic Speech Recognition Task
2.3.3 Degradation with Real Transmissions
2.3.4 Degradation Due to Speech and Audio Codecs
2.4 Effect for the Automatic Speaker Verification Task
2.4.1 Speaker Verification Experiments Over Compressed Speech
2.4.2 Speaker Verification Experiments Over GSM Compressed Speech
2.5 Conclusion
References
Speech Recognition Over Mobile Networks
3.2 Techniques for Improving ASR Performance Over Mobile
3.3 Bitstream-Based Approach
3.4 Feature Transform
3.4.1 Mel-Scaled LPCC
3.4.2 LPC-Based MFCC (LP-MFCC)
3.4.3 Pseudo-Cepstrum (PCEP) and Its Mel-Scaled Variant (MPCEP)
3.5.2 Compensation for Speech Coding Distortion in LSP Domain
3.5.3 Compensation for Channel Errors
3.6 Conclusion
References
Speech Recognition Over IP Networks
4.2 Speech Recognition and IP Networks
4.2.2 Impact of Speech Coding Distortion
4.2.3 Impact of Network Channel Distortion
4.3 Robustness Against Packet Loss
4.3.2 Forward Error Correction
4.3.4 Error Concealment and ASR Decoder-Based Concealment
4.4.1 MFCC-Based Speech Coder
4.4.2 Efficient Vector Quantization of MFCCs
4.4.3 Speech Quality Comparison
4.4.4 ASR Performance Comparison
4.5 Conclusion
Distributed Speech Recognition
Distributed Speech Recognition Standards
5.2 Overview of the Set of DSR Standards
5.3 Scope of the Standards
5.3.1 Electro-Acoustics
5.3.2 Speech Detection or External Control Signal
5.3.5 Compression and Error Protection
5.4 DSR Basic Front-End ES 201 108
5.4.3 Error Detection and Mitigation
5.5 DSR Advanced Front-End ES 202 050
5.6 Recognition Performance of the DSR Front-Ends
5.7 3GPP Evaluations and Comparisons to AMR Coded Speech
5.8 ETSI DSR Extended Front-End Standards ES 202 211 and ES 202 212
The IETF RTP Payload Formats for DSR
5.10 Conclusion
Speech Feature Extraction and Reconstruction
6.2 Feature Extraction
6.2.2 Advanced Terminal-Side Feature Extraction
6.2.3 Quantisation and Packetisation
6.2.4 Server-Side Processing
6.3.1 Analysis of Received Speech Information
6.3.2 Speech Reconstruction
6.4 Prediction of Voicing and Fundamental Frequency
6.4.2 Voicing Prediction from MFCC Vectors
6.4.3 Speech Reconstruction from Predicted Fundamental Frequency and Voicing
6.5 Conclusion
Quantization of Speech Features: Source Coding
7.2 Quantization Schemes
7.2.2 Distortion Measures for Quantization in Speech Processing
7.2.3 Scalar Quantization
7.2.4 Block Quantization
7.2.6 GMM-Based Block Quantization
7.3 Quantization of ASR Feature Vectors
7.3.2 Statistical Properties of MFCCs
7.3.3 Use of Cepstral Liftering for MFCC Variance Normalization
7.3.4 Relationship Between the Distortion Measure and Recognition Performance
Perceptual Weighting of Filterbank Energies
7.4 Experimental Results
7.4.2 Experimental Setup
7.4.4 Unconstrained Vector Quantization
7.4.5 GMM-Based Block Quantization
7.4.7 Perceptually-Weighted Vector Quantization of Logarithmic Filterbank Energies
7.5 Conclusion
References
Error Recovery: Channel Coding and Packetization
8.2 Characterization and Modeling of Communication Channels
8.2.2 Signal Degradation Over IP Networks
8.3 Media-Specific FEC
8.4 Media-Independent FEC
8.4.1 Combining FEC with Error Concealment Methods
8.4.3 Cyclic Codes
8.5 Unequal Error Protection
8.6 Frame Interleaving
8.6.1 Optimal Spread Block Interleavers
8.6.2 Convolutional Interleavers
8.6.3 Decorrelated Block Interleavers
8.7 Examples of Modern Error Recovery Standards
8.7.2 ETSI GSM-EFR Standard (ETSI 1998)
8.8 Summary
Acknowledgments
Error Concealment
9.2 Speech Recognition in the Presence of Corrupted Features
9.2.2 Gaussian Approximation
9.3 Feature Posterior Estimation in a DSR Framework
9.3.1 ETSI DSR Standards
9.3.3 Channel Models
9.3.4 Estimation of Feature Posterior
9.3.5 Related Work
9.4 Performance Evaluations
10.4 Front End
10.5 Observation Model
10.5.2 Efficient Computation Strategies
10.6 Search
10.6.1 Viterbi Search Implementation
10.6.2 Search Graph Construction
10.6.3 Fast Match
10.7 Conclusion
References
Algorithm Optimizations: Low Memory Footprint
11.2 Notations and Problem Statement
11.3 Model Complexity Control
11.3.1 Akaike's Information Criterion
11.3.3 Second Order Approximation
11.4.1 Model Level
11.4.2 State Level
11.4.5 Clustering
11.5 Parameter Representations
11.5.2 Fixed Point Representation
11.6 Quantized Parameters HMMs
11.6.2 Vector Quantization
11.7.1 Subspace Partitioning
11.7.2 Density Clustering
11.9 Practicalities and Conclusion
References
Fixed-Point Arithmetic
12.2 Fixed-Point Arithmetic
12.2.2 Fixed-Point Representation and Quantization
12.3.1 HMM State Likelihoods
12.3.2 State Duration Model
12.3.3 Language Model
12.3.5 Acoustic Front-End
12.4.1 Log-Likelihoods
12.4.2 Viterbi Frame-Synchronous Search
12.4.3 Gaussian Parameters
12.4.4 MFCC Front-End
12.5 Experiments
12.5.1 Real-Time on the Device
12.6 Conclusion
Systems and Applications
Software Architectures for Networked Mobile Speech Applications
13.1.2 The Voice Web
13.1.3 Multimodal User Interfaces
13.1.4 Distributed Speech Recognition
13.1.5 Multimodal Architectures
13.1.6 Simultaneous and Sequential Multimodality
13.1.7 Mode Composition
13.2.1 Fully Embedded or Fat Client (a)
13.2.3 Thin Client (d)
13.2.5 Pudgy Client (c)
13.3 The Plus V Distributed Multimodal Architecture
13.4 Other Distributed Multimodal Architectures
13.4.3 Bare Minimum Mobile Voice Search
13.4.4 A Transcription-Based Architecture
13.6 Conclusion
Speech Recognition in Mobile Phones
14.2 Applications of Speech Recognition for Mobile Phones
14.3 Multilinguality and Language Support
14.3.2 Multilinguality in Other ASR Applications
14.4 Noise Robustness
14.4.3 Noise Reduction
14.5 Footprint and Complexity Reduction
14.5.2 Footprint Reduction of Language Models
14.5.3 Footprint Reduction of Pronunciation Lexicon
14.5.5 Low Memory Fast Decoding
Large Vocabulary Isolated Word Dictation
14.7 Conclusion and Outlook
Handheld Speech to Speech Translation System
15.2 System Overview
15.2.2 Hardware and OS Specifications
15.3 System Components and Optimization
15.3.2 Natural Language Understanding and Generation Based Translation
15.3.3 Weighted Finite State Transducer Based Translation
15.3.4 Embedded Speech Synthesis
15.4 Experiments and Discussions
15.4.2 Translation Experiments
15.5 Conclusion
References
Automotive Speech Recognition
16.2 Siemens Speech Processing: From Research to Products
16.2.2 High-Performance Recognizer
16.2.3 Ultra-Compact Text-to-Speech Synthesizer
16.2.4 Natural Voice Dialog
16.3.1 Radio Station Selection
16.3.3 Navigation Destination Entry
16.3.4 Manuals and Help Systems
16.3.5 Access to Structured Web Content
16.3.6 Access to Web Services
16.4 Automotive Platform Issues and Challenges
16.4.1 Hardware Constraints
16.4.2 Software Constraints
16.4.3 User Constraints
16.5.1 ASR Front-End
16.5.2 Minimum Mean Square Weighting Rules
16.5.3 Recursive Least Squares Weighting Rules
16.5.4 Implementations of RLS Weighting Rules
16.5.5 Recognition Results
16.6 Methodology for Evaluation of Automotive Recognizers: Quality Measurement Using SNR Curves
16.6.1 Common Evaluation Procedures
16.6.4 Evaluation
16.6.5 Best Practice
16.7 Conclusion
Energy Aware Speech Recognition for Mobile Devices
17.1.2 Energy Aware Design Principles
17.1.3 Related Work
17.2 Case Study of Distributed Speech Recognition Using the HP Labs Smartbadge System
17.2.2 Energy Consumption of DSR with IEEE 802.11 Wireless Networks
17.2.3 Energy Consumption of DSR Using Bluetooth Networks
17.2.4 Comparison of 802.11 and Bluetooth in DSR
17.3 Conclusion
Index
Popular passages

Page 83 - A Survey of Packet Loss Recovery Techniques for Streaming Audio,
Page 159 - Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Trans.
Page 185 - Convolutional Codes and Their Performance in Communication Systems," IEEE Transactions on Communications, vol.
Page 22 - Enabling new speech-driven services for mobile devices: An overview of the ETSI standards activities for distributed speech recognition front-ends.
Page 21 - The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions", in Proceedings of ISCA ITRW ASR 2000.
Page 161 - Matrix quantizer design for LPC speech using the generalized Lloyd algorithm," IEEE Transactions on Acoustics Speech and Signal Processing, vol.
Page 60 - NR Sollenberger, N. Seshadri, and R. Cox, "The Evolution of IS-136 TDMA for Third-Generation Wireless Services," IEEE Personal Communications, Vol.
Page 10 - Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications.
Page 160 - Rabiner, L., and Juang, BH. (1993). Fundamentals of Speech Recognition. Prentice Hall PTR, Englewood Cliffs, New Jersey.
Page 132 - The balance of this chapter is divided into four sections. In the first section, we discuss how research results are affected by procedural choices with regard to each operational feature.

References to this book

From Google Scholar

Speech Recognition for Smart Homes
Ian McLoughlin, Hamid Reza Sharifzadeh