This is a presentation on speech recognition systems, also known as automatic speech recognition (ASR) systems. I hope it will be helpful to anyone searching for a presentation on this technology.
5. Speech recognition technology has recently
reached a higher level of performance and
robustness, allowing it to communicate with a
user through speech.
Speech recognition is the process of decoding an
acoustic speech signal, captured by a microphone or
telephone, into a set of words.
The speech is recognized word by word, and from
these words the whole utterance is recognized.
6. Two types of speaker models: speaker independent and speaker dependent.
Speaker independent models recognize the speech patterns of a
large group of people.
Speaker dependent models recognize speech patterns from only
one person. Both models use mathematical and statistical
formulas to yield the best word match for speech. A third
variation of speaker models is now emerging, called speaker
adaptive.
Speaker adaptive systems usually begin with a speaker
independent model and adjust it more closely to
each individual during a brief training period.
7. • Most natural form of communication
• Differently abled people
• Illiterate
• Helplines
• Cars
10. [Diagram with components: Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, Feedback]
11. Step 1: User Input
The system captures the user’s voice as an
analog acoustic signal.
Step 2: Digitization
The analog acoustic signal is digitized.
Step 3: Phonetic Breakdown
The signal is broken into phonemes.
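The digitization step can be pictured with a small sketch. It is only an illustration under stated assumptions: the 8 kHz sample rate, the 8-bit depth, the 440 Hz test tone, and the function names `digitize` and `tone` are not from the slides.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample a continuous-time signal and quantize each sample to a signed integer.

    sample_rate_hz and bits are illustrative defaults, not values from the slides.
    """
    max_level = 2 ** (bits - 1) - 1               # 127 for 8-bit signed samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to the range [-1, 1]
        samples.append(round(amplitude * max_level))
    return samples

def tone(t):
    """Hypothetical stand-in for the analog microphone signal: a 440 Hz sine."""
    return math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
```

Here `pcm` holds 10 ms of audio as integers; a real system would read these values from an analog-to-digital converter instead of a function.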
12. Step 4: Statistical Modeling
Phonemes are mapped to their phonetic
representation using a statistical model.
Step 5: Matching
Using the grammar, the phonetic representation,
and the dictionary, the system returns an n-best list
(i.e., candidate words, each with a confidence score).
Grammar: the set of words or phrases that constrains
the range of input or output in the voice application.
Dictionary: the table that maps phonetic
representations to words (e.g., "thu" and "thee" both map to "the").
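The matching step might look roughly like the sketch below. The dictionary entries other than "thu"/"thee", the grammar contents, the confidence scores, and the function name `n_best` are all made up for illustration; a real engine produces the candidates itself.

```python
# Hypothetical dictionary: phonetic representation -> word.
DICTIONARY = {"thu": "the", "thee": "the", "kawl": "call", "hohm": "home"}

# Hypothetical grammar: the only words this voice application accepts.
GRAMMAR = {"call", "home", "the"}

def n_best(candidates, n=3):
    """Turn (phonetic_representation, confidence) pairs into an n-best word list.

    Candidates that are not in the dictionary, or whose word is outside the
    grammar, are dropped. For each word the highest confidence is kept.
    """
    best = {}
    for phonetic, score in candidates:
        word = DICTIONARY.get(phonetic)
        if word is None or word not in GRAMMAR:
            continue
        best[word] = max(score, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

result = n_best([("thu", 0.62), ("thee", 0.55), ("hohm", 0.30), ("zzz", 0.90)])
```

In this example `result` contains "the" first and "home" second; "zzz" is discarded because it has no dictionary entry.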
13. Approaches to ASR
• Template based
• Statistics based
14. Template-based approach
• Store examples of units (words, phonemes), then find
the example that most closely fits the input
• Extract features from the speech signal; then it’s
“just” a complex similarity matching problem, using
solutions developed for all sorts of applications
• OK for discrete utterances and a single user
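As one possible illustration of template matching, the sketch below uses dynamic time warping (DTW), a common similarity measure for sequences of different length. The slides do not name a specific technique, so DTW is an assumption here, and the one-dimensional "features" and the `TEMPLATES` table are invented toy data.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]

# Hypothetical stored templates: one toy feature sequence per word.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 0.5],
}

def recognize(features):
    """Return the word whose template is closest to the input features."""
    return min(TEMPLATES, key=lambda word: dtw_distance(features, TEMPLATES[word]))

word = recognize([1.0, 1.1, 3.2, 4.0, 4.1, 2.9, 1.0])
```

The input is a slower, slightly noisy version of the "yes" template, and the warping lets it still match that template most closely.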
15. Hard to distinguish very similar templates
Performance degrades quickly when the input
differs from the templates
Therefore needs techniques to mitigate
this degradation:
• More subtle matching techniques
• Multiple templates which are aggregated
Taken together, these suggested …
16. Statistics-based approach
• Collect a large corpus of transcribed
speech recordings
• Train the computer to learn the
correspondences (“machine learning”)
• At run time, apply statistical processes to
search through the space of all possible
solutions, and pick the statistically most
likely one
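Picking the statistically most likely solution is commonly written as choosing the word sequence W that maximizes P(acoustics | W) × P(W). The sketch below shows that choice over two candidate transcriptions; the candidates and all log-probability values are invented for illustration, and a real system would obtain them from trained acoustic and language models.

```python
# Hypothetical log P(acoustics | words) from an acoustic model.
ACOUSTIC_LOG_LIKELIHOOD = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
}

# Hypothetical log P(words) from a language model.
LANGUAGE_MODEL_LOG_PROB = {
    "recognize speech": -4.0,
    "wreck a nice beach": -9.0,
}

def most_likely(hypotheses):
    """Return the hypothesis with the highest combined log-probability."""
    return max(
        hypotheses,
        key=lambda h: ACOUSTIC_LOG_LIKELIHOOD[h] + LANGUAGE_MODEL_LOG_PROB[h],
    )

best = most_likely(list(ACOUSTIC_LOG_LIKELIHOOD))
```

The second candidate fits the audio slightly better, but the language model makes the first one the more likely overall.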
17. Acoustic and Lexical Models
• Analyse training data in terms of relevant features
• Learn the different possibilities from a large amount
of data:
o different phone sequences for a given word
o different combinations of elements of the speech signal
for a given phone/phoneme
• Combine these into a Hidden Markov Model
expressing the probabilities
18. The real world has structures and processes which have (or
produce) observable outputs:
o Usually sequential (the process unfolds over time)
o The event producing the output cannot be seen
Example: speech signals
19. HMM Overview
• Machine learning method
• Makes use of state machines
• Based on probabilistic model
• Can only observe output from states,
not the states themselves
– Example: speech recognition
• Observe: acoustic signals
• Hidden States: phonemes
(distinctive sounds of a language)
20. HMM Components
• A set of states (x’s)
• A set of possible output symbols
(y’s)
• A state transition matrix (a’s):
probability of making transition from
one state to the next
• Output emission matrix (b’s):
probability of emitting/observing a
symbol in a particular state
• Initial probability vector:
o probability of starting at a
particular state
o Not shown, sometimes assumed
to be 1
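The components above can be put together in a toy example. The sketch below builds a two-state HMM and uses the Viterbi algorithm to find the most likely hidden state sequence for a series of observations. The state names, the output symbols, and every probability are invented for illustration; the slides do not give concrete values.

```python
STATES = ["s", "iy"]            # hidden states (x's), here two toy phonemes
SYMBOLS = ["low", "high"]       # output symbols (y's), a toy acoustic feature

INITIAL = {"s": 0.8, "iy": 0.2}                  # initial probability vector
TRANSITION = {                                   # state transition matrix (a's)
    "s": {"s": 0.6, "iy": 0.4},
    "iy": {"s": 0.1, "iy": 0.9},
}
EMISSION = {                                     # output emission matrix (b's)
    "s": {"low": 0.2, "high": 0.8},
    "iy": {"low": 0.7, "high": 0.3},
}

def viterbi(observations):
    """Return (most likely hidden state path, its probability)."""
    prob = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in STATES}
    path = {s: [s] for s in STATES}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s.
            prev = max(STATES, key=lambda p: prob[p] * TRANSITION[p][s])
            new_prob[s] = prob[prev] * TRANSITION[prev][s] * EMISSION[s][obs]
            new_path[s] = path[prev] + [s]
        prob, path = new_prob, new_path
    best = max(STATES, key=lambda s: prob[s])
    return path[best], prob[best]

best_path, best_prob = viterbi(["high", "high", "low", "low"])
```

Only the observations are given to `viterbi`; the state sequence it returns is inferred, which is the sense in which the states are "hidden".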
22. HMM Advantages
• Advantages:
o Effective
o Can handle variations in record structure:
– Optional fields
– Varying field ordering
23. Digitization
• Converting analogue signal into digital representation.
Signal processing
• Separating speech from background noise.
Phonetics
• Variability in human speech.
Phonology
• Recognizing individual sound distinctions (similar phonemes).
Lexicology and syntax
• Disambiguating homophones.
• Features of continuous speech.
Syntax and pragmatics
• Interpreting features.
• Filtering of performance errors (disfluencies).
24. Speech recognition is still a very difficult problem.
The main problems are:
Speaker Variability
Two speakers, or even the same speaker, will
pronounce the same word differently
Channel Variability
The quality and position of the microphone and
the background environment will affect the output
25. Speech recognition applications include
Voice dialling (e.g., "Call home"),
Call routing (e.g., "I would like to make a collect call"),
Simple data entry (e.g., entering a credit card number),
Preparation of structured documents (e.g., a radiology
report),
Speech-to-text processing (e.g., word processors or emails),
and
In aircraft cockpits (usually termed Direct Voice Input).
26. Medical Transcription
Military
Telephony and other domains
Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics
27. Faster than handwriting.
Allows for better spelling in text and
documents.
Helpful for people with a mental or
physical disability.
Hands-free capability.
28. No program is 100% accurate
Factors that affect the accuracy of speech
recognition include slang, homonyms,
signal-to-noise ratio, and overlapping speech
Can be expensive, depending on the
program
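Of the factors above, signal-to-noise ratio is the one with a standard formula: SNR in decibels is 10 × log10(signal power / noise power). A minimal sketch follows, assuming the speech and the noise are available as separate sample lists; the sample values and the function names are made up.

```python
import math

def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

speech = [0.5, -0.5, 0.5, -0.5]        # toy speech samples
noise = [0.05, -0.05, 0.05, -0.05]     # toy background noise samples
value = snr_db(speech, noise)
```

With these toy values the speech has 100 times the power of the noise, which corresponds to about 20 dB.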