Speech recognition final presentation

• What is speech recognition?

• Speech recognition technology has recently reached a level of performance and robustness that allows users to communicate by talking.
• Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words.
• The whole utterance is then recognized word by word.

 : speaker independent and speaker dependent.
 Speaker independent models recognize the speech patterns of a
large group of people.
 Speaker dependent models recognize speech patterns from only
one person. Both models use mathematical and statistical
formulas to yield the best work match for speech. A third
variation of speaker models is now emerging, called speaker
adaptive.
 Speaker adaptive systems usually begin with a speaker
independent model and adjust these models more closely to
each individual during a brief training period.

• Most Natural Form Of Communication
• Differently abled people
• Illiterate
• Helplines
• Cars

[Diagram components: Voice Input, Analog to Digital, Acoustic Model, Language Model, Speech Engine, Display, Feedback]

• Step 1: User Input
The system captures the user’s voice as an analog acoustic signal.
• Step 2: Digitization
The analog acoustic signal is digitized.
• Step 3: Phonetic Breakdown
The signal is broken into phonemes.
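
Step 2 can be sketched as follows. This is only a minimal illustration, assuming a 16 kHz sample rate and 16-bit samples (the slides state neither); the names `digitize` and `tone` are hypothetical, and a plain function of time stands in for the analog microphone signal.

```python
import math


def digitize(analog, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1            # 32767 for 16-bit samples
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th sample, in seconds
        value = max(-1.0, min(1.0, analog(t)))  # clip to the range [-1, 1]
        samples.append(round(value * max_level))
    return samples


def tone(t):
    """Stand-in "analog" signal (assumption): a 440 Hz tone with amplitude 0.5."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)          # 10 ms of audio
```

Here `pcm` is a list of integers, one per sampling instant. A real system would read such samples from an audio device rather than from a function.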

• Step 4: Statistical Modeling
Phonemes are mapped to their phonetic representation using a statistical model.
• Step 5: Matching
Based on the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., words, each with a confidence score).
o Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
o Dictionary: the mapping table between phonetic representations and words (e.g., thu, thee → the).
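
The matching step might look roughly like the sketch below. All data here is made up for illustration: the dictionary entries (beyond the slide's thu/thee example), the grammar, the scores, and the function name `n_best` are assumptions, not part of any real speech engine.

```python
# Hypothetical dictionary: phonetic representation -> word.
DICTIONARY = {"thu": "the", "thee": "the", "kawl": "call", "hohm": "home"}

# Hypothetical grammar: the only words this voice application accepts.
GRAMMAR = {"call", "home", "the"}


def n_best(scored_phonetics, n=3):
    """Turn (phonetic representation, score) pairs into an n-best word list."""
    best = {}
    for phonetic, score in scored_phonetics:
        word = DICTIONARY.get(phonetic)
        if word is None or word not in GRAMMAR:
            continue                           # unknown, or ruled out by the grammar
        # Keep the highest score seen for each word.
        best[word] = max(score, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


hypotheses = [("thu", 0.62), ("thee", 0.30), ("hohm", 0.05), ("zzz", 0.03)]
result = n_best(hypotheses)
```

Both "thu" and "thee" map to the same word, so the list contains "the" once, with its best score, followed by "home"; "zzz" is dropped because the dictionary does not contain it.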

Approaches to ASR
• Template based
• Statistics based

• Store examples of units (words, phonemes), then find the example that most closely fits the input
• Extract features from the speech signal; then it’s “just” a complex similarity matching problem, using solutions developed for all sorts of applications
• OK for discrete utterances and a single user
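
One way to picture template matching is the sketch below. It uses dynamic time warping (DTW) as the similarity measure, which is a common choice for this kind of matching but is not named in the slides. The templates are invented one-dimensional feature sequences, and `dtw_distance` and `recognize` are hypothetical names.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]


# Stored examples (assumption): one made-up feature sequence per word.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 2.0],
}


def recognize(features):
    """Return the word whose template most closely fits the input."""
    return min(TEMPLATES, key=lambda word: dtw_distance(features, TEMPLATES[word]))


spoken = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]   # "yes", spoken more slowly
word = recognize(spoken)
```

Because DTW lets one sequence stretch against the other, the slower utterance still aligns with the "yes" template at zero cost.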

• Hard to distinguish very similar templates
• Quickly degrades when input differs from templates
• Therefore needs techniques to mitigate this degradation:
o More subtle matching techniques
o Multiple templates which are aggregated
• Taken together, these suggested …

• Collect a large corpus of transcribed speech recordings
• Train the computer to learn the correspondences (“machine learning”)
• At run time, apply statistical processes to search through the space of all possible solutions, and pick the statistically most likely one
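
The run-time choice of the "statistically most likely" solution can be illustrated with a toy example. The sketch assumes the usual split into an acoustic score P(audio | words) and a language-model score P(words); the candidate phrases, all probabilities, and the name `most_likely` are invented for illustration, and a real system searches a far larger space than two candidates.

```python
import math

# Hypothetical scores for one utterance (assumptions, not measured values).
ACOUSTIC = {                      # P(audio | words): how well the sound fits
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}
LANGUAGE = {                      # P(words): how plausible the word sequence is
    "recognize speech": 0.010,
    "wreck a nice beach": 0.001,
}


def most_likely(acoustic, language):
    """Pick the candidate maximizing P(audio | words) * P(words)."""
    def log_score(words):
        # Sum of logs instead of a product, to avoid underflow on long inputs.
        return math.log(acoustic[words]) + math.log(language[words])
    return max(acoustic, key=log_score)


best = most_likely(ACOUSTIC, LANGUAGE)
```

The second candidate fits the audio slightly better, but the language-model score outweighs that difference, so the first candidate is chosen.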

Acoustic and Lexical Models
• Analyse training data in terms of relevant features
• Learn the different possibilities from a large amount of data:
o different phone sequences for a given word
o different combinations of elements of the speech signal for a given phone/phoneme
• Combine these into a Hidden Markov Model expressing the probabilities

• The real world has structures and processes that have (or produce) observable outputs:
o Usually sequential (the process unfolds over time)
o The event producing the output cannot be seen directly
Example: speech signals

HMM Overview
• Machine learning method
• Makes use of state machines
• Based on a probabilistic model
• Can only observe the output from states, not the states themselves
– Example: speech recognition
• Observe: acoustic signals
• Hidden states: phonemes (distinctive sounds of a language)

HMM Components
• A set of states (x’s)
• A set of possible output symbols (y’s)
• A state transition matrix (a’s): probability of making a transition from one state to the next
• An output emission matrix (b’s): probability of emitting/observing a symbol at a particular state
• An initial probability vector:
o probability of starting at a particular state
o Not shown, sometimes assumed to be 1
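
The components above can be written down as a small sketch. The two states, two output symbols, and every probability below are made-up values, and the Viterbi algorithm used to recover the most likely hidden state sequence is a standard HMM decoding technique that the slides do not name; `viterbi` is a hypothetical function name.

```python
STATES = ["x1", "x2"]                               # hidden states (x's)
INITIAL = {"x1": 0.6, "x2": 0.4}                    # initial probability vector
TRANSITION = {                                      # state transition matrix (a's)
    "x1": {"x1": 0.7, "x2": 0.3},
    "x2": {"x1": 0.4, "x2": 0.6},
}
EMISSION = {                                        # output emission matrix (b's)
    "x1": {"y1": 0.9, "y2": 0.1},
    "x2": {"y1": 0.2, "y2": 0.8},
}


def viterbi(observations):
    """Return the most likely hidden state path and its probability."""
    # prob[s]: probability of the best path ending in state s; path[s]: that path.
    prob = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in STATES}
    path = {s: [s] for s in STATES}
    for symbol in observations[1:]:
        new_prob, new_path = {}, {}
        for s in STATES:
            # Best predecessor state for reaching s at this time step.
            prev = max(STATES, key=lambda p: prob[p] * TRANSITION[p][s])
            new_prob[s] = prob[prev] * TRANSITION[prev][s] * EMISSION[s][symbol]
            new_path[s] = path[prev] + [s]
        prob, path = new_prob, new_path
    last = max(STATES, key=lambda s: prob[s])
    return path[last], prob[last]


states, p = viterbi(["y1", "y1", "y2"])
```

In speech recognition the observed symbols would be acoustic features and the hidden states phonemes; here they are abstract labels so the arithmetic stays easy to follow.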

HMM Advantages
• Effective
• Can handle variations in record structure
o Optional fields
o Varying field ordering

• Digitization
o Converting analogue signal into digital representation.
• Signal processing
o Separating speech from background noise.
• Phonetics
o Variability in human speech.
• Phonology
o Recognizing individual sound distinctions (similar phonemes).
• Lexicology and syntax
o Disambiguating homophones.
o Features of continuous speech.
• Syntax and pragmatics
o Interpreting features.
o Filtering of performance errors (disfluencies).

Speech recognition is still a very difficult problem. The main problems are:
• Speaker variability
Two speakers, or even the same speaker, will pronounce the same word differently.
• Channel variability
The quality and position of the microphone and the background environment affect the output.

Speech recognition applications include:
• Voice dialling (e.g., "Call home")
• Call routing (e.g., "I would like to make a collect call")
• Simple data entry (e.g., entering a credit card number)
• Preparation of structured documents (e.g., a radiology report)
• Speech-to-text processing (e.g., word processors or emails)
• Aircraft cockpits (usually termed Direct Voice Input)

• Medical Transcription
• Military
• Telephony and other domains
• Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics

• Faster than handwriting.
• Allows for better spelling, whether in text or documents.
• Helpful for people with a mental or physical disability.
• Hands-free capability.

• No program is 100% accurate.
• Factors that affect the accuracy of speech recognition include slang, homonyms, signal-to-noise ratio, and overlapping speech.
• Can be expensive, depending on the program.

