Conversational Speech Translation
Challenges and Techniques
Chris.Wendt@Microsoft.com
@Tian500
with Will Lewis
TAUS Forum Tokyo – April 26, 2016
V160418
Why now?
Confluence of factors:
Steady progress in MT quality over the last few years
• Using huge amounts of data
Technological Leap in ASR
• Deep Learning (DNNs) – 33+% WER reduction over GMMs (Seide et al., 2011)
→ From an average of 30% down to 20%, in English
→ Now above 42%
• More robust to noise, speaker variation, accents
Skype
• A global platform to put speech translation in the hands of hundreds of millions of users
Skype Translator: Goal
•To support open-domain conversations between Skype users in
different parts of the world, speaking different languages
[Diagram: components of Skype Translator: Automatic Speech Recognition (ASR), Microsoft Translator, Skype infrastructure, new Skype Translator client app]
Skype Translator: What is it?
•Current state of the art in Speech Recognition and Machine Translation
embedded in a VoIP client: Skype
Skype Translator = Universal Translator?
Microsoft Speech to Speech (S2S): What is it?
1. High quality speech recognition
The Challenges
• The gulf between speech and text
• It’s not enough to just chain a really good ASR system with a really good MT system
• How people talk to each other is not how they write
• Building really good conversational ASR and MT systems
• Significant changes in the data we use to train the ASR and MT systems.
• The gap between technology demo and consumer product
• Producing models with shippable latency
• Interesting problems one encounters with real consumers
How people really speak
What person thought they said:
Yeah. I guess it was worth it.
→ Ja. Ich denke, es hat sich gelohnt.
→ はい。私はそれの価値があったと思います。
What they actually said:
Yeah, but um, but it was you know, it was, I guess, it was worth it.
→ Ja, aber ähm, aber es war, weißt du, es war, ich denke, es hat sich gelohnt.
→ はい、ええと、あなたが知っている、だったが、推測すると、それはそれの価値があったけど。
Disfluency removal
More than just removing “um” and “ah”
Disfluencies in Conversational Speech
um no i mean yes but you know i am i've
never done it myself have you done that uh yes
Disfluency types:
• Pause Fillers
• Discourse Markers
• Repetition
• Corrections (“speech repairs”)
um no i mean yes but you know i am i've
never done it myself have you done that uh yes
Yes.
But, I’ve never done it myself.
Have you done that?
Yes?
Need to:
1. Segment
2. Remove disfluencies
3. Punctuate
4. Add case
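The simpler part of the disfluency-removal step above can be illustrated with a toy rule-based pass. This is a minimal sketch, not the TrueText implementation: the word lists `FILLERS` and `DISCOURSE_MARKERS` and the function name are assumptions made for illustration. It drops pause fillers, two-word discourse markers, and immediate word repetitions. Corrections such as "i am i've" are left in place, since they need a trained model rather than a word list.

```python
# Hypothetical word lists, chosen only to cover the example utterance.
FILLERS = {"um", "uh", "ah", "er"}
DISCOURSE_MARKERS = {("you", "know"), ("i", "mean")}


def remove_simple_disfluencies(text: str) -> list[str]:
    """Drop fillers, two-word discourse markers, and immediate repetitions."""
    tokens = text.lower().split()
    kept: list[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in FILLERS:
            i += 1          # pause filler: skip one token
        elif tuple(tokens[i:i + 2]) in DISCOURSE_MARKERS:
            i += 2          # discourse marker: skip both tokens
        elif kept and kept[-1] == tokens[i]:
            i += 1          # repetition of the previous kept word
        else:
            kept.append(tokens[i])
            i += 1
    return kept


utterance = ("um no i mean yes but you know i am i've "
             "never done it myself have you done that uh yes")
print(" ".join(remove_simple_disfluencies(utterance)))
```

The speech repair ("no ... yes", "i am i've") survives this pass, which is why segmentation, punctuation and casing cannot be handled by the same kind of rule.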
Without TrueText
• Input: um no i mean yes but you know i am i've never done it myself have you done that uh yes
• Translated: um i いという意味ないが、私は知っている私はそれをやったことがない自分をした、ええとはい
With TrueText
• Input: Yes. But, I’ve never done it myself. Have you done that? Yes?
• Translated: はい。 しかし、私はそれを自分自身を行ってきたこと。 行っているか。 はい?
Missing punctuation → Catastrophic effects
Questions
¿vas ahora? → are you going now?
vas ahora → go now
Negation
no es mi segundo → it is not my second
no. es mi segundo → no. it’s my second
Seriously embarrassing
tienes una hija ¿no? es muy preciosa → you have a daughter right? is very beautiful
tienes una hija no es muy preciosa → you have a daughter is not very beautiful
Accents/Wrong characters → Changes in meaning
Accented words (sound-alikes)
• Written with different forms → different meanings
• But pronounced the same
Si los vinos mendocinos son muy famosos
If the wines from Mendoza are very famous
Sí los vinos mendocinos son muy famosos
Yes the wines from Mendoza are very famous
Misrecognized words/characters (sound-alikes)
你经常在没有听完的时候就睡着了吗 → Do you often fall asleep without listening to it?
你经常在没有听完的时候就睡着了嘛 → You often fall asleep without listening to it.
How people say things
Here’s what we need to recognize and translate
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
• Boy, that story gets better every time you hear it.
• I swear to God I am done with guys like that.
Unfortunately a lot of our MT training data looks like this
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for
regional policy, I want to stress that some very important points are made in this resolution.
• I am therefore calling for integrated policies, all-encompassing policies that we can adapt to society,
which must listen to our recommendations and comply with them.
Data mismatch & scarcity
Training data mismatch
• MT training is clearly mismatched
• ASR training data is a mixed bag
Data scarcity
• Traditional data sources (govt, news, web) not well matched
• Not a lot of parallel conversational data (for MT)
• Not a lot of transcribed conversational data (for ASR)
ASR: word errors, missing vocab
ASR vocab issues – e.g. names
Hi Arul → Hi Aaron
I went skiing at Snoqualmie pass → I went skiing at snow call me pass
ASR errors
How do we minimize the impact of misrecognized words?
The Solutions
Speech Recognition
The Challenges
• Conversational speaking style
• Open domain
Key enabler: dramatic ASR improvements from using Deep Neural
Networks
Where to get training data?
• US English: DARPA Switchboard (2000h) is a great start, but there is no comparable corpus for other languages
• Use “found” captioned speech.
→ Many thousands of hours of speech used for the English system
Training Data: Audio w/ Fluent Transcripts
• Disfluent (what we want): Well I uh started this this project while
I was a student uh grad student at uh Stan- Stanford
• Fluent (what we get): I started this project while I was a grad
student at Stanford
Recreate disfluent training material
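The slides do not say how the disfluent material is recreated. One conceivable approach is to inject noise into the fluent transcript, sketched below purely as an illustration: the filler inventory, the `rate` parameter, and the function name are all assumptions, and only fillers and word repetitions are simulated (no false starts such as "Stan- Stanford").

```python
import random

FILLERS = ["uh", "um"]  # hypothetical filler inventory


def add_disfluencies(fluent: str, rate: float = 0.2, seed: int = 0) -> str:
    """Insert fillers or repeat words at random positions in a fluent transcript."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    for word in fluent.split():
        r = rng.random()
        if r < rate / 2:
            out.append(rng.choice(FILLERS))  # filler before the word
        elif r < rate:
            out.append(word)                 # repeat the word once
        out.append(word)
    return " ".join(out)


fluent = "I started this project while I was a grad student at Stanford"
print(add_disfluencies(fluent, rate=0.5, seed=1))
```

Removing the fillers and collapsing the repeated words recovers the original transcript, so every synthetic pair stays aligned with its fluent source.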
The Solutions
Machine Translation > ASR output
Adapting MT to Conversational Domain
ASR/MT Mismatch
Significant data mismatch between ASR output (even
when cleaned) and MT:
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
vs.
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for
regional policy, I want to stress that some very important points are made in this resolution.
But where do we get parallel conversational data?
Example: Movie subtitles
Data Selection
• Sample “in-domain” (“in-register”?) data from our en-fr parallel data store
• Leverage the fact that the data pool does not match the target domain
• Use monolingual conversational data as seed (“in-domain”): CallHome, SWBD
• Use the cross-entropy difference method (Moore and Lewis, 2010) against a very large parallel corpus (for ENU-FRA, many hundreds of millions of sentences)
• Train on a combination of subtitle and DA data
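The cross-entropy difference criterion scores each sentence in the general pool by H_in(s) − H_gen(s) and keeps the lowest-scoring sentences, i.e. those that look more like the in-domain seed than like the pool. The sketch below is a toy version under stated assumptions: add-one-smoothed unigram models stand in for the real language models, the general model is trained on the whole pool rather than a sample of it, and all function names and sentences are illustrative.

```python
import math
from collections import Counter


def unigram_lm(sentences):
    """Return a log-probability function for an add-one-smoothed unigram model."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-word cross-entropy of a sentence under the given model."""
    words = sentence.lower().split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)


def moore_lewis_select(in_domain, general_pool, keep):
    """Keep the `keep` pool sentences with the lowest H_in(s) - H_gen(s)."""
    lm_in = unigram_lm(in_domain)
    lm_gen = unigram_lm(general_pool)
    ranked = sorted(
        general_pool,
        key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s),
    )
    return ranked[:keep]


seed = ["yeah i guess it was worth it",
        "we hated the last guy",
        "i am done with guys like that"]
pool = ["mr president i want to stress the resolution",
        "i guess we hated it",
        "the commission must comply with the resolution"]
print(moore_lewis_select(seed, pool, keep=1))
```

In the parallel setting, the score is computed on one side of each sentence pair and the whole pair is kept or dropped.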
The solution
TrueText (ASR > MT)
Bridging the gap between ASR and MT
Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech
Speech correction stages, with a running example:
1. Raw ASR Output: um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me
2. Customization and Personalization: um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me
3. Lattice Rescoring: um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me
4. Disfluency Removal: no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me
5. Segmentation, Punctuation and True Casing: Yes. But I’ve never done it myself. Did you use yours before? I will ask Gurdeep to help me.
Translation of the raw ASR output:
• French: Euh non je veux dire oui mais je suis je l'ai jamais fait moi-même fait utilisateurs avant euh je vais demander à aller profond pour m'aider
• Japanese: um 意味ないはい、私です私はそれをやったことがない自分はユーザー、ええと私は手伝って深く要求されます前に
Translation after speech correction:
• French: Oui. Mais je ne l'ai jamais fait moi-même. Avez-vous utilisé le vôtre avant ? Gurdeep va demander de l'aide.
Personalization and Customization
[Architecture diagram: Client, Skype Translator Service, User Profiles, Object Stores, Speech Recognition, Customized Language Models, Cloud Storage, Customized Models, Machine Translation, CLM]
Personalized Names Handling
• Name recognition is a well-known problem in large-vocabulary ASR
• Supporting high-recall name recognition usually compromises WER.
• We deploy a high-precision approach to support contact name recognition using personalized name lists
• Personalized names can be recognized in any context
• Examples:
• Hello Ignacio, how are you doing today?
• I will meet Arul Menezes for lunch tomorrow.
[Diagram: Client, Speech Recognition with Generic LM, Customized LM, Contact names]
Some early experiments: The error cascade
[Diagram: Speech Recognition → 1-best ASR → Translation Engine]
Proposed solutions
• Feed n-best list of ASR output to MT
• Use speech lattice directly as input to MT (e.g. Matusov et al., 2005; Lavie et al., 2004; Dyer et al., 2008)
• Confusion network decoding (e.g. Bertoldi et al., 2007; Bertoldi and Federico, 2005)
Lattice rescoring
Rescoring ASR lattice with
• Much bigger LM (100x larger than first pass)
• MT-specific features
• Tuning weights
• WER reduction 1-2% absolute
• BLEU improvement 1-2% absolute
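The rescoring idea can be shown on a small scale. The sketch below is a simplification under stated assumptions: it rescores an n-best list rather than a lattice, an add-one-smoothed bigram model stands in for the much bigger second-pass LM, MT-specific features are left out, and the first-pass scores and training sentences are made up.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Return a sentence log-probability function (add-one-smoothed bigrams)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    vocab = len(unigrams)

    def logprob(sentence):
        words = ["<s>"] + sentence.split()
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
            for a, b in zip(words, words[1:])
        )

    return logprob


def rescore(nbest, lm_logprob, lm_weight=1.0):
    """Pick the hypothesis maximizing first-pass score + lm_weight * LM score."""
    best_text, _ = max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
    return best_text


lm = train_bigram_lm(["but we're in a subdivision",
                      "we're in a hurry",
                      "where is it"])
# (hypothesis, first-pass log score); the scores are invented for the example
nbest = [("but where in a subdivision", -1.0),
         ("but we're in a subdivision", -1.2)]
print(rescore(nbest, lm))
```

With `lm_weight=0.0` the first-pass winner is returned unchanged, which is the role the tuned weights play: they decide how much the second-pass model may overrule the recognizer.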
Cherry picked examples
Ref: what do you use yours for mostly
ASR: do users for mostly
Rescored: do you use yours for mostly
Ref: but we're in a subdivision
ASR: but where in a subdivision
Rescored: but we're in a subdivision
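To make the effect of rescoring on these examples concrete, word error rate (WER) can be computed as the word-level edit distance divided by the reference length. The function below is a standard dynamic-programming sketch; the function name is an assumption, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the ref prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


ref = "what do you use yours for mostly"
print(word_error_rate(ref, "do users for mostly"))          # ASR: 4 errors / 7 words
print(word_error_rate(ref, "do you use yours for mostly"))  # rescored: 1 error / 7 words
```

On the first example the rescored hypothesis goes from four word errors to one; on the second it goes from one error to none.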
Translation into Japanese after speech correction:
はい。
しかし、私はそれを自分自身を行ってきたこと。
あなたは前にあなたを使用しましたか。
私は私を助けるための Gurdeep が要求されます。
Disfluencies in Conversational Speech
um no i mean yes but i am i've never done it myself have you done that uh yes
um no
i mean yes
but i am
i've never done it myself
have you done that
uh yes
no ,, yes , but , i am , i've never done it myself have you done that yes
No.
I mean yes.
I am.
i've never done it myself.
have you done that?
Yes?
No,,yes
but , i am , i've never done it myself
have you done that
yes
Yes.
I've never done it myself .
Have you done that?
Yes?
Segmentation and Disfluency removal interact with each other
• Segmentation → Disfluency removal
• Simple disfluency removal → Segmentation → Complex disfluency removal
CRF-based Classifiers for annotation
$$P(y \mid F) = \frac{1}{Z(W, F)} \exp\left( \sum_k \lambda_k G_k(y, F) \right)$$
Simple Disfluency → Segmentation and Punctuation → Complex Disfluency
Segmentation and Disfluency Removal for Conversational Speech Translation.
Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur, and Gokhan Tur. INTERSPEECH 2014.
Sentence Unit Boundary Detection
• CRF Classifier: L2 Regularization, Features Cut-off=2
• Lexical Features
• Brown Clusters → group semantically related words based on context
• POS tags trained on conversational data (another CRF classifier)
• Speech Pause-based duration
• Phrase-translation table n-gram
• Features on a window of two words on each side
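The windowed features listed above can be pictured as a per-position feature dictionary, a shape that many CRF toolkits accept as input. The sketch below is an assumption-laden illustration, not the authors' feature extractor: it covers only lexical features and pause duration (Brown clusters, POS tags and phrase-table n-grams are omitted), and the function name, feature names and pause values are hypothetical.

```python
def boundary_features(words, pauses, i, window=2):
    """Features for deciding whether a sentence boundary follows words[i].

    pauses[i] is the silence (in seconds) after words[i].
    """
    feats = {"bias": 1.0, "pause_after": pauses[i]}
    for offset in range(-window, window + 1):
        j = i + offset
        # pad positions that fall outside the utterance
        token = words[j].lower() if 0 <= j < len(words) else "<pad>"
        feats[f"word[{offset:+d}]={token}"] = 1.0
    return feats


words = "i've never done it myself have you done that".split()
pauses = [0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6]  # made-up durations
print(boundary_features(words, pauses, 4))
```

For position 4 ("myself") the dictionary contains the two words on each side plus the following pause, which is the evidence a classifier would use to place a boundary before "have you done that".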
Disfluency Removal, Punctuation Insertion and TrueCaser
• CRF Classifiers: L2 Regularization, Features Cut-off=2
• Lexical Features
• Brown Clusters
• POS tags trained on conversational data (another CRF classifier)
• Features on a window of two words on each side
Example of Complex Disfluency Removal
but , i’m , I’ve never done that before.
→ But I’ve never done that before.
Segmentation and Disfluency Removal Effect (English → Spanish)
Segmentation | Disfluency Handling | BLEU on Transcripts | BLEU on ASR
No segmentation (full utterance) | None | 22.13 | 19.13
No segmentation (full utterance) | 1-stage | 23.46 | 20.49
Segment on pauses | None | 20.32 | 18.78
Segment on pauses | 1-stage | 22.53 | 19.32
CRF Segmenter | 1-stage after segmenter | 25.11 | 21.24
CRF Segmenter | 1-stage before segmenter | 24.79 | 20.95
CRF Segmenter | 2-stages (before & after) | 25.65 (+16%) | 21.76 (+13.7%)
Disfluency handling:
None: no disfluency handling applied
After: applied after segmentation
Split: simple disfluency removal applied before segmentation, complex disfluency removal applied after segmentation
1 point of BLEU improvement is roughly equivalent to 1% absolute improvement in accuracy
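The percentages quoted for the best configuration are relative gains over the first row (no segmentation, no disfluency handling). A quick check of that arithmetic, using only the BLEU scores from the table; the helper name is an assumption:

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


transcript_gain = relative_gain(22.13, 25.65)  # BLEU on transcripts
asr_gain = relative_gain(19.13, 21.76)         # BLEU on ASR output
print(f"{transcript_gain:.1f}% on transcripts, {asr_gain:.1f}% on ASR")
```

This gives about 15.9% and 13.7%, matching the rounded +16% and +13.7% in the table. In absolute terms the gains are 3.52 and 2.63 BLEU points.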
The Speech Translation API
Public API for speech translation
www.microsoft.com/translator
API Documentation
• http://docs.microsofttranslator.com/
Sample Code using API
• https://github.com/MicrosoftTranslator
Command-line parameters (example of API usage)
Usage:
CmdLineSpeechTranslate.exe ClientId ClientSecret FilePath SrcLanguage TargetLanguage
Example:
CmdLineSpeechTranslate.exe ClientId ClientSecret helloworld.wav en-us es-es
Source: 1 of 8 spoken languages
Target: 1 of 50+ spoken languages
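For scripted use, the sample tool's argument order shown above can be wrapped in a small helper. This is a sketch under stated assumptions: it only assembles and launches the command line from the usage slide, it assumes `CmdLineSpeechTranslate.exe` has been built from the sample code and is on the path, and the helper names and default language codes are illustrative.

```python
import subprocess


def build_command(client_id, client_secret, wav_path,
                  src="en-us", tgt="es-es",
                  exe="CmdLineSpeechTranslate.exe"):
    """Assemble argv in the order given by the tool's usage line."""
    return [exe, client_id, client_secret, wav_path, src, tgt]


def translate_file(client_id, client_secret, wav_path, src="en-us", tgt="es-es"):
    """Run the sample tool; raises CalledProcessError on a non-zero exit code."""
    cmd = build_command(client_id, client_secret, wav_path, src, tgt)
    return subprocess.run(cmd, check=True)


print(build_command("ClientId", "ClientSecret", "helloworld.wav"))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.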
S2S in the Schools
Bilingual Mystery Skype
Deaf/Hard of Hearing Students
Seattle and Beijing, China
Deaf and Hard of Hearing Students
• In Seattle Public Schools, an idea from Jean Rogers (Chief Audiologist) and Liz Hayden (Teacher of the Deaf):
• Use Skype Translator with the “mainstreamed” deaf and hard of hearing kids
S2S in the Classroom
• https://www.microsoft.com/en-us/design/inclusive#inclusive-skype_video
www.microsoft.com/translator
Translator@Microsoft.com
@mstranslator
Chris.Wendt@Microsoft.com
@Tian500

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Conversational Speech Translation - Challenges and Techniques, by Chris Wendt, Microsoft

  • 1. Conversational Speech Translation: Challenges and Techniques. Chris.Wendt@Microsoft.com, @Tian500, with Will Lewis. TAUS Forum Tokyo – April 26, 2016. V160418
  • 2. Why now? Confluence of factors: Steady progress in MT quality over the last few years • Using huge amounts of data Technological Leap in ASR • Deep Learning (DNNs) – 33+% WER reduction over GMMs (Seide et al 2011) → From average of 30% down to 20%, in English → Now above 42% • More robust to noise, speaker variation, accents Skype • A global platform to put speech translation in the hands of hundreds of millions of users
  • 3. Skype Translator: Goal • To support open-domain conversations between Skype users in different parts of the world, speaking different languages
  • 5. Skype Translator: What is it? •Current state of the art in Speech Recognition and Machine Translation embedded in a VoIP client: Skype
  • 6. Skype Translator = Universal Translator?
  • 7. Skype Translator = Universal Translator?
  • 8. Microsoft Speech to Speech (S2S): What is it? • 1. High quality speech recognition [Diagram: Automatic Speech Recognition (ASR), Microsoft Translator, Skype Infrastructure, New Skype Translator client app, Skype Translator]
  • 10. The Challenges • The gulf between speech and text • It’s not enough to just chain a really good ASR system with a really good MT system • How people talk to each other is not how they write • Building really good conversational ASR and MT systems • Significant changes in the data we use to train the ASR and MT systems. • The gap between technology demo and consumer product • Producing models with shippable latency • Interesting problems one encounters with real consumers
  • 11. How people really speak • What the person thought they said: Yeah. I guess it was worth it. → Ja. Ich denke, es hat sich gelohnt. → はい。私はそれの価値があったと思います。 • What they actually said: Yeah, but um, but it was you know, it was, I guess, it was worth it. → Ja, aber ähm, aber es war, weißt du, es war, ich denke, es hat sich gelohnt. → はい、ええと、あなたが知っている、だったが、推測すると、それはそれの価値があったけど。 • Disfluency removal: more than just removing “um” and “ah”
  • 12. Disfluencies in Conversational Speech um no i mean yes but you know i am i've never done it myself have you done that uh yes Disfluency types: • Pause Fillers • Discourse Markers • Repetition • Corrections (“speech repairs”) um no i mean yes but you know i am i've never done it myself have you done that uh yes Yes. But, I’ve never done it myself. Have you done that? Yes?
  • 13. Disfluencies in Conversational Speech um no i mean yes but you know i am i've never done it myself have you done that uh yes um no i mean yes but you know i am i've never done it myself have you done that uh yes Yes. But, I’ve never done it myself. Have you done that? Yes? Need to: 1. Segment 2. Remove disfluencies 3. Punctuate 4. Add case
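As a rough illustration of the "remove disfluencies" step only, the sketch below strips pause fillers, two discourse markers, and immediate word repetitions with plain rules. The word lists and the function name are assumptions made for this example; the talk describes trained classifiers, not rules like these, and corrections ("speech repairs") such as "no i mean yes" are deliberately left untouched here because they need more than a word list.

```python
import re

# Hypothetical word lists for illustration; a production system would learn these.
PAUSE_FILLERS = {"um", "uh", "ah", "er"}
DISCOURSE_MARKERS = ("you know", "i mean")


def remove_simple_disfluencies(utterance: str) -> str:
    """Drop pause fillers, discourse markers, and immediate word repetitions."""
    text = utterance.lower()
    # Discourse markers span several words, so remove them before tokenizing.
    for marker in DISCOURSE_MARKERS:
        text = re.sub(rf"\b{re.escape(marker)}\b", " ", text)
    cleaned = []
    for token in text.split():
        if token in PAUSE_FILLERS:
            continue  # pause filler
        if cleaned and cleaned[-1] == token:
            continue  # immediate repetition, e.g. "this this"
        cleaned.append(token)
    return " ".join(cleaned)


if __name__ == "__main__":
    raw = ("um no i mean yes but you know i am i've never done it myself "
           "have you done that uh yes")
    print(remove_simple_disfluencies(raw))
```

The output still lacks segmentation, punctuation, and casing, and it still contains the self-corrections, which is the sense in which disfluency removal is "more than just removing um and ah".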
  • 14. Without TrueText • Input: um no i mean yes but you know i am i've never done it myself have you done that uh yes • Translate → um i いという意味ないが、私は知っている私はそれをやったことがない自分をした、ええとはい
  • 15. With TrueText • Input: Yes. But, I’ve never done it myself. Have you done that? Yes? • Translate → はい。 しかし、私はそれを自分自身を行ってきたこと。 行っているか。 はい?
  • 16. Missing punctuation → Catastrophic Effects • Questions: “¿vas ahora?” → “are you going now?” vs. “vas ahora” → “go now” • Negation: “no es mi segundo” → “it is not my second” vs. “no. es mi segundo” → “no. it’s my second” • Seriously embarrassing: “tienes una hija ¿no? es muy preciosa” → “you have a daughter right? is very beautiful” vs. “tienes una hija no es muy preciosa” → “you have a daughter is not very beautiful”
  • 17. Accents/Wrong chars → Changes in meaning • Accented words (sound-alikes): written with different forms → different meanings, but pronounced the same • “Si los vinos mendocinos son muy famosos” → “If the wines from Mendoza are very famous” • “Sí los vinos mendocinos son muy famosos” → “Yes the wines from Mendoza are very famous” • Misrecognized words/characters (sound-alikes) • 你经常在没有听完的时候就睡着了吗 → “Do you often fall asleep without listening to it?” • 你经常在没有听完的时候就睡着了嘛 → “You often fall asleep without listening to it.”
  • 18. How people say things Here’s what we need to recognize and translate • He ain't my choice. But, hey, we hated the last guy. • We're going to hit it and quit it. • Boy, that story gets better every time you hear it. • I swear to God I am done with guys like that. Unfortunately a lot of our MT training data looks like this • Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for regional policy, I want to stress that some very important points are made in this resolution. • I am therefore calling for integrated policies, all-encompassing policies that we can adapt to society, which must listen to our recommendations and comply with them.
  • 19. Data mismatch & scarcity Training data mismatch • MT training is clearly mismatched • ASR training data is a mixed bag Data scarcity • Traditional data sources (govt, news, web) not well matched • Not a lot of parallel conversational data (for MT) • Not a lot of transcribed conversational data (for ASR)
  • 20. ASR: word errors, missing vocab • ASR vocab issues, e.g. names: “Hi Arul” → “Hi Aaron”; “I went skiing at Snoqualmie pass” → “I went skiing at snow call me pass” • ASR errors: How do we minimize the impact of misrecognized words?
  • 22. The Challenges • Conversational speaking style • Open domain Key enabler: dramatic ASR improvements from using Deep Neural Networks Where to get training data? • US English: DARPA Switchboard (2000h) is a great start; but no comparable corpus for other languages • Use “found” captioned speech. → Many thousands of hours of speech used for English system
  • 23. Training Data: Audio w/ Fluent Transcripts • Disfluent (what we want): Well I uh started this this project while I was a student uh grad student at uh Stan- Stanford • Fluent (what we get): I started this project while I was a grad student at Stanford Recreate disfluent training material
  • 24. The Solutions Machine Translation > ASR output Adapting MT to Conversational Domain
  • 25. ASR/MT Mismatch Significant data mismatch between ASR output (even when cleaned) and MT: • He ain't my choice. But, hey, we hated the last guy. • We're going to hit it and quit it. vs. • Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for regional policy, I want to stress that some very important points are made in this resolution. But where do we get parallel conversational data? Example: Movie subtitles
  • 26. Data Selection • Sample “in-domain” (“in-register”?) data from our en-fr parallel data store • Leverage the fact that the data pool does not match the target domain • Use monolingual conversational data as seed (“in-domain”): CallHome, SWBD • Use the Cross-Entropy Difference method (Moore and Lewis, 2010) against a very large parallel corpus (for ENU-FRA, many hundreds of millions of sentences) • Train on combination of subtitle and DA data
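The cross-entropy difference idea on this slide can be sketched as follows: score each pool sentence by its per-word cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and keep the lowest-scoring sentences. This is a minimal sketch under loud assumptions: the add-one-smoothed unigram models, the function names, and the tiny corpora are stand-ins for illustration, not the models or data used in the talk.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total tokens) for a whitespace-tokenized corpus."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy (bits) under an add-one-smoothed unigram model."""
    counts, total = model
    tokens = sentence.lower().split()
    if not tokens:
        return float("inf")
    log_prob = sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return -log_prob / len(tokens)


def select_in_domain(pool, in_domain, general, keep):
    """Keep the `keep` pool sentences with the lowest H_in(s) - H_general(s)."""
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    vocab = set(in_model[0]) | set(gen_model[0])
    vocab |= {tok for s in pool for tok in s.lower().split()}
    size = len(vocab)
    ranked = sorted(
        pool,
        key=lambda s: cross_entropy(s, in_model, size) - cross_entropy(s, gen_model, size),
    )
    return ranked[:keep]


if __name__ == "__main__":
    seed = ["yeah i guess it was worth it", "hey we hated the last guy"]
    general = [
        "mr president i want to stress that some very important points are made in this resolution",
        "i am therefore calling for integrated policies",
    ]
    pool = ["the resolution stresses integrated policies", "we hated it i guess"]
    print(select_in_domain(pool, seed, general, keep=1))
```

In the setting described on the slide, the selected sentences are the source side of parallel pairs, so keeping a sentence means keeping its translation as well.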
  • 27. The solution TrueText (ASR > MT) Bridging the gap between ASR and MT
  • 28. Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech • Raw ASR Output: um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me • Customization and Personalization: um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me • Lattice Rescoring: um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Disfluency Removal: no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Segmentation, Punctuation and True Casing: Yes. But I’ve never done it myself. Did you use yours before? I will ask Gurdeep to help me. • Translation of the raw ASR output: Euh non je veux dire oui mais je suis je l'ai jamais fait moi-même fait utilisateurs avant euh je vais demander à aller profond pour m'aider • Translation after correction: Oui. Mais je ne l'ai jamais fait moi-même. Avez-vous utilisé le vôtre avant ? Gurdeep va demander de l'aide.
  • 29. Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech • Raw ASR Output: um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me • Customization and Personalization: um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me • Lattice Rescoring: um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Disfluency Removal: no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Segmentation, Punctuation and True Casing: Yes. But I’ve never done it myself. Did you use yours before? I will ask Gurdeep to help me. • Translation of the raw ASR output: um 意味ないはい、私です 私はそれをやったことがない自分はユーザー、ええと私は手伝って深く要求されます前に • Translation after correction: Oui. Mais je ne l'ai jamais fait moi-même. Avez-vous utilisé le vôtre avant ? Gurdeep va demander de l'aide.
  • 31. Personalization and Customization Client Skype Translator Service User Profiles Object Stores Speech Recognition Customized Language Models Cloud Storage Customized Models Machine Translation CLM
  • 32. Personalized Names Handling • Name recognition is a well-known problem in large-vocabulary ASR • Supporting high-recall name recognition usually compromises WER • We deploy a high-precision approach to support contact name recognition using personalized name lists • Personalized names can be recognized in any context • Examples: • Hello Ignacio, how are you doing today? • I will meet Arul Menezes for lunch tomorrow. [Diagram: Client, Speech Recognition with Generic LM, Customized LM, Contact names]
  • 34. Some early experiments: The error cascade Speech Recognition Translation Engine 1-best ASR Proposed solutions • Feed n-best list of ASR output to MT • Use speech lattice directly as input to MT (e.g. Matusov et al., 2005, Lavie et al. 2004, Dyer et al, 2008) • Confusion network decoding (e.g. Bertoldi et al., 2007; Bertoldi and Federico, 2005)
  • 35. Lattice rescoring Rescoring ASR lattice with • Much bigger LM (100x larger than first pass) • MT-specific features • Tuning weights • WER reduction 1-2% absolute • BLEU improvement 1-2% absolute Cherry picked examples Ref: what do you use yours for mostly ASR: do users for mostly Rescored: do you use yours for mostly Ref: but we're in a subdivision ASR: but where in a subdivision Rescored: but we're in a subdivision
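To make the WER figures concrete, the sketch below computes word error rate as word-level edit distance (substitutions, insertions, deletions) divided by the reference length, and applies it to the two cherry-picked examples on this slide. The function name is an assumption for this example, and this is the textbook definition of WER, not the evaluation tooling used in the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "what do you use yours for mostly"
    print(word_error_rate(ref, "do users for mostly"))          # first-pass ASR
    print(word_error_rate(ref, "do you use yours for mostly"))  # after rescoring
```

On the first example, the first-pass hypothesis needs four edits against a seven-word reference, while the rescored hypothesis needs one; on the second example rescoring removes the single error entirely.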
  • 36. Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech • Raw ASR Output: um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me • Customization and Personalization: um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me • Lattice Rescoring: um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Disfluency Removal: no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Translation of the raw ASR output: um 意味ないはい、私です 私はそれをやったことがない自分はユーザー、ええと私は手伝って深く要求されます前に • Translation after correction: Oui. Mais je ne l'ai jamais fait moi-même. Avez-vous utilisé le vôtre avant ? Gurdeep va demander de l'aide.
  • 37. Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech • Raw ASR Output: um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me • Customization and Personalization: um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me • Lattice Rescoring: um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Disfluency Removal: no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me • Segmentation, Punctuation and True Casing: Yes. But I’ve never done it myself. Did you use yours before? I will ask Gurdeep to help me. • Translation of the raw ASR output: um 意味ないはい、私です 私はそれをやったことがない自分はユーザー、ええと私は手伝って深く要求されます前に • Translation after correction: はい。 しかし、私はそれを自分自身を行ってきたこと。 あなたは前にあなたを使用しましたか。 私は私を助けるための Gurdeep が要求されます。
  • 38. Disfluencies in Conversational Speech um no i mean yes but you know i am i've never done it myself have you done that uh yes Disfluency types: • Filler Pauses • Discourse Markers • Repetition • Corrections (“speech repairs”) um no i mean yes but you know i am i've never done it myself have you done that uh yes
  • 39. um no i mean yes but i am i've never done it myself have you done that uh yes um no i mean yes but i am i've never done it myself have you done that uh yes no ,, yes , but , i am , i've never done it myself have you done that yes No. I mean yes. I am. i've never done it myself. have you done that? Yes? No,,yes but , i am , i've never done it myself have you done that yes Yes. I've never done it myself . Have you done that? Yes? Segmentation and Disfluency removal interact with each other Segmentation Disfluency removal Simple disfluency removal Segmentation Complex disfluency removal
  • 40. CRF-based Classifiers for annotation • $P(y \mid F) = \frac{1}{Z(W,F)} \exp\left(\sum_k \lambda_k G_k(y, F)\right)$ • Stages: Simple Disfluency → Segmentation and Punctuation → Complex Disfluency • Reference: “Segmentation and Disfluency Removal for Conversational Speech Translation”, Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur, and Gokhan Tur, INTERSPEECH 2014
  • 41. Sentence Unit Boundary Detection • CRF Classifier: L2 Regularization, Features Cut-off=2 • Lexical Features • Brown Clusters → Group semantically related words based on context • POS tags trained on conversational data (another CRF classifier) • Speech Pause-based duration • Phrase-translation table n-gram • Features on a window of two words on each side
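The feature list above can be pictured as one feature dictionary per token, covering the token itself, the pause that follows it, and the words in a window of two positions on each side. The sketch below builds only the lexical and pause features; the feature names, the `<pad>` symbol, and the pause values are assumptions for this example, and Brown clusters, POS tags, and phrase-table n-grams are omitted because they need trained resources.

```python
def token_features(tokens, pauses, i, window=2):
    """Lexical and pause features for token i, with a +/- `window` word context."""
    features = {
        "bias": 1.0,
        "word": tokens[i],
        "pause_after": pauses[i],  # seconds of silence after the token (hypothetical input)
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the utterance get a padding symbol.
        features[f"word[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<pad>"
    return features


if __name__ == "__main__":
    tokens = "have you done that uh yes".split()
    pauses = [0.0, 0.0, 0.0, 0.4, 0.1, 0.0]  # made-up durations for illustration
    for index in range(len(tokens)):
        print(token_features(tokens, pauses, index))
```

A sequence of such dictionaries, paired with boundary/no-boundary labels per token, is the kind of input a CRF sequence labeler would be trained on.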
  • 42. Disfluency Removal, Punctuation insertion and TrueCaser •CRF Classifiers: L2 Regularization, Features Cut-off=2 •Lexical Features •Brown Clusters •POS tags trained on conversational data (another CRF classifier) •Features on a window of two words on each side
  • 43. Example of Complex Disfluency Removal but , i’m , I’ve never done that before.
  • 44. Example of Complex Disfluency removal But I’ve never done that before.
  • 45. Segmentation and Disfluency Removal Effect (English-Spanish)
    | Segmentation | Disfluency Handling | BLEU on Transcripts | BLEU on ASR |
    |---|---|---|---|
    | No segmentation (full utterance) | None | 22.13 | 19.13 |
    | No segmentation (full utterance) | 1-stage | 23.46 | 20.49 |
    | Segment on pauses | None | 20.32 | 18.78 |
    | Segment on pauses | 1-stage | 22.53 | 19.32 |
    | CRF Segmenter | 1-stage after segmenter | 25.11 | 21.24 |
    | CRF Segmenter | 1-stage before segmenter | 24.79 | 20.95 |
    | CRF Segmenter | 2-stages (before & after) | 25.65 (+16%) | 21.76 (+13.7%) |
    Disfluency handling: None: no disfluency handling applied. After: applied after segmentation. Split: simple disfluency removal applied before segmentation, complex disfluency removal applied after segmentation.
    1 point of BLEU improvement is roughly equivalent to 1% absolute improvement in accuracy.
  • 46. The Speech Translation API Public API for speech translation www.microsoft.com/translator
  • 48. Sample Code using API • https://github.com/MicrosoftTranslator
  • 49. Cmdline parameters (ex of API usage) Usage: CmdLineSpeechTranslate.exe ClientId ClientSecret FilePath SrcLanguage TargetLanguage Example: CmdLineSpeechTranslate.exe ClientId ClientSecret helloworld.wav en-us es-es Source: 1 of 8 spoken languages Target: 1 of 50+ spoken languages
  • 50. S2S in the Schools Bilingual Mystery Skype Deaf/Hard of Hearing Students
  • 53. Deaf and Hard of Hearing Students • In Seattle Public Schools, Jean Rogers’ (Chief Audiologist) and Liz Hayden’s (Teacher of the Deaf) idea: • Use Skype Translator with the “mainstreamed” deaf and hard of hearing kids
  • 54. Deaf and Hard of Hearing Students
  • 55. Deaf and Hard of Hearing Students
  • 56. Deaf and Hard of Hearing Students
  • 57. S2S in the Classroom • https://www.microsoft.com/en-us/design/inclusive#inclusive-skype_video
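
The CRF formulation on slide 40 can be made concrete with a toy example. The sketch below is only an illustration, not the classifiers described in the talk: the feature names, the weights, the 0.3-second pause threshold, and the input are all invented, and the partition function Z is computed by brute-force enumeration, which is feasible only for very short inputs.

```python
import itertools
import math

# B = sentence boundary after this token, O = no boundary.
LABELS = ("O", "B")

# Hypothetical weights (the lambda_k in the formula), keyed by feature name.
WEIGHTS = {
    "bias&B": -1.0,          # boundaries are rare by default
    "long_pause&B": 2.0,     # a long pause suggests a boundary
    "word=myself&B": 1.5,    # toy lexical feature
    "prev=B&cur=B": -2.0,    # two boundaries in a row are unlikely
}

def active_features(tokens, pauses, prev_label, label, i):
    """Names of the features G_k that fire at position i (all hypothetical)."""
    names = []
    if label == "B":
        names.append("bias&B")
        names.append(f"word={tokens[i]}&B")
        if pauses[i] > 0.3:
            names.append("long_pause&B")
        if prev_label == "B":
            names.append("prev=B&cur=B")
    return names

def score(tokens, pauses, labels):
    """Sum of lambda_k * G_k(y, F) over all positions of one label sequence."""
    total, prev = 0.0, None
    for i, label in enumerate(labels):
        for name in active_features(tokens, pauses, prev, label, i):
            total += WEIGHTS.get(name, 0.0)
        prev = label
    return total

def distribution(tokens, pauses):
    """P(y | F) for every label sequence y, normalised by Z."""
    sequences = list(itertools.product(LABELS, repeat=len(tokens)))
    unnormalised = {y: math.exp(score(tokens, pauses, y)) for y in sequences}
    z = sum(unnormalised.values())
    return {y: value / z for y, value in unnormalised.items()}

tokens = ["done", "it", "myself", "have", "you"]
pauses = [0.0, 0.0, 0.5, 0.0, 0.0]  # seconds of silence after each token
probs = distribution(tokens, pauses)
best = max(probs, key=probs.get)
print(list(zip(tokens, best)), round(probs[best], 3))
```

With these made-up weights the most probable labelling places a single boundary after "myself". A production tagger would replace the enumeration with forward-backward and Viterbi decoding and would learn the weights from annotated conversational data.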

Editor's Notes

  1. As we all know, the idea of being able to speak naturally with someone who doesn’t understand your language has been a long-held dream, whether we’re talking about the biblical story of the Tower of Babel or 20th-century sci-fi such as the Star Trek universal communicator or Douglas Adams’s “Babel fish”.
  2. About this time last year, we set ourselves the goal of turning this age-old dream into reality. We realized that a confluence of factors, taken together, gave us the opportunity to make this happen. We in the MT field have been making steady progress on MT quality, both with better algorithms and by applying ever greater amounts of data, such that our best MT systems today are really quite good. At the same time, the ASR field has seen a technological leap over the last few years, with the use of DNNs leading to dramatically lower error rates. And finally, we at Microsoft Research felt we had a golden opportunity not just to do a great technology demo, but to actually put this in the hands of hundreds of millions of users through Skype. To achieve this we faced a number of challenges, which is what I will be talking about for the next hour.
  3. On one hand: help Skype users. On the other hand: have a generalized speech translation system and service for third parties.
  4. There are three main challenges we need to work through to have a realistic S2S system. First, the gulf between speech and text: it’s not enough to just chain a really good ASR system with a really good MT system, because how people talk to each other is not how they write. Second, building really good conversational ASR and MT systems, which means significant changes in the data we use to train them. Third, the gap between technology demo and consumer product: plugging into Skype, and the interesting problems one encounters with real consumers.
  5. [READ SLIDE] So, if we take the raw ASR output and just throw it at MT, it doesn’t work so well. We need components to process the ASR output, remove disfluencies, and so on, to make it more palatable to MT. Likewise, we need to adapt MT to handle this kind of input.
  6. So let’s take a closer look at the different types of disfluencies – first you have uh, your, um, fillers, then, you know, I mean, your discourse markers and and and repetition, and finally correct--, I mean, speech repairs, where people go back and repeat, I mean, change what they said. In this example, here the speaker changed no to yes, and “I am” to “I have”.
  7. So let’s take a closer look at the different types of disfluencies – first you have uh, your, um, fillers, then, you know, I mean, your discourse markers and and and repetition, and finally correct--, I mean, speech repairs, where people go back and repeat, I mean, change what they said. In this example, here the speaker changed no to yes, and “I am” to “I have”.
  8. Another thing that is missing is punctuation -- If we don’t get punctuation right, we are risking a lot more than just word salad. You may know the example “Let’s eat grandma”, where a missing comma could lead to a tragic outcome. When translating, the problem gets worse.
  9. In the case of Spanish (and possibly other languages), we have an additional problem with accented words [READ SLIDE]
  10. In addition to *how* people speak (genre), there’s also a big difference in *what* they talk about (domain).
  11. Now let’s take a closer look at the Speech Correction component, which helps bridge the gap between spoken and written text.
  12. [READ SLIDE] Adapting MT to ASR starts with building a good baseline conversationally-oriented MT system
  13. Now let’s take a closer look at the Speech Correction component, which helps bridge the gap between spoken and written text.
  14. The Speech Correction component helps bridge the gap between spoken and written text.
  15. Let’s listen to what the user said. Very good, fluent English. But if we read the English transcript, it is hard to understand, not to mention the translation. So, if we take the raw ASR output and just throw it at MT, it doesn’t work so well. We need components to process the ASR output, remove disfluencies, and so on, to make it more palatable to MT. Likewise, we need to adapt MT to handle this kind of input. What is missing: sentence boundaries, punctuation, casing, and disfluency removal.
  16. First, we know who is talking to whom, so we can have user profiles for both parties that enable us to do a better job in both recognition and translation. Personalization and customization play a crucial role in open-domain S2S, since people talk about widely different things. For example, here we can recognize the person name “gurdeep” rather than “go deep”. Good customization and personalization are crucial for an open-domain Skype Translator: people can talk about their planned vacation to Colombia with a Spanish speaker one day, which needs different vocabulary than talking to a Chinese supplier about product plans the next day.
  17. Open-domain conversational S2S can benefit from customizing and personalizing the models according to the users’ profiles. We can use a user’s profile to create customized models that fit their topics and vocabulary. [Describe the diagram above.] Currently we use this infrastructure for contact name recognition.
  18. One of the issues with ASR is that the vocabulary cannot include every possible person name (or place name, etc.). Expanding the vocabulary drastically to include millions of names can compromise WER, because names may be misrecognized in place of regular words. However, in Skype Translator we found that when the system didn’t recognize the caller’s or callee’s name at the start of a call, it often derailed the entire conversation. So we opted for a surgical fix for now while we investigate more broad-based options. What we’ve done is add a very small, restrictive grammar made up of common greetings and the like, but with placeholders for names. At the start of a call we dynamically compile the contact names for the current caller and callee into this grammar, and our ASR engine can use this grammar in parallel with its broad-based regular LM.
  19. Instead of using the one-best output from ASR, we can use the n-best alternatives in lattice format. Lattice rescoring lets us get many possible alternatives from ASR and then use a very large LM to score them.
  20. In the early days we were fixated on WER, its effect on BLEU, and the error cascade: if we pipeline multiple error-prone components together, the errors multiply. This is a well-known problem that researchers have studied for a number of years. One approach is to take the ASR lattice directly as input to the MT decoder. This has been studied by many groups and is conceptually elegant, but the implementation is quite complex. A lattice representation allows an MT system to arbitrate between multiple ambiguous hypotheses from upstream processing so that the best translation can be produced. A simplification is decoding over a confusion network, where the ASR confusables are compactly encoded as a “word sausage”. This is very easy to decode in MT because it affects mostly just the phrase-lookup portion, leaving the rest of the decoding untouched except for some extra features. We found that the MT portion of this worked well. However, collapsing an ASR lattice into a confusion network is an ill-defined operation that can produce some nasty artifacts in the confusion network, such as a proliferation of epsilon arcs. After some experimentation with N-best rescoring and confusion networks, we decided to try a couple of different things. Definitions: a confusion network (CN), also known as a sausage, is a weighted directed graph with the peculiarity that each path from the start node to the end node goes through all the other nodes. A speech lattice maintains a set of candidates as a subset of models.
  21. We decided that before we plunged into full-fledged MT decoding over lattices, we would first try simply rescoring the ASR lattice with a much bigger LM, adding some extra MT-friendly features and tuning model weights. We discovered we could get very good ASR and end-to-end BLEU gains, at which point we decided not to bother with decoding over lattices in the MT decoder itself.
  22. Now we have an “almost” perfect transcription that matches what the user said, improved and customized as well. Are we ready to send this to MT? Not yet. Disfluency handling should be done before we translate. In fact, disfluency removal and punctuation need to be done concurrently.
  23. Finally, we come to segmentation into sentence units, punctuation, and casing, after which the text should be ready for a state-of-the-art MT system to produce a reliable translation.
  24. So let’s take a closer look at the different types of disfluencies – first you have uh, your, um, fillers, then, you know, I mean, your discourse markers and and and repetition, and finally correct--, I mean, speech repairs, where people go back and repeat, I mean, change what they said. In this example, here the speaker changed no to yes, and “I am” to “I have”.
  25. Traditionally this problem has been solved by first doing segmentation and then disfluency removal. But there is an interaction between segmentation and disfluency handling. If segmentation is done first, disfluency removal loses the chance to make a better correction, and we’re left with numerous disfluent fragments. On the other hand, complex disfluency removal (speech repairs) needs sentence boundaries, so you can’t do that first either. What we did is split the difference: we do simple disfluency removal first, then segmentation, then complex disfluency removal.
  26. Conditional random fields (CRFs) are a class of statistical modelling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account; e.g., the linear-chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples. Our pipeline first removes simple disfluencies, then segments, then removes complex disfluencies. The first two stages are CRF sequence taggers. Complex disfluency handling uses metadata annotated by the previous stages, relies on iterative parsing (the NLPwin parser), and needs sentence units.
  27. Brown clustering is a hard hierarchical agglomerative clustering problem based on distributional information. It is typically applied to text, grouping words into clusters that are assumed to be semantically related by virtue of their having been embedded in similar contexts.
  28. In complex disfluency removal we take advantage of the NLPwin parser, which was built for the MS Word grammar checker and so is robust to ungrammatical input. For example, here we have a repeated subtree that is linguistically similar, and so the first subtree is removed. We also look for constituents that appear to be disconnected from other parts of the tree. When we spot a disfluency we remove it and reparse the resulting sentence, because removing the disfluency could change the entire parse. We remove errors one by one and stop when we have no more edits or we hit a limit on the number of parses.
  29. Lots of numbers here. Looking at the last column (BLEU on ASR), which is what we care about, here are the takeaways: sentence breaking based on speaker pauses is a bad idea (it loses about 0.4 BLEU points, 19.13 to 18.78); CRF sentence breaking by itself adds about 0.5 BLEU (versus no breaking at all, i.e. translating the full utterance); disfluency removal by itself adds about 1.3 BLEU; doing both gives you about 2 BLEU points; and doing the split before/after adds another 0.5.
  30. This is Vinny, who participated in the Mystery Skype session we had with the schools in Beijing. Vinny’s deaf, so it was wonderful for him to participate in these calls with his classmates. Even when he was unable to hear the response back from the students, he could read the translation of what they were saying.
  31. Although this example demonstrates the technology with deaf or hard of hearing students, it’s not much of a stretch, since the components already exist, to adapt the technology to hearing students who speak other languages. In fact, it could be used in that manner now. We haven’t tested it in this scenario…yet.
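
The rescoring idea in notes 19 and 21 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system described in the talk: it rescores a flat N-best list rather than a lattice, and the hypotheses, ASR scores, bigram log-probabilities, and interpolation weight are all invented for the example.

```python
# Hypothetical N-best list from the recogniser: (hypothesis, ASR log-score).
NBEST = [
    ("did users before", -4.0),
    ("did you use yours before", -4.6),
    ("did you lose yours before", -5.1),
]

# Toy stand-in for a "much bigger LM": bigram log-probabilities (invented).
BIGRAM_LOGP = {
    ("<s>", "did"): -1.0,
    ("did", "you"): -0.5,
    ("you", "use"): -1.5,
    ("use", "yours"): -1.2,
    ("yours", "before"): -1.0,
    ("before", "</s>"): -0.7,
    ("did", "users"): -6.0,
    ("users", "before"): -5.0,
    ("you", "lose"): -3.0,
    ("lose", "yours"): -2.5,
}
UNSEEN_LOGP = -8.0  # floor for bigrams missing from the toy table

def lm_logprob(sentence):
    """Log-probability of a sentence under the toy bigram LM."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(words, words[1:]))

def rescore(nbest, asr_weight=1.0, lm_weight=0.5):
    """Combine ASR and LM scores with tunable weights; best hypothesis first."""
    scored = [
        (asr_weight * asr_score + lm_weight * lm_logprob(hyp), hyp)
        for hyp, asr_score in nbest
    ]
    return sorted(scored, reverse=True)

for total, hyp in rescore(NBEST):
    print(f"{total:7.2f}  {hyp}")
```

In this toy setup the recogniser alone prefers "did users before", while the combined score prefers "did you use yours before", mirroring the correction shown on slide 37. The weights play the role of the tuned model weights mentioned in note 21; in practice they would be optimised on held-out data.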