The document discusses challenges and techniques for conversational speech translation. It notes the confluence of factors enabling this, including steady progress in machine translation quality and technological leaps in automatic speech recognition using deep learning. A goal is to support open-domain conversations between Skype users speaking different languages. Key challenges include the gap between how people speak conversationally versus how data is formatted for training, and ensuring low latency for consumers. Techniques discussed include adapting machine translation to the conversational domain through specialized data selection, and using TrueText to bridge the gap between raw automatic speech recognition output and the input machine translation expects.
2. Why now?
Confluence of factors:
Steady progress in MT quality over the last few years
• Using huge amounts of data
Technological Leap in ASR
• Deep Learning (DNNs) – 33+% WER reduction over GMMs (Seide et al. 2011)
From an average WER of 30% down to 20%, in English
Relative WER reduction now above 42%
• More robust to noise, speaker variation, accents
Skype
• A global platform to put speech translation in the hands of hundreds of millions of users
3. Skype Translator: Goal
• To support open-domain conversations between Skype users in different parts of the world, speaking different languages
10. The Challenges
• The gulf between speech and text
• It’s not enough to just chain a really good ASR system with a really good MT system
• How people talk to each other is not how they write
• Building really good conversational ASR and MT systems
• Significant changes in the data we use to train the ASR and MT systems.
• The gap between technology demo and consumer product
• Producing models with shippable latency
• Interesting problems one encounters with real consumers
11. How people really speak
What the person thought they said:
Yeah. I guess it was worth it.
Ja. Ich denke, es hat sich gelohnt.
はい。私はそれの価値があったと思います。
What they actually said:
Yeah, but um, but it was you know, it was, I guess, it was worth it.
Ja, aber ähm, aber es war, weißt du, es war, ich denke, es hat sich gelohnt.
はい、ええと、あなたが知っている、だったが、推測すると、それはそれの価値があった
けど。
Disfluency removal
More than just removing “um” and “ah”
12. Disfluencies in Conversational Speech
um no i mean yes but you know i am i've never done it myself have you done that uh yes
Disfluency types:
• Pause fillers
• Discourse markers
• Repetition
• Corrections (“speech repairs”)
Cleaned up:
Yes.
But, I’ve never done it myself.
Have you done that?
Yes?
13. Disfluencies in Conversational Speech
um no i mean yes but you know i am i've never done it myself have you done that uh yes
Yes.
But, I’ve never done it myself.
Have you done that?
Yes?
Need to:
1. Segment
2. Remove disfluencies
3. Punctuate
4. Add case
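The four steps above can be sketched in miniature (a toy Python approximation using regular expressions; the production system uses CRF classifiers for these stages, and the function names and filler lists here are ours):

```python
import re

# toy lists of pause fillers and discourse markers
FILLERS = r"\b(?:um|uh|ah|er|you know|i mean)\b"

def remove_simple_disfluencies(text: str) -> str:
    """Strip fillers/markers and collapse immediate word repetitions."""
    text = re.sub(FILLERS, " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # "but but" -> "but" (single-word repetitions only)
    return re.sub(r"\b(\w+)(?: \1\b)+", r"\1", text)

def punctuate_and_case(sentences):
    """Capitalize each segment and add a final period (toy)."""
    return [s[0].upper() + s[1:] + "." for s in sentences if s]

raw = "um but uh but it was you know it was i guess it was worth it"
cleaned = remove_simple_disfluencies(raw)
# cleaned == "but it was it was i guess it was worth it"
```

Note that the multi-word repair (“it was … it was”) survives; that is exactly the kind of complex disfluency the later parser-based stage handles.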
14. Without TrueText
• um no i mean yes but i am i've never done it myself have you done that uh yes
Translate ↓
• um i いという意味ないが、私は知っている私はそれをやったことがない自分をした、ええとはい
16. Missing punctuation: catastrophic effects
Questions
¿vas ahora? → are you going now?
vas ahora → go now
Negation
no es mi segundo → it is not my second
no. es mi segundo → no. it’s my second
Seriously embarrassing
tienes una hija ¿no? es muy preciosa → you have a daughter right? is very beautiful
tienes una hija no es muy preciosa → you have a daughter is not very beautiful
17. Accents / wrong characters: changes in meaning
Accented words (sound-alikes)
• Written in different forms, with different meanings
• But pronounced the same
Si los vinos mendocinos son muy famosos → If the wines from Mendoza are very famous
Sí los vinos mendocinos son muy famosos → Yes, the wines from Mendoza are very famous
Misrecognized words/characters (sound-alikes)
你经常在没有听完的时候就睡着了吗 → Do you often fall asleep without listening to it?
你经常在没有听完的时候就睡着了嘛 → You often fall asleep without listening to it.
18. How people say things
Here’s what we need to recognize and translate
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
• Boy, that story gets better every time you hear it.
• I swear to God I am done with guys like that.
Unfortunately, a lot of our MT training data looks like this:
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for
regional policy, I want to stress that some very important points are made in this resolution.
• I am therefore calling for integrated policies, all-encompassing policies that we can adapt to society,
which must listen to our recommendations and comply with them.
19. Data mismatch & scarcity
Training data mismatch
• MT training is clearly mismatched
• ASR training data is a mixed bag
Data scarcity
• Traditional data sources (govt, news, web) not well matched
• Not a lot of parallel conversational data (for MT)
• Not a lot of transcribed conversational data (for ASR)
20. ASR: word errors, missing vocab
ASR vocab issues – e.g. names
Hi Arul → Hi Aaron
I went skiing at Snoqualmie pass → I went skiing at snow call me pass
ASR errors
How do we minimize the impact of misrecognized words?
22. The Challenges
• Conversational speaking style
• Open domain
Key enabler: dramatic ASR improvements from using Deep Neural
Networks
Where to get training data?
• US English: DARPA Switchboard (2000h) is a great start; but no comparable corpus for other
languages
• Use “found” captioned speech.
Many thousands of hours of speech used for English system
23. Training Data: Audio w/ Fluent Transcripts
• Disfluent (what we want): Well I uh started this this project while I was a student uh grad student at uh Stan- Stanford
• Fluent (what we get): I started this project while I was a grad student at Stanford
Recreate disfluent training material
25. ASR/MT Mismatch
Significant data mismatch between ASR output (even
when cleaned) and MT:
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
vs.
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for
regional policy, I want to stress that some very important points are made in this resolution.
But where do we get parallel conversational data?
Example: Movie subtitles
26. Data Selection
• Sample “in-domain” (“in-register”?) data from our en-fr parallel
data store
• Leverage the fact that the data pool does not match the target domain
• Use monolingual conversational data as seed (“in-domain”): CallHome, SWBD
• Use Cross-Entropy Difference method (Moore & Lewis, 2010) against a very large parallel corpus (for ENU-FRA, many hundreds of millions of sentences)
• Train on combination of subtitle and DA data
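The Moore-Lewis selection step can be sketched as follows (a toy illustration with add-one-smoothed unigram LMs and invented two-sentence corpora; the real systems use far larger language models, corpora, and a selection threshold):

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Add-one-smoothed unigram model built from a list of sentences."""
    counts = Counter(w for s in corpus for w in s.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)

# seed: monolingual conversational data (stand-ins for CallHome/SWBD lines)
seed = ["yeah i guess it was worth it", "have you done that before"]
# pool: the big general-domain parallel store (English side shown)
pool = ["i guess you have done it", "the committee adopted the resolution"]

in_domain, general = unigram_lm(seed), unigram_lm(pool)
# Moore-Lewis score: H_in(s) - H_gen(s); lower = more conversational
scored = sorted(pool, key=lambda s: cross_entropy(in_domain, s) - cross_entropy(general, s))
```

Sentences at the head of `scored` would be kept for training; the Europarl-style sentence correctly falls to the bottom.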
28. Pipeline: Speech Recognition → Speech Correction → Translation → Text to Speech
Raw ASR Output:
um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me
Customization and Personalization:
um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me
Lattice Rescoring:
um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me
Disfluency Removal:
no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me
Segmentation, Punctuation and True Casing:
Yes.
But I’ve never done it myself.
Did you use yours before?
I will ask Gurdeep to help me.
Translation (French):
Oui.
Mais je ne l'ai jamais fait moi-même.
Avez-vous utilisé le vôtre avant ?
Gurdeep va demander de l'aide.
31. Personalization and Customization
Diagram: the Client talks to the Skype Translator Service, which draws on User Profiles and Object Stores in Cloud Storage; Speech Recognition uses Customized Language Models, and Machine Translation uses Customized Models (CLM).
32. Personalized Names Handling
• Name recognition is a well-known problem in large-vocabulary ASR
• Supporting high-recall name recognition usually compromises WER
• We deploy a high-precision approach to contact-name recognition using personalized name lists
• Personalized names can be recognized in any context
• Examples:
• Hello Ignacio, how are you doing today?
• I will meet Arul Menezes for lunch tomorrow.
Diagram: the client supplies contact names to build a customized LM, which Speech Recognition uses alongside the generic LM.
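The idea can be illustrated with a toy sketch (the greeting templates, the fuzzy-matching repair step, and all names here are invented for illustration; the shipped system instead compiles the contact-name grammar into the ASR engine and runs it in parallel with the main LM):

```python
import difflib

# toy restrictive grammar: common greetings with a name placeholder
GREETING_TEMPLATES = [
    "hi <NAME>",
    "hello <NAME> how are you",
    "i will ask <NAME> to help me",
]

def compile_grammar(contacts):
    """Expand the templates with this call's contact names."""
    return [t.replace("<NAME>", c.lower())
            for t in GREETING_TEMPLATES for c in contacts]

def correct_names(asr_hypothesis, contacts, cutoff=0.6):
    """Prefer a grammar string when the ASR hypothesis is close to one."""
    match = difflib.get_close_matches(
        asr_hypothesis, compile_grammar(contacts), n=1, cutoff=cutoff)
    return match[0] if match else asr_hypothesis

fixed = correct_names("hi aaron", ["Arul", "Gurdeep"])
# fixed == "hi arul"
```

Because the grammar is tiny and call-specific, this stays high-precision: hypotheses far from any greeting pass through untouched.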
34. Some early experiments: The error cascade
Speech Recognition → 1-best ASR → Translation Engine
Proposed solutions:
• Feed n-best list of ASR output to MT
• Use speech lattice directly as input to MT (e.g. Matusov et al. 2005, Lavie et al. 2004, Dyer et al. 2008)
• Confusion network decoding (e.g. Bertoldi et al. 2007; Bertoldi and Federico 2005)
35. Lattice rescoring
Rescoring the ASR lattice with:
• A much bigger LM (100x larger than the first pass)
• MT-specific features
• Tuned weights
Results:
• WER reduction: 1-2% absolute
• BLEU improvement: 1-2% absolute
Cherry-picked examples:
Ref: what do you use yours for mostly
ASR: do users for mostly
Rescored: do you use yours for mostly
Ref: but we're in a subdivision
ASR: but where in a subdivision
Rescored: but we're in a subdivision
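Rescoring can be sketched over an n-best list (each entry a hypothesis with its first-pass ASR score), standing in for the lattice paths; the “big LM” and the weights below are toy stand-ins of ours, not the production models:

```python
def rescore(nbest, big_lm, weight_asr=1.0, weight_lm=0.8):
    """Pick the hypothesis maximizing a weighted sum of the
    first-pass ASR score and the second-pass LM score."""
    return max(nbest, key=lambda h: weight_asr * h[1] + weight_lm * big_lm(h[0]))[0]

# toy "100x bigger" LM: rewards frequent conversational bigrams
GOOD_BIGRAMS = {("do", "you"), ("you", "use"), ("yours", "for")}

def big_lm(text):
    words = text.split()
    hits = sum(1 for b in zip(words, words[1:]) if b in GOOD_BIGRAMS)
    return hits - 0.1 * len(words)  # crude log-probability proxy

nbest = [
    ("do users for mostly", -3.0),           # first-pass winner
    ("do you use yours for mostly", -3.4),   # lower ASR score, better English
]
best = rescore(nbest, big_lm)
# best == "do you use yours for mostly"
```

With `weight_lm=0` the first-pass winner is kept, which is how tuning the weights trades off ASR confidence against the bigger LM.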
37. Final translation (Japanese):
はい。
しかし、私はそれを自分自身を行ってきたこと。
あなたは前にあなたを使用しましたか。
私は私を助けるための Gurdeep が要求されます。
Compare translating the raw ASR output directly:
um 意味ないはい、私です私はそれをやったことがない自分はユーザー、ええと私は手伝って深く要求されます前に
The cleaned input to MT, after Segmentation, Punctuation and True Casing:
Yes.
But I’ve never done it myself.
Did you use yours before?
I will ask Gurdeep to help me.
39. Segmentation and disfluency removal interact with each other
Input: um no i mean yes but i am i've never done it myself have you done that uh yes
Path A: Segmentation first, then disfluency removal
Segments:
um no
i mean yes
but i am
i've never done it myself
have you done that
uh yes
Result: disfluent fragments survive
No.
I mean yes.
I am.
i've never done it myself.
have you done that?
Yes?
Path B: Simple disfluency removal, then segmentation, then complex disfluency removal
After simple disfluency removal and segmentation:
no ,, yes
but , i am , i've never done it myself
have you done that
yes
After complex disfluency removal:
Yes.
I've never done it myself.
Have you done that?
Yes?
40. CRF-based Classifiers for annotation
P(y|F) = (1/Z(W, F)) exp( Σ_k λ_k G_k(y, F) )
Stages: Simple Disfluency → Segmentation and Punctuation → Complex Disfluency
Segmentation and Disfluency Removal for Conversational Speech Translation.
Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, and Gokhan Tur. INTERSPEECH 2014.
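The distribution above can be evaluated by brute force for a tiny example (the feature functions, weights and label set are toy choices of ours; real CRF training and inference use dynamic programming over linear chains, not enumeration):

```python
import itertools
import math

def crf_prob(y, F, features, weights, labels):
    """P(y|F) = exp(sum_k w_k G_k(y,F)) / Z(w,F), with Z summed
    over every possible label sequence for the words F."""
    def score(seq):
        return math.exp(sum(w * g(seq, F) for w, g in zip(weights, features)))
    Z = sum(score(seq) for seq in itertools.product(labels, repeat=len(F)))
    return score(tuple(y)) / Z

FILLERS = {"um", "uh"}
# toy feature functions G_k: count positions tagged "correctly"
g0 = lambda seq, F: sum(1 for w, t in zip(F, seq) if w in FILLERS and t == "DISFLUENT")
g1 = lambda seq, F: sum(1 for w, t in zip(F, seq) if w not in FILLERS and t == "FLUENT")

F = ["um", "yes"]
LABELS = ["DISFLUENT", "FLUENT"]
p = crf_prob(("DISFLUENT", "FLUENT"), F, [g0, g1], [2.0, 2.0], LABELS)
```

The normalizer Z makes the four possible label sequences sum to probability one, with the intuitively correct tagging getting most of the mass.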
41. Sentence Unit Boundary Detection
• CRF Classifier: L2 regularization, feature cut-off = 2
• Lexical features
• Brown clusters: group semantically related words based on context
• POS tags trained on conversational data (another CRF classifier)
• Pause durations from the speech signal
• Phrase-translation-table n-grams
• Features on a window of two words on each side
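The window features can be sketched as a simple extractor (the feature-key naming is ours; Brown-cluster, POS and pause-duration features would be added to the same dictionary):

```python
def boundary_features(words, i, window=2):
    """Lexical features at position i: the tokens in a +/- 2 window,
    keyed by relative offset and padded with <s> at the edges."""
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"w[{off:+d}]"] = words[j] if 0 <= j < len(words) else "<s>"
    return feats

words = "i've never done it myself have you done that".split()
f = boundary_features(words, 4)  # features around "myself"
```

A CRF trained on such features can learn, for instance, that “myself have” spanning the window is strong evidence for a sentence boundary after position 4.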
42. Disfluency Removal, Punctuation insertion
and TrueCaser
•CRF Classifiers: L2 Regularization, Features Cut-off=2
•Lexical Features
•Brown Clusters
•POS tags trained on conversational data (another CRF classifier)
•Features on a window of two words on each side
43. Example of Complex Disfluency Removal
but , i’m , I’ve never done that before.
49. Cmdline parameters (ex of API usage)
Usage:
CmdLineSpeechTranslate.exe ClientId ClientSecret FilePath SrcLanguage TargetLanguage
Example:
CmdLineSpeechTranslate.exe ClientId ClientSecret helloworld.wav en-us es-es
Source: 1 of 8 spoken languages
Target: 1 of 50+ spoken languages
50. S2S in the Schools
Bilingual Mystery Skype
Deaf/Hard of Hearing Students
53. Deaf and Hard of Hearing Students
• In Seattle Public Schools, Jean Rogers’ (Chief Audiologist) and Liz Hayden’s
(Teacher of the Deaf) idea:
• Use Skype Translator with the “mainstreamed” deaf and hard of hearing kids
As we all know, the idea of being able to speak naturally with someone who doesn’t understand your language has been a long-held dream, whether we’re talking about the biblical story of the Tower of Babel or 20th-century sci-fi such as the Star Trek universal translator or Douglas Adams’s Babel fish.
About this time last year, we set ourselves the goal of turning this age-old dream into reality. We realized that there was a confluence of factors that, taken together, gave us the opportunity to make this happen. We in the MT field have been making steady progress on MT quality, both with better algorithms and by applying ever greater amounts of data, such that our best MT systems today are really quite good. At the same time, the ASR field has seen a technological leap over the last few years, with the use of DNNs leading to dramatically lower error rates. And finally, we at Microsoft Research felt we had a golden opportunity not just to do a great technology demo, but to actually put this in the hands of hundreds of millions of users through Skype.
In order to achieve this we faced a number of challenges, which is what I will be talking about for the next hour.
On one hand: Help Skype users
On the other hand: have a generalized speech translation system and service for 3rd parties.
There are three main challenges we need to work through to build a realistic S2S system:
The gulf between speech and text
It’s not enough to just chain a really good ASR system with a really good MT system
How people talk to each other is not how they write
Building really good conversational ASR and MT systems
Significant changes in the data we use to train the ASR and MT systems.
The gap between technology demo and consumer product
Plugging into Skype
Interesting problems one encounters with real consumers
[READ SLIDE] So, if we take the raw ASR output and just throw it at MT, it doesn’t work so well. We need components to process the ASR, remove disfluencies, etc and make it more palatable to MT. Likewise we need to adapt MT to handle this kind of input
So let’s take a closer look at the different types of disfluencies – first you have uh, your, um, fillers, then, you know, I mean, your discourse markers and and and repetition, and finally correct--, I mean, speech repairs, where people go back and repeat, I mean, change what they said.
In this example, here the speaker changed no to yes, and “I am” to “I have”.
Another thing that is missing is punctuation -- If we don’t get punctuation right, we are risking a lot more than just word salad. You may know the example “Let’s eat grandma”, where a missing comma could lead to a tragic outcome. When translating, the problem gets worse.
In the case of Spanish (and possibly other languages), we have an additional problem with accented words [READ SLIDE]
In addition to *how* people speak (genre), there’s also a big difference in *what* the talk about (domain).
Now let’s take a closer look at the Speech Correction component, which helps bridge the gap between spoken and written text.
[READ SLIDE] Adapting MT to ASR starts with building a good baseline conversationally-oriented MT system
Let’s listen to what the user said.
Very good, fluent English.
But if we read the English transcript, it is hard to understand, not to mention to translate.
Missing sentence boundaries, punctuation, casing and disfluency removal
First, since we know who is talking to whom, we can build user profiles for both parties, which enables us to do a better job in both recognition and translation.
Personalization and customization play a crucial role in open-domain S2S, since people are usually talking about broadly different things.
For example, here we can recognize the person name “Gurdeep” rather than “go deep”.
Good customization and personalization is crucial for the open-domain Skype Translator: people can talk about their planned vacation to Colombia with a Spanish speaker one day, which needs different vocabulary than talking to a Chinese supplier the next day about product plans.
Open-domain conversational S2S can benefit from customizing and personalizing the models according to the users’ profiles.
We can use users’ profiles to create customized models that fit their topics and vocabulary.
Describe the diagram above.
Currently we use this infrastructure for contact-name recognition.
One of the issues with ASR is that the vocab cannot include every possible person name (or place name etc). Expanding the vocab drastically to include millions of names can compromise WER, because names may be misrecognized in place of regular words.
However, in Skype Translator, we found that when the system didn’t recognize the caller’s or callee’s name at the start of a call, it often derailed the entire conversation.
So we opted for a surgical fix for now while we investigate more broad-based options. What we’ve done is added a very small restrictive grammar comprised of common greetings etc., but with placeholders for names. At the start of a call we dynamically compile the contact names for the current caller and callee into this grammar, and our ASR engine can use this grammar in parallel with its broad-based regular LM.
Instead of using the one-best output from ASR, we can use the n-best in lattice format.
Lattice rescoring lets us obtain many possible alternatives from the speech recognizer and then use a very large LM to score them.
In the early days we were fixated on WER and its effect on BLEU, and on the error cascade: if we pipeline multiple error-prone components together, the errors multiply. This is a well-studied problem that other researchers have worked on for a number of years.
One approach that’s been tried is to take the ASR lattice directly as input to the MT decoder. This has been studied by many groups and is conceptually elegant, but the implementation is quite complex. A lattice representation allows an MT system to arbitrate between multiple ambiguous hypotheses from upstream processing so that the best translation can be produced.
A simplification is decoding over a confusion network, where the ASR confusables are compactly encoded as a “word sausage”. This is very easy to decode in MT because it affects mostly just the phrase-lookup portion, leaving the rest of the decoding untouched except for some extra features. We found that the MT portion of this worked well. However collapsing an ASR-lattice into a confusion network is an ill-defined operation which can result in some nasty artifacts in the confusion network such as a proliferation of epsilon arcs.
After some experimentation with N-best rescoring and confusion networks, we decided to try a couple of different things.
Confusion network: A Confusion Network (CN), also known as a sausage, is a weighted directed graph with the peculiarity that each path from the start node to the end node goes through all the other nodes.
Speech lattice: compactly maintains a set of candidate hypotheses as alternative paths through a weighted directed graph.
We decided that before we plunged into full-fledged MT decoding over lattices, we would first try simply rescoring the ASR lattice with a much bigger LM, adding some extra MT-friendly features and tuning model weights.
We discovered we could get very good ASR and end-to-end BLEU gains, at which point we decided not to bother with decoding over lattices in the MT decoder itself
Now we have an “almost” perfect transcription that matches what the user said, improved and customized as well. Are we ready to send this to MT?
Not yet. Disfluency handling should be done before we translate. But actually, disfluency removal and punctuation need to be done concurrently.
Finally, we come to segmentation into sentence units, punctuation and casing, which produces input ready for a state-of-the-art MT system to deliver reliable translation.
Traditionally this problem has been solved by first doing segmentation and then disfluency removal.
But there is an interaction between segmentation and disfluency handling.
If segmentation is done first, disfluency removal loses the chance to make a better correction, and we’re left with numerous disfluent fragments.
On the other hand, complex disfluency removal (speech repairs) needs sentence boundaries, so you can’t do that first either.
What we did is split the difference: we do some simple disfluency removal first, then segmentation, then complex disfluency removal.
Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account; e.g., the linear chain CRF popular in natural language processing predicts sequences of labels for sequences of input samples.
First remove simple disfluencies, then segment, then remove complex disfluencies
First two stages are CRF sequence taggers
Complex disfluency handling:
Uses metadata annotated by previous stages
Using iterative parsing (NLPwin parser)
Needs sentence units
Brown clustering is a hard hierarchical agglomerative clustering problem based on distributional information. It is typically applied to text, grouping words into clusters that are assumed to be semantically related by virtue of their having been embedded in similar contexts.
In complex disfluency removal we take advantage of the NLPwin parser, which was built for the MS Word grammar checker and so is robust to ungrammatical input. For example, here we have a repeated subtree that is linguistically similar, so the first subtree is removed. We also look for constituents that appear to be disconnected from other parts of the tree. When we spot a disfluency we remove it and reparse the resulting sentence, because the removal of the disfluency could change the entire parse. We remove errors one by one and stop when we have no more edits or we hit a limit on the number of parses.
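The remove-and-reparse loop can be sketched as follows (the detector below is a stub that only spots a repeated adjacent word pair; the real system uses the NLPwin parser to find linguistically similar repeated subtrees and disconnected constituents):

```python
def remove_complex_disfluencies(tokens, find_disfluency, max_passes=5):
    """Delete one disfluent span per pass, then re-run detection on the
    edited sentence, stopping when nothing is found or a limit is hit."""
    for _ in range(max_passes):
        span = find_disfluency(tokens)  # (start, end) or None
        if span is None:
            break
        tokens = tokens[:span[0]] + tokens[span[1]:]
    return tokens

def repeated_pair(tokens):
    """Stub detector: for a repeated pair X Y X Y, mark the first X Y."""
    for i in range(len(tokens) - 3):
        if tokens[i:i + 2] == tokens[i + 2:i + 4]:
            return (i, i + 2)
    return None

out = remove_complex_disfluencies("it was it was worth it".split(), repeated_pair)
# out == ["it", "was", "worth", "it"]
```

Re-running detection after each deletion mirrors the reparse step: an earlier edit can expose or dissolve a later disfluency, so the loop must iterate rather than edit everything at once.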
Lots of numbers here. Looking at the last column, which is what we care about, here are the takeaways:
Sentence breaking based on speaker pauses is a bad idea (lose 0.5 BLEU points)
CRF sentence breaking by itself adds about 0.5 BLEU (vs no breaking at all i.e. translate the full utterance)
Disfluency removal by itself adds about 1.3 BLEU
Doing both gives you about 2 BLEU points and doing the split before/after adds another 0.5
This is Vinny, who participated in the Mystery Skype session we had with the schools in Beijing. Vinny’s deaf, so it was wonderful for him to participate in these calls with his classmates. Even when he was unable to hear the response back from the students, he could read the translation of what they were saying.
Although the use here demonstrates the use of the technology with deaf or hard of hearing students, it’s not much of a stretch to adapt the technology, since the components already exist, to hearing students that speak other languages. In fact, it could be used in that manner now. We haven’t tested it in this scenario…yet.