A Computer Journal For Translation Professionals
(the three hundred twenty-third edition)
One thing I noticed about this past year of pandemic has been the
increased importance of weather forecasts. We had a very small group of
friends that we've regularly met with, but only around the fire in our
backyard or at the beach. So it became very important to rely on hourly
weather forecast apps like weather.com to see whether it was reasonable
to expect an hour or two of dry weather between rainfalls (Q: Rainfalls?
A: Oregon coast!). Overall, I've been incredibly impressed with the
accuracy of the forecasts, and for months I've been planning to once
again write something about meteorologists' use of data processing and
artificial intelligence in comparison with translators.
But then last week happened. To celebrate our 30th wedding
anniversary (👏👏 Thank you! 🙇♀ Thank you! 🙇♂), my wife and I
planned to take a trip up the coast to places we hadn't visited before.
We were really excited about our upcoming break but less excited about
the weather forecast, which ominously predicted 100% rain throughout the
week (again, Oregon coast!). We each packed several stacks of books,
expecting to spend most of our time indoors, but I didn't even finish
the first book I started because the weather was glorious.
Clearly my little anecdote contributes diddly-squat to a
reasonable discussion about the state of meteorology, but doesn't it
remind you of the state of modern machine translation? In my opinion,
it's not completely arbitrary to compare the two.
Back in 2012 (in edition 214), I mentioned the similarities between
how meteorologists and translators can rely on computer-generated data
to form the basis of their work products, which in both cases have to be
refined by humans. I cited an article by well-known statistician Nate
Silver talking about "literally countless other areas in which weather
models fail in more subtle ways and rely on human correction." If that
doesn't sound like machine translation, I don't know what could.
Interestingly, just like MT, the computer systems and algorithms
used by meteorologists today are complemented by narrow AI. While it was
once only a massive IBM supercomputer that crunched the many bits of data
to come to reasonable suggestions (ever wonder why The Weather
Company/weather.com was purchased by IBM in 2016?), those results are
now complemented by AI algorithms developed by companies such as Microsoft
-- and of course actual meteorologists. And together they produce amazing results -- until they don't.
Again, there are myriad areas where the parallels to translation
don't quite line up, but there are also countless where they do. The
most pertinent seems to be the complexity of both language and weather.
Although humans have less complete insight into weather than language, computers ultimately have less in both.
And Now: The Weather
50 Ways to Rekindle Your Love Affair with Language
Using the UN Digital Repository for interpreting practice (Column by Josh Goldsmith and Alex Drechsel)
The Last Word on the Tool Box Journal
50 Ways to Rekindle Your Love Affair with Language
Years and years ago I started to collect characters of many
different writing systems from around the world along with little
vignettes about them, and I always announced them in the Tool Box Journal once
I had published them on my website. While this didn't contribute to world
peace or cure cancer, it was a little garden of respite. Like a park in
the spring where I could admire beautiful flowers and blooming trees, or
a gallery where the different pieces of art transported me to places I
had never been before. Some of you accompanied me on that journey.
When Renato Beninatto became co-owner of Multilingual and contacted
me late last summer with the idea of publishing a book, it seemed like a
wonderful opportunity to publish my little collection as a beautifully
I'm beyond thrilled to tell you that the book has now indeed been
published, and it's gorgeous (I won't even apologize for speaking so
grandly about it). It's the kind of book that you can use to remind
yourself why you love languages after a long day of translating; the
kind of book you can give to your family, friends, and (maybe most
importantly!) your clients so they will finally understand what makes
language so ingenious and extraordinary; and the kind of book you can
have on your coffee table to proudly show visitors, well knowing that
they will not be disappointed.
(Okay, that's enough.)
You can purchase it on Amazon or (if you want a signed copy) on my website. Another fantastic idea would be to give it to all of your clients and vendors -- you can make bulk purchases on Multilingual's website and even have it customized with your own introduction or logo. However you choose to acquire it, I hope you'll treasure it.
Here is the introduction to Characters with Character -- 50 Ways to Rekindle Your Love Affair with Language:
Most societies and language
groups across the world developed an urge to record their
communication and thoughts. Since you're reading this, you are certainly
in this particular class of people, and since I'm writing this, I'm
with you. But another thing that probably unites us is the curiosity
with which we regard those cultures that didn't. A surprising 3,000 of
the 7,000 existing languages never developed a writing system. Why? In
many cases, their oral traditions were so sophisticated that there was
never a need to record anything except in their minds. In addition,
speakers of those languages probably didn't have any territorial
ambitions beyond their general vicinity.
Still, more than half of all
language groups have developed writing systems, and their speakers have
dug deep into their imaginations to develop systems that would make them
writers as well as readers.
I have long been more fascinated
with written languages than spoken languages for a very clear reason:
I'm a translator, so I work with the written language, not an
interpreter who works with spoken language. This statement also entails a
confession of sorts: I'm not really an expert in typesetting or fonts
(though I love fonts and talk about them quite a bit in this
collection). Instead, I am more of a fan, a connoisseur, an art lover.
Though he is not a true expert, the sports fan possesses a plethora of
knowledge about the players and their plays, the connoisseur knows what
she likes, and the art lover could spend every afternoon in a museum. In
much the same way, I know when a certain character moves me. I
immediately sense when a glyph (the actual representation of that
character) delights me. And I'm simply amazed that one global system of
encoding more than 110,000 characters (Unicode) can actually reproduce
those characters on a computer screen and print them in a book. I think
that's what we commonly call passion, and I have plenty to spare when it
comes to the topic of this book. I have tried to intermingle my
passionate descriptions with little tidbits of information that might
help you sound a bit more erudite during small talk at your next
cocktail party or farmer's market, but don't read this primarily as an
informational book. In these pages you'll find no table of contents,
page numbers, or index. Instead, stroll through at your leisure, much
like you'd walk through an art gallery or meander a pathway in a
well-maintained city park.
A sample page spread from Characters with Character
The Tech-Savvy Interpreter 2.0 - Using the UN Digital Repository for interpreting practice (Column by Josh Goldsmith and Alex Drechsel)
If you’re an interpreter adding a new language or exploring a new subject, speech repositories can be real gold mines.
But what if you’re looking for thousands of hours of authentic high-level
speeches and recordings of the interpretation as delivered? Look no
further than the UN Digital Repository, a little-known treasure trove.
The UN Digital Recordings Portal
When most interpreters search for speeches delivered at the United Nations, they stumble across UN Web TV.
That platform includes on-demand and live video in all UN languages
(Arabic, Chinese, English, French, Spanish and Russian). But there’s a
catch: most videos only include the original language and the English
interpretation. Plus, the platform doesn’t allow you to download videos,
which makes practice a bit trickier.
On the Digital Recordings Portal, recordings of speeches and simultaneous interpreting at meetings are
uploaded every single day. You can search the materials by date,
organization or committee, and keyword. The repository also includes the
speaker’s name and time markers, which makes it easier to find practice
materials in a given language.
Here’s an example. Search for a recent session of the Human Rights Council.
Want to practice, say, Arabic? Scroll down to find an Arabic-speaking
country, then click play to hear the speech in the original language or
“Download current language” to download it. Select one of the other
languages to play or download the interpretation.
We recommend downloading several speeches before you start practicing.
(Pro tip: Longer speeches tend to be delivered at a slightly slower
pace. 🐌 )
Simple tech for recording and listening
Now it’s time to set up the tech. We recommend Audacity, a free multi-track audio editor which runs on Windows, Mac, and Linux.
Select your microphone and headphones, import the speech, then click
the record button and start interpreting. You can adjust the volume
using the +/- slider or pan the audio to one ear using the L/R slider.
Once you’ve finished interpreting the speech, stop the recording and take a
breather. Congratulations - you’ve made it through a UN speech! 👏
Next, mute the original track and listen to your own recording.
Listening to your interpretation and the original at the same time is also a
piece of cake. Pan one track all the way to the left ear and one to the
right ear, then hit play. Of course, you can pause, stop, or rewind the
recording to listen to a tricky passage again or take some notes.
It can be helpful to listen to the original, then your interpretation, then both tracks simultaneously.
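If you would rather prepare such a left/right file outside Audacity, the same panning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both recordings are taken to be mono, 16-bit, and at the same sample rate, and the function names `pan_to_stereo` and `write_stereo_wav` are made up for this example (they are not part of Audacity or any UN tool).

```python
import struct
import wave


def pan_to_stereo(original, interpretation):
    """Interleave two mono sample lists into stereo (left, right) samples.

    `original` goes to the left ear, `interpretation` to the right ear.
    The shorter track is padded with silence (0) so both end together.
    """
    length = max(len(original), len(interpretation))
    frames = []
    for i in range(length):
        left = original[i] if i < len(original) else 0
        right = interpretation[i] if i < len(interpretation) else 0
        frames.extend((left, right))
    return frames


def write_stereo_wav(path, frames, sample_rate=44100):
    """Write interleaved 16-bit samples to a two-channel WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)   # stereo: left and right
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples (assumed format)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(frames)}h", *frames))
```

In practice Audacity's pan sliders do the same job with far less effort; the sketch merely shows what "one track per ear" means at the sample level.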
Learn from the pros
Bonus: the Digital Recordings Portal includes the simultaneous interpretation of every speech.
Before comparing yourself to the professionals, start by listening to their
work. Try to identify strategies they use and jot down useful
expressions, terminology and turns of phrase.
Once you’ve started to gain an understanding of UN jargon, it may be helpful
to listen to the original speech and professional interpretation, again
panning one recording to each ear.
And finally, why not listen to your own rendition in one ear and the
professional rendition in the other? You’ll quickly identify plenty of
ways to upgrade your own vocabulary and improve your technique.
Practice makes perfect
The speed and caliber of UN speeches can seem overwhelming at first. Don’t
be too hard on yourself! Remember that staff interpreters have been
working for years and often receive the texts in advance.
One of us used the approach we’ve described here to train for the UN’s
accreditation exam - and didn’t pass the first time around. 😊 But he
kept practicing and eventually gained a solid enough grasp of UN
speeches to pass the Language Competitive Examination.
So what are you waiting for? Head on over to the Digital Recordings Portal
and find a speech you’d like to practice. As you do so, listen
critically to your work, learn from the pros, and keep improving every day.
Curious how to study for the UN accreditation exam using Audacity and the Digital Recordings Portal?
I know some of you might not be enthusiastic about me writing again
about data privacy when using generic MT engines like Google,
Microsoft, and DeepL. This will be partly because I've done so a number
of times already, and I think many might be using the data privacy issue
as a kind of marketing ploy that is just too good to let go of -- even
though it's not exactly truthful (more on that below).
Now, I'm under no illusion that whatever I write here or elsewhere
holds more weight than whatever someone else might write, but truthfully
I want to make really sure that I myself understand the admittedly very
important data privacy issues, so I'm just taking you (once again) on
that journey with me.
The question is this: Is my clients' data privacy assured when I as
their translator use services like Google Translate, Microsoft
Translator (or whatever it might be called at this particular point in
time), or DeepL?
Let's start with times when it's not safe or ethically defensible.
(Note that I'm not going to talk about the use of machine translation in
general, just about whether it's safe to trust Google, Microsoft, or
DeepL to use the data you transmit to them only for the purpose of
suggesting a machine translation to you and for nothing else.)
First, it's not ethically defensible if your client expressly
prohibits it. That's it as far as that point is concerned. It might be
that the client is ill-informed about why they prohibit this, but that's
clearly not your concern. If they say don't do it, you don't do it.
Second, it's not safe to use any of those services if you use their
web interface at translate.google.com, bing.com/translator,
deepl.com/translator, or through apps of any of those companies that
offer the machine translations for free (exception: Microsoft Office
products -- see below). These companies expressly say that they very
well might use your data to improve their services.
- Here is what Google says:
"We also collect the content you create, upload, or receive from others
when using our services (…) And we use your information to make
improvements to our services -- for example, understanding which search
terms are most frequently misspelled helps us improve spell-check
features used across our services." While this does not specifically
pinpoint translation services, it is my understanding that they are
included (as well as Gmail and myriad other Google services). If you
have been using the web interface for Google Translate while logged into
Google, you can select the History icon at the bottom of the page to
see what Google has actually stored in the last three or so months.
- Here is what Microsoft says:
"Microsoft Translator processes the text, image, and voice data you
submit, as well as device and usage data. We use this data to provide
Microsoft Translator, personalize your experiences, and improve our
products and services."
- And here is what DeepL says:
"When using our translation service, please only enter texts that you
wish to transfer to our servers. This is necessary in order for us to
produce the translation and offer you our service. The transfer of these
texts is necessary for us to carry out the translation and offer you
our service. We process your texts and the translation for a limited
period of time to train and improve our neural networks and translation
algorithms. If you make corrections to our proposed translations, these
corrections are also forwarded to our servers to verify the accuracy of
the corrections and, if necessary, to update the translated text to
reflect your changes. We also store your corrections for a limited
period of time to train and improve our translation algorithm."
So far so good.
Yes, I think that this is good for us because it differentiates the
casual user of machine translation from those of us who use machine
translation as one of our resources during professional translation.
Because what we (should!) do is access machine translation from those
sources via their API (application programming interface -- how
different programs exchange information). And if we access it within a
translation environment (Trados, memoQ, Memsource -- you name it), that
is exactly what we're doing.
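For the curious, here is roughly what such an API call looks like under the hood. This is a minimal sketch, not what any particular translation environment actually runs: it assumes version 3.0 of Microsoft's Translator text API with its public global endpoint and subscription-key headers, and the function name `build_translate_request` as well as the key and region values are placeholders invented for the illustration. The function only builds the request; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed: the global endpoint of the Translator text API, version 3.0.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(text, target_lang, key, region):
    """Build (but do not send) a POST request for one source segment."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": target_lang})
    # The payload ("customer content") is a JSON list of text items.
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        # These headers identify the subscription, not the content.
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the segment. The point of the sketch is simply that the source text travels as the body of an authenticated request tied to a subscription, which is a different channel from typing into a free web page.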
Here is what the different systems say about that:
Google: "Google does not use any of your content for any purpose except to provide you with the Cloud Translation API service."
Microsoft: "Azure Cognitive Services Translator is a cloud-based machine
translation service and is part of the Azure Cognitive Services family
of cognitive APIs for building intelligent apps. Customer data submitted
for translation to Azure Cognitive Services Translator (both standard
and custom models), Speech service, the Microsoft Translator Speech API,
and the text translation features in Microsoft Office products are not
written to persistent storage. There will be no record of the submitted
text or voice, or any portion thereof, in any Microsoft data center. The
audio and text will not be used for training purposes either."
DeepL: "When using DeepL Pro, the texts or documents you submit will not be
permanently stored and will only be kept temporarily, to the extent
necessary for the production and transmission of the translation. Once
you have received the translation, all submitted texts or documents and
their translations will be deleted. When using DeepL Pro, your texts
will not be used to improve the quality of our services."
It seems relatively clear to me but a) I'm not a lawyer and b) all
too often fellow translators or other technology providers like to throw
shade on those provisions by pointing to other sections in the legal
thickets of those companies that might read like loopholes to those
conditions. If the skepticism arises out of real doubt about whether
that data might be treated differently than outlined in the legal
statements above, it's not only justified but laudable. But in other
cases, I seem to notice a stubbornness born either of wanting to sell a
product or service that in some way competes with those generic MT
offerings (a sales pitch masquerading as moral high ground), or just a
general rejection of MT in all its forms (or any combination of the
two). I think we have to be careful about taking stands that might be
hard to defend, especially when it comes to the core of our business as
translators or translation technology providers.
(Plus, it has always seemed kind of preposterous to assume that
professional translators have so much to add to the ongoing collection
of data -- remember, we are only talking about source data here, unless
you would be using the tools' interfaces to make corrections to the
translation data -- that it would even make a dent in the billions of
times non-API users access the data and enter text, and that these
companies would embarrass themselves by not keeping what clearly seems
to be a contractual promise.)
Either way, I thought it would be helpful to actually reach out to
some people with these organizations to see what they actually know
about their company's plan for data submitted through their APIs. I
didn't even try with DeepL since I know that they are terrible at
communication. I did contact someone at Google who essentially confirmed
the contractual agreement, though he was very eager not to go on the
record with anything that could get him into hot water with Google's
legal team (I remember when interviewing the former head of Google's MT
in Mountain View years ago, two members of the legal team sat right next
to him and weighed every word that came out of his mouth). But I was
very grateful to Microsoft's Chris Wendt -- or rather former Microsoft
employee Chris Wendt, who happened to retire just days after I asked him
Here is what he said:
"When using the Translator API, free or paid, or a commercial
application like Office, no customer content will be stored by
Microsoft. When using a Microsoft consumer app, the Microsoft Translator
app for the phone or bing.com/translator, Microsoft may save the
customer content and use it for quality improvement. We recently changed
the phone app to specifically ask for permission before storing
"There is a difference between customer personal data and customer
content. Customer content is the payload of the translation request.
Customer personal data identifies the customer, like the subscription
ID, email address, physical address, IP the request came from, and
similar. The services, including Microsoft, do maintain personal data in
order to send the bill, ensure fairness, and throttle the service.
That's why the explanation of what happens with personal data is
somewhat lengthy. What I say above is about customer content (payload).
Not about the metadata associated with the use of the service."
And, just for clarification, I asked again "Using the paid API
services to obtain translation from Microsoft (with or without Custom
Translator), there is no case where the source data will be used by
Microsoft (…). Is that correct?"
And Chris's answer:
"That's correct. Not the translation either."
And all of the above is by no means me arguing that you or anyone
should use machine translation. I have no dog in that fight (it's really
not a fight in the first place), but I think it's really important to
be clear about the legal ramifications. Most of the articles that are
written about MT are about customized MT systems. It is possible to use
customized systems -- either provided by clients or through systems like
the ones above that we ourselves can train. Fact is, though, that the
vast majority of translators do not have access to customized systems
(either because the clients don't provide them or because translators
work in too many different fields and sub-fields to spend time training
engines) so it is these kinds of systems that many are using. And it's
good to know exactly what that means.
Here's an interesting feature in relatively recent versions of Windows 10 that I just discovered: Storage Sense.
Since my workhorse laptop likely has a year or two left in it, I
decided to ride it all the way into the sunset (use of metaphors: A+). A
year or so ago it needed some major repairs, during which most of its
keyboard went dead and the newly installed hard drive went from
relatively small to really small. (Don't ask why! Just remember: I live
in the middle of nowhere with the corresponding computer repairmen. . . .)
Still: It's got a decent processor in it, and its successor is
already waiting at home. And since I feel slightly abashed at all the
resources that go into building each computer, I feel like I should at
least squeeze as much out of it as I can.
The pitiful storage on its hard drive and my fear that it might
give out before the allotted "year or two" have motivated me to switch
virtually all of my file storage to Microsoft's OneDrive. Though this comes as part of the Windows package with only a few GB, it gets stocked up to a terabyte+ with a Microsoft 365 subscription (notice the recent name change from "Office 365").
I have been working like this for about a year now and really like it.
It allows me to access my files from any computer, it makes it easy to
share a folder or file with third parties, and it allows me to
continue working on my ridiculous laptop. I know there are a number of
other similar solutions (Dropbox,
etc.) out there, but a) I would have to pay for them and b) in my
humble opinion they have an overly negative impact on the way Windows
runs. (It truly is a humble opinion because I'm sure many of you have
much more expertise and plausible reasons to know otherwise. But,
please, please, don't take this away from me since this would nullify my
pet excuse for not being able to help my wife with her Dropbox-centric computer: "Honey, the only reason your computer is so sluggish is because of Dropbox and I don't know how to fix that!")
OneDrive works by storing your
files in the (Microsoft) cloud and making a local copy as soon as you
open a file. This allows you to continue working on that file even
without an internet connection (I did mention that I live in the boonies
where power and internet access fail relatively regularly, right?).
The problem is that these local copies are never deleted once opened and saved on
your hard drive, which eventually defeats the purpose of storing them in
the cloud if your goal was to offload them from your local storage in
the first place.
This is (finally!) where Storage Sense comes in. To activate it and access its features, open your computer's Settings (quickest way: WinKey+i), select System and then Storage. Somewhere on that page you'll see the Storage Sense settings (in its short lifespan, this has changed a number of times; in the latest version of Windows it's right at the top) along with an activation toggle. Once you select (Storage Sense) Settings,
you can see the various options (how often you want to purge files and
which files you want to automatically delete -- be wise in your
selections!). At the very bottom you'll find the setting for OneDrive files
-- the one I really appreciate. In my mind there is no reason for a
file to be in my computer for more than a few days after I use it. I
know it's better protected and safer in Microsoft's cloud, and taking it off my hands lets my
little-engine-that-could laptop chug on just a little longer.
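If you're curious which files such a rule would catch before you switch it on, the idea can be approximated with a short Python sketch. To be clear about the assumptions: this is not how Storage Sense itself works internally, the function name `stale_files` is made up for the illustration, and it relies on the file system's "last accessed" timestamp, which Windows does not always keep up to date. It only lists files and deletes nothing.

```python
import os
import time


def stale_files(folder, max_age_days, now=None):
    """List files under `folder` not accessed for more than `max_age_days`.

    `now` (seconds since the epoch) defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86,400 seconds per day
    stale = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # st_atime is the last-access time reported by the file system.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling `stale_files` on your OneDrive folder with a value of, say, 14 would give you a rough preview of what a 14-day setting might treat as ready to go back to the cloud.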
The Last Word on the Tool Box Journal
If you would like to promote this electronic journal by placing a
link on your website, I will in turn mention your website in a future
edition of the Tool Box Journal. Just paste the code you find here
into the HTML code of your webpage, and the little icon shown on
that page will be displayed with a link to my website.
If you are subscribed to this journal with more than one email
address, it would be great if you could unsubscribe redundant addresses
through the links Constant Contact offers below.
If you are interested in reprinting one of the articles in this
journal for promotional purposes, please contact me for information
© 2021 International Writers' Group