A Computer Journal For Translation Professionals
(the three hundred eighth edition)
We've all heard the legend of the many words for "snow" in
Inuktitut or other Arctic languages. It's the favorite tale of
oh-so-many linguists (either "Yes, there are amazingly many words," or
"No, it's all overblown and there really are just a handful"). Even Kate
Bush (and Stephen Fry) address it in the beautifully haunting 50 Words for Snow.
I've been reading the recently released Words of the Inuit: A Semantic Stroll through a Northern Culture
by Louis-Jacques Dorais (which I happen to think is a rather successful
example of how you can successfully address both a scholarly and a
general audience in the same publication), and I think many of you will
find this excerpt (from p. 23f.) interesting.
"The 'one hundred Eskimo words for snow' is a well-known although
often misunderstood example of how a language enables its speakers to
make very fine distinctions among sets of semantic categories that
characterize important cultural elements. For some unknown reason,
people who cite this example often assert that the allegedly large
number of Inuit words for snow ends in a two, variously claiming
thirty-two, fifty-two, or even 102 different words for defining various
types of snow.
"Actually, in Nunavik Inuktitut -- and the situation is the same in
other dialects -- I know of only seven words whose unique function is
to denote a particular form of snow (qanik -- falling snow; masak -- wet falling snow; aputi(k) -- snow on the ground; pukka -- crystalline snow on the ground; aniu -- snow for making water; mannguq -- melting snow; sirmiq -- melting snow used as cement for the snow house).
"However, Inuit are able to distinguish between at least
twenty-five -- and probably many more -- different snow conditions
expressed by way of dedicated terms or through semantically more
encompassing words whose meaning denotes a certain type of snow when
used in specific contexts: illusaq -- material for a house, i.e., snow fit for making a snow house; maujaq -- soft ground, i.e., soft snow on the ground; kinirtaq -- something compact, i.e., damp, compact snow; aqilluqaaq -- very tender material (e.g., cooked meat), i.e., drift of soft snow; piirsituq -- it carries things away, i.e., there is a blizzard of snow.
"Ice is as important as snow to Inuit, and their language allows
them to make a very large number of semantic distinctions between
different types of frozen water. A dictionary of terms related to sea
ice in the Inupiatun dialect of Wales, Alaska, includes more than 110
entries. Ironically, even if there probably exist more semantic
distinctions for ice than for snow, only the alleged 'one hundred words
for snow' are commonly heard about, possibly because in Qallunaat [=white European person] minds, Inuit are primarily identified with snow (e.g., snow houses) rather than ice."
A lot to learn here. Like our preconceptions of the "other" or the
desire to see something "romantic" rather than useful (such as "charming
words for snow" rather than "words for how snow can be used").
The concept of learning from other languages lies at the very heart of the Translation Insights & Perspectives
tool. I just recently "discovered" that in Enlhet, a language spoken in
Paraguay, "peace and security" is translated as "no news" (for when all
is well there is "no news" -- see here).
It almost made me choke up. Isn't this what we all need right now?
(Next week's US elections come to mind, along with many other things.)
Learning From Others
New Pricing Model
Inter-Language Vector Space
Our favourite question: Will technology replace translators and interpreters? (Column by Josh Goldsmith and Alex Drechsel)
Hey Translator, crossMarket Is Closing!
This 'n' That
The Last Word on the Tool Box Journal
Translating PDFs is easier and quicker with TransPDF
- Supports Arabic, Hebrew, Persian and Urdu PDF input and output
- Integrated into Memsource, memoQ, ONTRAM
- Compatible with all CAT tools.
- Fast log-in for Proz members.
Try it FREE
for your next PDF project.
One talk I listened to at last week's -- very successful! --
virtual ATA conference was by my good friend Jay Marciano. Jay has been
working with machine translation from the very early beginnings (well,
maybe not the very early beginnings in the late 1950s -- though he sure
looks like it!) and has worked for a number of machine translation
providers and service providers that use machine translation. For the
last few years he has been the most consistent voice from within the
machine translation community to reach out to translators, especially in
the US, to inform us of machine translation's current and future impact
on the industry from his particular vantage point.
This time, however, he gave a different kind of talk in which he
suggested a new kind of pricing model. I'm probably not going to do it
complete justice in this short article, but let me try to rephrase his
main points. He argued that since there is no discernible and reliable
differentiator between fuzzy translation memory matches and machine
translation suggestions (assuming that you are using a specifically
trained machine translation engine for a text type that is suited for
that kind of translation), we should stop differentiating them in their
pricing. Instead, they should all be paid by edit distance. ("Edit
distance" is the now widely used approach to evaluating the number of
changes the editor or translator had to make to an MT suggestion before
delivering it.) Doing this, according to Jay, will protect the
translator from poor-quality machine translation (because the edit
distance -- or rewrite from scratch -- will in that case be large enough
for 100% payment) as well as from bad translation memories (same
reason). Also, he suggests paying for MT suggestions with zero edit
distance -- i.e., suggestions where no edits were deemed necessary -- at
20% of the word price, twice the rate of a 100% TM match (10%), to
compensate for the effort of evaluating their accuracy. He also suggests a
110% rate for an edit distance of 91-100%, taking into account the
larger effort needed to "correct" something that was rather useless in
the first place.
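To make the arithmetic concrete, here is a minimal Python sketch of how such a tiered scheme might be computed. The tier percentages (10%, 20%, 110%) come from the proposal as summarized above; the use of a character-level Levenshtein distance, the normalization, and the proportional middle tier are my own simplifying assumptions, not part of Jay's actual model.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def payment_rate(suggestion: str, delivered: str, exact_tm_match: bool = False) -> float:
    """Fraction of the full word price, following the tiers described above:
    10% for confirming an untouched 100% TM match, 20% for confirming an
    untouched MT suggestion, 110% when 91-100% of the text had to change,
    and (a simplifying assumption) proportional payment in between."""
    if exact_tm_match and suggestion == delivered:
        return 0.10
    distance = levenshtein(suggestion, delivered)
    norm = distance / max(len(suggestion), len(delivered), 1)
    if norm == 0:
        return 0.20
    if norm > 0.90:
        return 1.10
    return norm  # e.g., 40% of the text changed -> 40% of the word price
```

This also shows why the scheme is hard to predict up front: the rate is only known after the editing is done.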
If this is not clear, I would be glad to ask Jay to write a little
something about it in the next edition of the Tool Box Journal or maybe
transcribe a discussion with him about that.
My immediate reaction to this proposal is that it seems to be an
attempt at a fair -- but hard-to-predict -- compensation scheme. On the
other hand, I think we should still be looking for an hourly-based
compensation. I have never heard anyone argue that a mechanic,
electrician, accountant, or lawyer takes too long to do a job as a
reason not to hire them, and I see no reason why we should be treated differently.
What I could see, however, is that we would use something like this
system as a foundation of sorts to come to agreements about hourly
compensation when negotiating an hourly rate with clients for whom we
work TM+MT-based jobs.
Trick or treat? We choose to treat you with awesome software for
translators! Leap ahead of your competitors with smart project
management and word count.
Surprise for Tool Box Journal readers from AIT at the link below:
Stay vigilant, everyday superheroes!
Inter-Language Vector Space
As I was considering how to tackle this piece about XTM's
latest feature, an off-hand comment from XTM's CEO Andrzej Zydron came
to mind: XTM has 160 people working full-time for them at this point,
with 40 of those writing code. They do serve some large clients --
including Expedia, SAP, Toyota, Twitter, and LinkedIn -- but by any
measure this is a large operation. I realized once again how this age of big data
and narrow artificial intelligence (the kind of AI that we have access
to right now) has made it difficult for very small vendors to compete.
You can still develop a great tool with a one- or two-person outfit (for
a small and targeted [and devoted] user group -- see for instance CafeTran Espresso
), but if you want to equip it with data-driven abilities, you will struggle.
And, as we've stated many times before, data-driven processes do
not only include machine translation but also so much more -- from
user-generated data on how to make your tool more efficient, to ways to
allow your user to collect data on how to streamline processes, to
equipping your tool with linguistic knowledge that goes beyond static
spell-checking dictionaries and links to language-specific termbases,
translation memories, and machine translation engines.
XTM has recently invested in a technology that was first developed by Google in 2013 (see this
paper) and then taken up by Facebook (see this paper
from 2018, which contributed to this very recent announcement).
XTM has now continued to develop one strand of this technology, calling
it "Inter-Language Vector Space." The idea is essentially this: Since
languages all have to express more or less the same reality, it should
be possible to map the overlay of these realities by how languages
express them. By looking at and comparing large amounts of monolingual
data, you should find similar "shapes" (called "vectors") occupied by
terms and words, and you might be able to infer an overlap of use
(see this paper
from Babylon Health showing how this can be done).
XTM has done just that, with data for 157 languages.
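To illustrate the underlying idea, here is a small Python sketch. The three-dimensional vectors are entirely made up for the example (real systems learn hundreds of dimensions from monolingual corpora and then map the spaces onto each other); the point is only that words whose vectors point in similar directions across two aligned spaces can be treated as likely translation pairs.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Toy, hand-made vectors standing in for embeddings learned from
# monolingual English and French data mapped into one shared space.
english = {"snow": [0.9, 0.1, 0.0], "house": [0.1, 0.9, 0.2], "ice": [0.8, 0.0, 0.3]}
french = {"neige": [0.88, 0.12, 0.05], "maison": [0.15, 0.85, 0.25], "glace": [0.82, 0.02, 0.28]}

def best_match(word, source_space, target_space):
    """Return the target-language word whose vector is closest in direction."""
    return max(target_space, key=lambda t: cosine(source_space[word], target_space[t]))
```

With enough dimensions and enough data, the same nearest-neighbor lookup is what makes bilingual term extraction and tag placement possible without parallel texts.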
Clearly this technology was originally thought of in connection
with machine translation, but this is not how XTM approaches it (at
least not directly). Instead, it looks at common annoying problems for
anyone who deals with translation and then tries to find answers in its
data for how to approach them.
What comes to mind when you think of common annoying problems?
Tags!! You know, those inline codes in your translation environment tool
that function as placeholders for anything from hyperlinks to
formatting to footnotes or sometimes just completely unnecessary coding.
(Also: The ones that can make it really hard to find translation memory
matches or that royally confuse a machine translation engine.)
In its latest version, XTM automatically places the tags in the
target segment once you leave the segment by looking at the data in its
massive trove of inter-language vector space data. It's a great concept,
and while not foolproof, it is easy to work with: you can immediately fix
a misplaced tag if necessary because you are still right there at the
next segment as you continue to work on your text.
Two other features using the data derived from the Inter-Language
Vector Space are term extraction (monolingual and bilingual) and
alignment (XTM's alignment has been using dictionary data for a long time
and now includes the ILVS data as well), both features that are heavily used by XTM's LSP customers.
Some of the next features XTM hopes to build with this (and other)
technology include auto-deciding on what workflow steps are necessary
for a given project, comparing and evaluating different MT engines,
storing post-edits so they can be automatically applied again, and
fixing of fuzzy and MT matches.
Andrzej is a very euphoric man who tends to get very excited about
his technology -- some of which pans out and makes a difference and some
not -- but I was really glad to see his excitement here because most of
these features are immediately relevant to a translator's job. I like that.
New Book on Translation Quality in the Age of Digital Transformation
The collective volume represents a strong link between theory and practice.
Tech-Savvy Interpreter 2.0 - Our favourite question: Will technology replace translators and interpreters? (Column by Josh Goldsmith and Alex Drechsel)
People outside the language industry often tell us - more or less in jest -
that our jobs are on the line, and will soon be replaced by machines,
anyway. This month’s column is a reflection on exactly that. Feel free
to forward it to your friends or family members who just won’t stop
teasing you about the rise of the machines.
We’ve all done it - even professional translators and interpreters.
What do you do when you come across an article in a language you don’t speak? Run it through Google Translate. When you need to have a quick chat over video with someone and don’t have a common language? Skype Translator
can help. Or that exotic menu written in a foreign language while on
vacation? Just open the handy translator app on your smartphone, point
it at the text, and let the magic begin.
For many basic tasks, these automatic tools work well enough. They’re certainly not perfect, but do help us grasp the gist of the message and make ourselves understood.
All of this leads us to the million-dollar question: Will translators and interpreters be out of a job soon?
We don’t think so. But as machine translation improves, it is becoming
useful in many situations that go beyond the basic examples we just
mentioned. Why is this the case? Basically, the answer boils down to two key
developments: big data and a new, better approach to machine translation.
First, big data. Strip away the jargon, and you’re left with a key idea: that
software can analyze huge amounts of data and detect useful patterns far
faster and better than a human ever could. This applies to plenty of
things other than language, of course - from real-time traffic reports to targeted advertising or Netflix’s recommendations for what you should watch next.
In language technology, companies like Google and DeepL
draw on huge bilingual and multilingual corpora. Their databases
include millions of sentences from original documents along with their
translations produced by humans. These treasure troves of well-translated content form the backbone of modern machine translation.
Second, new machine translation models. Until a few years ago, the statistical
approach to machine translation was widely used. In this model,
computers identify sequences of words in the original text, look up
possible translations, and then use statistical models to decide which
translation is most likely to be right.
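As a toy illustration of that lookup-and-score step (the phrases and probabilities here are invented for the example; real statistical systems score whole sentences with far richer models):

```python
# A toy phrase table: candidate translations with learned probabilities.
phrase_table = {
    "the house": [("la maison", 0.70), ("le foyer", 0.20), ("la demeure", 0.10)],
    "is small": [("est petite", 0.60), ("est petit", 0.40)],
}

def most_likely(phrase):
    """Pick the candidate translation with the highest probability."""
    candidates = phrase_table[phrase]
    return max(candidates, key=lambda c: c[1])[0]
```

Note how the model only sees isolated phrases, which is exactly why agreement and context so often trip it up.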
The new kid on the block, neural machine translation, uses a different
approach: Instead of relying on probability for the right translation,
artificial intelligence recognises patterns in the source and the target
language and matches the two. These patterns go beyond single words to
entire phrases and sentences. And recent studies have shown that this method yields better translations than before.
Still, machine translation has plenty of hurdles to overcome. Whereas human
translators understand one sentence in light of an entire text (and
more!), machine translation often operates on a sentence-by-sentence
basis. Plus, it needs huge amounts of data in the form of
human-translated text, which is not equally available for every language
pair. Machines still struggle with concepts that were not in their data
set, and cannot grasp jokes, irony, cultural references, puns, rhymes,
and the other fun stuff that makes communication so rich.
Nevertheless, technological developments are changing the face of the language industry. Many companies already save time and money by having their texts translated by a machine and polished by a professional linguist; the jury’s still out on quality and job satisfaction. Others use controlled language with a smaller subset of words and strict rules. After all, a simple text is easier for a machine to translate.
As for machine interpretation, it’s even harder than machine translation
to get right. That’s because machine interpretation has traditionally
included three stages: transcribing speech, using machine translation to
convert that text into another language, and then using speech
synthesis for the spoken output. An error in the first step can be
compounded in later phases. And new machine interpreting models that leave out the machine translation “middleman” are still embryonic at best.
Most importantly, machine translation and interpretation are based on the
fallacy that language professionals just “translate words”. But we do
far more than that. As research shows,
interpreters draw on body language, information on screens, word lists,
and more to form meaning; we add information, explain cultural
references, advise speakers, change language registers when needed, and
detect and solve all sorts of problems.
In sum, although machine translation and interpreting can facilitate basic
understanding, they still fall far short of what humans can do. So
while we encourage you to keep an eye on technological developments -
which may help to streamline the work of language professionals - you
should still talk to a professional translator or interpreter for your
next big project.
PS. Questions or ideas about interpreting technology? Drop us a line at email@example.com! We do the research, so you don’t have to.
Bogged down by busywork? Our FREE mini-guide is just what you need.
Productivity Hacks for Translators and Interpreters covers text
replacement, text-to-speech for editing, time tracking, digital
invoicing, and more!
Hey Translator, crossMarket Is Closing!
If you have ever registered with Across' crossMarket
marketplace…. No, let me be more specific: If you use the Across Translator Edition
(ATE), you are registered with the crossMarket
marketplace. That means you received an email
about a month ago announcing that crossMarket
was about to close. Through personal conversations over the last few years, I'd been aware that crossMarket had
not been the success Across had hoped it would be. (To be honest, I had
thought that Across' hope might actually become a reality since, more
than any other translation environment tool, Across has a closed
infrastructure and therefore a better possibility of building and
maintaining a marketplace.)
Well, it sounds like both Across and I were wrong. As a result,
they have decided to scrap everything they've done in terms of a
marketplace and essentially start from scratch.
What does this mean?
Well, first of all, the annual fee that had to be paid to become part of crossMarket and use the ATE still has to be paid -- only without crossMarket. crossMarket itself
will be replaced by a new kind of marketplace sometime next year. I
spent some time talking with the person in charge -- Nicolas Stumpf --
about the new plans and how they differ from what was there in the first
place. The neo-German name of the new place is "HeyTranslator."
It can already be found online but so far is not functional. What you
can find there is a little impression of the look and feel, plus you can
register yourself to get further updates.
Nicolas is not an industry insider, which could be either good or
bad. However, it seems positive to me that he certainly is very eager to
learn and listen. After Nicolas looked at other available web-based
platforms to connect translators and clients, he became convinced that
the time is right for another kind of marketplace that reflects both a
modern design and a 2020s approach. According to him, ProZ is
stuck where it was about 15 or so years ago, so the most logical place
for uneducated clients to look for freelance translators would be
platforms like Upwork or Fiverr --
sites that certainly have no experience to offer when it comes to
translation. And up to this point, I tend to agree with him.
With HeyTranslator, Nicolas and his team (made up mostly of the same people who worked on crossMarket)
are trying to build a translation tool-agnostic (we'll see how that
goes), transparent place for clients to meet translators and vice versa.
While it will be possible for LSPs to register, they are not the target
group, and translators are encouraged to create comprehensive profiles
(which will not be migrated from crossMarket),
including a price range for services (rather than one fixed price). Two
things that make this proposition a little different are that a) not
only can clients rate translators but translators are also encouraged to
rate the clients; and b) payment is done via an escrow account that is
paid into by the client before the job starts and is paid out of
immediately after the job delivery is confirmed by the client.
The fee that HeyTranslator envisions
for each transaction is between 6 and 16%, depending on the size of the
job. I feel that the upper end of that range is too high since there
really is no project management offered by HeyTranslator, plus at this point they also don't offer any workflow process or document management -- but this may change at a later point.
We might have to wait a few months before we will see the first
functioning version of the site go live, and this time I will keep my
mouth shut about whether I think it is going to be successful.
You and I know that virtually any and every tool vendor has tried
some kind of marketplace at some point, and so far all of them have more
or less failed. I'm not sure why this one should work, though I do
agree with a need for some kind of online platform for inexperienced
clients to look for qualified translators. One way to help clients find
the value they're looking for would be by emphasizing translators'
degrees, accreditations, and certifications, something that Nicolas has
promised to do in the tool.
Join the SDL Trados Live Virtual Conference Series
FREE presentations from over 40 industry speakers in multiple languages
Three years ago I wrote about the English terminology extraction tool Prospector
(see edition 280 of the Tool Box Journal).
Serge Gladkoff, the CEO of the tool's maker Logrus Global, contacted me
again to show me some of the latest developments of the tool. Feel free
to read what I wrote about it previously and then add the fact that
it's now completely cloud-based, it now also supports text and HTML
files for extraction purposes, and it has a feature that allows you to
select your existing termbases to automatically exclude already existing
entries from your new term collection. You now have a pretty good
overview of what the tool does. I really like it, though it seems that
the licensing terms are a little convoluted. It's free to use for Logrus
Global's existing clients and also free for freelance translators.
Well, kind of, since Serge would like to negotiate with you about
contributing your glossaries in exchange for the tool's use. Most of us
would agree that glossaries are valuable -- in fact, so valuable that
they often can be sold as an extra service to the client, so I'm not
sure this will be an attractive offer for many. But then, Serge did say
it's a negotiation -- so feel free to start talking with him and his team.
Another tool developer who reached out to me during the last couple
of weeks was Eugene Kraben. Eugene used to work for the UN in its
statistics department where he developed significant expertise in the
use of MS Access. He used that expertise to develop a tool that is able
to access SDL MultiTerm databases far more easily and quickly than the
often sluggish MultiTerm itself. It's called Tb-Scout, and it allows you
to search through your MultiTerm termbases and locate matches in mere
seconds. Plus it facilitates the fast export
of termbases into formats like PDF and Excel. If this is relevant to
you, you'll need to keep a couple of limitations in mind: You'll need to
have a 32-bit version of MS Office as well as a MultiTerm Desktop
installed on your computer, and you can search through only one termbase at a time.
Do you remember the translator-specific URL shortener xl8.link?
It's still out there, but we had to password-protect it to exclude nasty
spammers and phishers. If you want to use it to shorten your own URLs in
a way that also shows who you are, you can go here
and unlock it with the login: xl8talk and the password: 20xl8talk18.
And lastly, I would like to point you to a very interesting article
on Kirti Vashee's blog by Nico Herbig on a multi-modal interface for
translation environments. While he is mostly focusing on post-editing, I
think that much if not everything can be applied to virtually any kind
of translation. You can find it here
. (And I think that this is especially interesting for tool developers.)
Oh, and one more thing: I could not attend Lilt's Ascend 2020
conference last week (being a loyal ATA citizen and attending the ATA
conference and all…), but the recordings of that conference are available
now. I would highly recommend Kyunghyun Cho's conversation with Lilt co-founder John DeNero
on the "Future of Translation." A little technical at times but super interesting.
The Last Word on the Tool Box Journal
If you would like to promote this electronic journal by placing a
link on your website, I will in turn mention your website in a future
edition of the Tool Box Journal. Just paste the code you find here
into the HTML code of your webpage, and a little icon with a link to my
website will be displayed on that page.
If you are subscribed to this journal with more than one email
address, it would be great if you could unsubscribe redundant addresses
through the links Constant Contact offers below.
If you are interested in reprinting one of the articles in this
journal for promotional purposes, please contact me for information.
© 2020 International Writers' Group