Tuesday, July 19, 2011

The problem with conventional thinking about machine translation

Reading In the Plex, Steven Levy's fascinating biography of Google, I came across the following quote from machine translation pioneer Warren Weaver:

When I look at an article in Russian, I say, "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."


I can tell you with absolute certainty that this is incorrect, and people who don't find themselves able to get past this way of thinking end up being very poor translators.

A more accurate approach would be "This idea really exists in a system of pure concepts unbounded by the limits of language or human imagination, but it has been coded in a way that one subset of puny humans can understand. I will now encode it for another subset of puny humans to understand."

To translate well, you have to grasp the concepts without the influence of the source language, then render them in the target language. You're stripping the code off and applying a new one.

If you translate Russian to English by assuming that the Russian is really in English, what you're going to end up with is an English text that is really Russian. Your Anglophone readers will be able to tell, and might even have trouble understanding the English.

The Russian text is not and has never been English. There's no reason for it to be. The Russian author need never have had a thought in English. He need never have even heard of English. Are your English thoughts really in Russian? Are they in Basque? Xhosa? Aramaic? Of course not! They're in English, and there's no need or reason for them to be in any other language.

This is a tricky concept to grasp if you don't already grasp it, because when we start learning a new language (and often for years and years of our foray into it) everything we say or write in that language is really in English. (I'm assuming you're Anglophone. If you're not, then, for simplicity's sake, mentally search and replace "English" with your mother tongue for the purpose of this blog post.)

We learn on the first day of French class that je m'appelle means "my name is". But je m'appelle isn't the English phrase "my name is" coded into French. (If anything is that, it would be mon nom est.) The literal gloss of je m'appelle is "I call myself", but je m'appelle isn't the English idea "I call myself" coded into French either. If anything, it's the abstract idea of "I am introducing myself and the next thing I say is going to be my name" encoded into French. The French code for that concept is je m'appelle; the English code is "my name is".
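For the programmers in the audience, here's a toy sketch of the difference, using the je m'appelle example. Everything in it is made up for illustration: the three-word lexicon, the concept label INTRODUCE_SELF_BY_NAME, and both function names are my own assumptions, and no real translation system is this simple. The first function treats French as English in funny symbols and swaps words one at a time. The second strips the French code off to get at the concept, then applies the English code.

```python
# Hypothetical toy lexicon: each French token mapped to a literal English word.
WORD_LEXICON = {"je": "I", "m'": "myself", "appelle": "call"}

# Hypothetical concept tables: source phrase -> language-neutral concept -> target phrase.
FRENCH_TO_CONCEPT = {"je m'appelle": "INTRODUCE_SELF_BY_NAME"}
CONCEPT_TO_ENGLISH = {"INTRODUCE_SELF_BY_NAME": "my name is"}


def decode_word_by_word(phrase: str) -> str:
    """Treat the French as 'English in strange symbols' and swap each token."""
    # Split the elided pronoun off so "m'appelle" becomes "m'" and "appelle".
    tokens = phrase.replace("'", "' ").split()
    return " ".join(WORD_LEXICON[token] for token in tokens)


def translate_via_concept(phrase: str) -> str:
    """Strip the French code off, then apply the English code to the concept."""
    concept = FRENCH_TO_CONCEPT[phrase]
    return CONCEPT_TO_ENGLISH[concept]


print(decode_word_by_word("je m'appelle"))    # I myself call
print(translate_via_concept("je m'appelle"))  # my name is
```

The word-by-word output is English words in French order, which is exactly the "English text that is really French" problem. The hard part, of course, is that nobody has a tidy lookup table of pure concepts; the sketch only shows where the two approaches diverge.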

I'm trying to work on a better analogy to explain this concept to people who don't already grok it, but here's the best I've got so far:

Think of the childhood game of Telephone, where the first person whispers something to the second person, then the second person whispers what they heard to the third person, and so on and so on until the last person says out loud what they heard and you all have a good laugh over how mangled it got.

What Mr. Weaver is proposing is analogous to trying your very, very best to render exactly what you heard the person before you say.

But to grasp concepts without the influence of language and translate well is analogous to listening to what the person before you said and using your knowledge of language patterns and habits to determine what the original person actually said despite the interference.

Which defeats the purpose of Telephone, but is the very essence of good translation.

5 comments:

laura k said...

I am not a translator or anything approaching bilingual, but I can tell you are correct. I would have said essentially the same thing in a much clumsier and less eloquent way.

The few times in my life when I thought in another language, rather than first translating, then speaking - for a short period of time, in Spanish, and later in ASL - it felt totally different. That experience made me realize how inadequate typical language classes are, or at least were in those days.

impudent strumpet said...

I don't know if it's that language classes are inadequate. I think it's more that when we learned English, we had the enormous advantage of no other language system hindering our understanding of concepts. But now we can't forget the fact that we know a language, and I don't think anyone's yet invented a way to communicate concepts without using language as a medium, so we're stuck with interference.

laura k said...

That makes sense, but aren't there better language-learning techniques than constant back-and-forth translation of phrases?

impudent strumpet said...

Some translational thinking is required to build vocabulary, I find. It's certainly the most efficient way to communicate new vocabulary (and makes it possible for dictionaries to exist). Plus, since we already have a mother tongue, when we're flailing around for a way to express a concept in a new language we automatically think "What's the word for hockey puck?" rather than "If only there was some way to refer to this object..."

Some people say you can do immersion for adults without even giving them a grammar primer or a bilingual dictionary, but I can't imagine getting decent results within any reasonable period of time. It would be like trying to teach someone poker by throwing them into a poker game without telling them the rules or the basic premise first.

Although I suppose to some extent it might depend on what you're trying to do. If you're a tourist trying to buy coffee, maybe you can muddle through without theoretical grammar. If you're a front-line employee trying to placate an upset customer, not so much.

laura k said...

Interesting. Makes sense.