Machine Translation vs. Japanese

Machine translation in action. Toshima-ku’s webpage

Tokyo’s Toshima-ku district, like most metropolitan municipalities, provides an automated Japanese to English machine translation service on its website. Click on the English option, and you are warned ‘content may not be 100% accurate’.

If you then click on ‘Procedures and Notifications’ (so far so good), the eye is immediately drawn to this link: ‘Information of “Slime mold monthly”’.

This gobbledygook is what the J-SERVER PROFESSIONAL software comes up with for 「ねんきん月間」のお知らせ (‘Information on Pension Month’). Machine translation behemoth Google Translate produces the less amusing but still not 100% accurate ‘monthly budget’.

Slime mold

If you read Japanese, you will see that the Toshima-ku translation system opted for 粘菌 nenkin (meaning ‘slime mold’) instead of the homonym 年金 nenkin (meaning ‘pension’). Little consolation for the non-Japanese-speaking Toshima-ku resident wanting to find out about pensions.

Slime. Jason Antony / freeimages.com

The problem is that the original Japanese phrase on the Toshima-ku website was written in the phonetic hiragana script, not in meaning-bearing kanji characters, presumably for stylistic reasons. So with no context or ideographs to guide it, the Toshima-ku translation software had a choice between ‘slime mold’ and ‘pension’, and bizarrely went for ‘slime mold’.

Google Translate (GT) uses a much more sophisticated system that lets you view, check and edit translation options, and is moreover based on statistical frequency (presumably ‘pension’ appears far more often than ‘slime mold’ in translations of local government websites).
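The frequency-based idea can be sketched in a few lines of Python. The candidate senses and counts below are invented for illustration; they are not Google's actual data or method, just a minimal model of picking the statistically likeliest sense of a phonetically written homonym:

```python
# Hedged sketch: resolving a hiragana homonym by corpus frequency.
# The candidate senses and counts are hypothetical.
CANDIDATES = {
    "ねんきん": {           # nenkin, written phonetically
        "pension": 98234,    # invented frequency in translated text
        "slime mold": 112,
    },
}

def pick_translation(word: str) -> str:
    """Return the most frequent translation recorded for `word`."""
    senses = CANDIDATES[word]
    return max(senses, key=senses.get)

print(pick_translation("ねんきん"))  # → pension
```

A system like Toshima-ku's, with no frequency data to consult, has no principled way to break the tie between the two senses.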

Even so, Google admits on its GT research blog that machine translation is unlikely to ever provide a 100% satisfactory substitute for a human translator. One GT research scientist estimated that the tool could only provide ‘an 80% solution’ for effective translation: “You’ll be able to understand the political speech’s point, but not its rhetoric, nor its beauty.”

Radical Improvement

Nevertheless, the quality of GT’s machine translation for written language has radically improved since its launch in 2006. Until 2016, when GT began using its new key algorithm Google Neural Machine Translation (GNMT) between certain languages, its statistical machine translation system translated words and phrases as separate components in a sentence, based on patterns collated from existing web-based, human-translated content. But the system wasn’t always able to successfully match translated words and phrases to the context of the full sentence.

The new GNMT technology allows for more accurate translation of full sentences. Despite that, Google admits the system ‘can still make significant errors that a human translator would never make’, with the system still unable to accurately consider the context of an entire paragraph or page.

Anthony Burns / freeimages.com

The Google Research Blog even carries a chart comparing sentences translated from Chinese to English via the older algorithm, the newer GNMT and then human translation. The chart clearly illustrates that the human translation remains the best for quality and accuracy.

Character-based languages like Chinese and Japanese offer the biggest challenge to GT’s accuracy. A frequent user complaint is the lack of romanization of the source text to accompany the translation (although the system does provide an audio pronunciation guide).

GT is in any case more effective at translating between similar languages like English and Spanish. When it is applied between Russian and Japanese, English is used as an intermediate language: the Russian is first translated into English, and the English then into Japanese. This roundabout approach puts additional strain on accuracy.
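The pivot approach can be illustrated with a toy sketch. The lookup tables and the `translate` function here are stand-ins invented for illustration, not a real MT API; the point is simply that a missing direct language pair is bridged through English, compounding whatever errors each leg introduces:

```python
# Hedged sketch of pivot (bridge) translation through English.
# The tables are an invented single-word "corpus".
TABLES = {
    ("ru", "en"): {"пенсия": "pension"},
    ("en", "ja"): {"pension": "年金"},
}

def translate(text: str, src: str, dst: str) -> str:
    direct = TABLES.get((src, dst))
    if direct and text in direct:
        return direct[text]
    # No direct pair available: pivot through English.
    # Each leg can introduce errors that the next leg inherits.
    return translate(translate(text, src, "en"), "en", dst)

print(translate("пенсия", "ru", "ja"))  # → 年金
```

With a real system the risk is exactly what the article describes: an ambiguity mis-resolved on the Russian-to-English leg is baked into the Japanese output.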


Neither can GT yet cope satisfactorily with the unique typographic symbols used in Japanese. A middle dot (・) between Japanese words separates the components of foreign names or the items in a list (e.g. 酢の物・揚げ物 means ‘vinegared dishes and fried dishes’), but GT leaves the dot as it is (‘vinegared food · fried food’).
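What a dot-aware system might do instead can be sketched as follows. The function name and the mini-glossary are invented for illustration; the sketch just splits on the middle dot and joins the translated items with natural English connectives:

```python
# Hedged sketch: expanding the Japanese middle dot (・) into natural
# English, instead of leaving it verbatim in the output.
def expand_list_dot(japanese_list: str, glosses: dict) -> str:
    """Translate a ・-separated list using a supplied glossary."""
    items = japanese_list.split("・")
    translated = [glosses[item] for item in items]
    if len(translated) > 1:
        return ", ".join(translated[:-1]) + " and " + translated[-1]
    return translated[0]

# Invented mini-glossary covering the article's example.
GLOSSES = {"酢の物": "vinegared dishes", "揚げ物": "fried dishes"}
print(expand_list_dot("酢の物・揚げ物", GLOSSES))
# → vinegared dishes and fried dishes
```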

GT does not cope well with neologisms or slang either, being dependent on a written body of existing web translations, often from stuffy sources like documents from the UN or other global organizations.

Wikipedia / yomi955

If you input おにぎり onigiri, GT does indeed produce the correct translation ‘rice ball’. But if you input the newest incarnation of onigiri, the ‘onigirazu’ or sushi sandwich (a play on the verb nigiru, ‘to mold’), the English translation comes out as the nonsensical ‘Ogura Risa’, not even a correct romanization.

GT also struggles with Japanese’s complex three-script writing system, particularly the phonetic katakana syllabary. It translates サービス with literal accuracy as ‘service’, but this fails to take into account how the English word ‘service’ has been reworked in Japanese over time: it now means something given ‘on the house’ or at a discounted rate.

A “simple” thank you.

Also lost in translation are the multiple nuances of even key, frequently used Japanese phrases. For example, よろしくお願いします (yoroshiku onegai shimasu) translates simply as ‘thank you’ in GT, whereas in practice it is more accurately a varied expression of expectation that one will receive support and cooperation from someone in a given situation.

Tellingly, it is statisticians, not linguists, who work on GT. Google’s definition of the ‘best’ translation is ‘the most statistically likely’.

To be fair, Google is upfront about the current limitations of the system. GT advertisements feature usage scenarios that don’t require much translation finesse, such as a tourist using her GT app to order Peking Duck in a restaurant in China.

Nobody is suggesting people use GT to translate highly sensitive documents or literature. The message, even from Google itself, is clear: 80% accuracy by machine, 100% accuracy by human.
