How Text Analytics Would Make Google Translate Better

March 9th, 2011 by Mattias Tyrberg

Google Translate changed the translation industry with free translations and made it possible to read and understand text in many foreign languages. Google’s translation technology is a good start; however, I think they can do better. For instance, why does Google translate entities?

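When translating texts, you often see entities being “translated” into something totally different. A company name might be “translated” into an ordinary word, although it is an entity and should not be translated at all. One solution to this problem would be entity extraction, that is, automatically identifying the names of companies, people, places and other entities in a text, so the translator knows which words to leave alone.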

Let me show you an example:

The Swedish sentence “Hur kan Google översätta Saplo till såpalösning eller bakning?”
(“How can Google translate Saplo to soap solution or baking?”)

is translated to:
“How can Google translate soap solution or baking to soap solution or baking?”

With entity extraction, we would find both entities (Google and Saplo), and Google could simply leave Saplo untranslated instead of turning it into “soap solution or baking”.
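To make the idea concrete, here is a minimal Python sketch of one common way to do this: replace each extracted entity with a placeholder token, translate the masked text, then put the original entity back. Everything in it is an assumption for illustration. `toy_translate` and `TOY_DICTIONARY` are hypothetical, a word-for-word Swedish-to-English translator standing in for a real machine translation system, and the entity list is written by hand where a real entity extraction step would supply it. The sketch also assumes the translator passes placeholder tokens through untouched, which a real system may not do.

```python
import re
from typing import Callable

# Hypothetical toy Swedish -> English dictionary standing in for a real
# machine translation system. Like the error described above, it maps the
# company name "Saplo" to ordinary words.
TOY_DICTIONARY = {
    "Hur": "How", "kan": "can", "översätta": "translate", "till": "to",
    "såpalösning": "soap solution", "eller": "or", "bakning": "baking",
    "Saplo": "soap solution or baking",
}


def toy_translate(text: str) -> str:
    """Word-for-word translation; unknown tokens (names, placeholders) pass through."""
    return re.sub(r"\w+", lambda m: TOY_DICTIONARY.get(m.group(0), m.group(0)), text)


def translate_preserving_entities(
    text: str, entities: list[str], translate: Callable[[str], str]
) -> str:
    """Mask entities with placeholder tokens, translate, then restore them."""
    placeholders = {}
    masked = text
    # Mask longer entities first so "Google Translate" wins over "Google".
    for i, entity in enumerate(sorted(entities, key=len, reverse=True)):
        token = f"__ENT{i}__"
        placeholders[token] = entity
        masked = re.sub(rf"\b{re.escape(entity)}\b", token, masked)
    translated = translate(masked)
    # Put the original, untranslated entity back in place of each token.
    for token, entity in placeholders.items():
        translated = translated.replace(token, entity)
    return translated


if __name__ == "__main__":
    sentence = "Hur kan Google översätta Saplo till såpalösning eller bakning?"
    # In a real pipeline this list would come from an entity extraction step.
    entities = ["Google", "Saplo"]
    print(toy_translate(sentence))
    print(translate_preserving_entities(sentence, entities, toy_translate))
```

Run as a script, the first line reproduces the broken translation from the example, and the second gives “How can Google translate Saplo to soap solution or baking?”, with Saplo left as it is.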

But since Google doesn’t translate all entities, doesn’t that mean Google already has entity extraction? I don’t think so. When I first tried Google Translate, almost every entity was “translated”. Nowadays the results are much better, but many entities are still translated when they should not be. Why? Either Google uses databases to check whether a word is an entity (in which case new entities will not be found), or its entity extraction technology is really bad.
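The difference between the two explanations can be sketched in a few lines of Python. The entity database `KNOWN_ENTITIES` below is hypothetical and deliberately lacks Saplo, to stand for a list compiled before a new company appeared. The second function is not real entity extraction, only a crude capitalization heuristic used as a stand-in for any method that looks at the text itself rather than at a fixed list.

```python
import re

# Hypothetical entity database (gazetteer). Its contents are an assumption:
# it was compiled before the company "Saplo" appeared, so Saplo is missing.
KNOWN_ENTITIES = {"Google", "Microsoft", "Stockholm"}


def lookup_entities(text: str, known: set[str]) -> list[str]:
    """Database approach: only words already listed count as entities."""
    return [word for word in re.findall(r"\w+", text) if word in known]


def capitalized_mid_sentence(text: str) -> list[str]:
    """Crude stand-in for text-based extraction, NOT a real extractor:
    flag capitalized words that do not open a sentence."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"\w+", sentence)
        found.extend(word for word in words[1:] if word[0].isupper())
    return found


if __name__ == "__main__":
    sentence = "Hur kan Google översätta Saplo till såpalösning eller bakning?"
    print(lookup_entities(sentence, KNOWN_ENTITIES))
    print(capitalized_mid_sentence(sentence))
```

On the example sentence the database lookup finds only Google, while the text-based heuristic flags both Google and Saplo. A production extractor would use a trained model rather than capitalization, which fails for lowercase names and for languages such as German, where all nouns are capitalized.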

So why hasn’t Google started using a good entity extraction technology? Is it because they don’t think it is a big enough problem? No, I just think they have not yet found a technology that solves it. If Google can’t fix this in-house, let’s hope they contact us or one of the other text analytics companies that offer entity extraction (such as OpenCalais or OpenAmplify). Why live with this problem when the solution exists today?

What problems have you found when using Google Translate? Please add your own examples in a comment.

If you are interested in trying entity extraction technology, please try our entity extraction demo and see for yourself. For example, you can try the text from the example above (the demo requires at least 250 characters, so paste the text five times); you will see that both the Swedish and the English demo find both entities. You can also try a text from a news site.