Building a Field-Specific Neural Machine Translation Engine - Focusing on Data Collection and Performance Verification (Part 2)

Performance Verification of a Field-Specific NMT Engine

We built a prototype field-specific Japanese→Chinese/Chinese→Japanese NMT engine using the IT-field Chinese-Japanese bilingual data we collected in "Building a Field-Specific Neural Machine Translation Engine - Focusing on Data Collection and Performance Verification (Part 1)".
We trained the translation model with OpenNMT, an open-source NMT toolkit.

Now let's translate IT-field test data and see how the IT field NMT performs.
For comparison, we used Google Translate, the industry-leading general-purpose NMT.
The figures in the table are BLEU scores (up to 4-grams), a standard automatic evaluation metric. (Explanation of BLEU)

BLEU (4-gram)     Chinese→Japanese   Japanese→Chinese
IT field NMT      40.79              35.35
Google            37.53              28.43
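To make the metric in the table more concrete, here is a minimal sketch of corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean with uniform weights and multiplied by a brevity penalty. It assumes one reference per sentence, pre-tokenized input, and no smoothing. The article does not say how its Japanese and Chinese text was segmented for scoring, so this sketch will not reproduce the exact numbers above; in practice, established tools such as sacreBLEU are normally used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per sentence, uniform weights.

    hypotheses, references: parallel lists of token lists (assumed pre-tokenized).
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, BLEU is zero if any order has no match
    # Geometric mean of the n-gram precisions, computed in log space
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(corpus_bleu([hyp], [ref]))  # about 53.7
```

In the usage example, the n-gram precisions are 5/6, 3/5, 2/4 and 1/3 and the lengths are equal, so the score is 100 × (1/12)^(1/4).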


The BLEU scores show that on the IT-field test, the IT field NMT scored higher than general-purpose Google Translate in both directions, despite being trained on a seemingly small amount of data.
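As a quick worked check on the size of that gap, the snippet below subtracts Google's score from the IT field NMT's score for each direction. The numbers are copied from the results table; reading "China-Japan" as Chinese→Japanese and "Japan-China" as Japanese→Chinese is the usual convention and is assumed here.

```python
# BLEU scores copied from the results table
scores = {
    "Chinese→Japanese": {"IT field NMT": 40.79, "Google": 37.53},
    "Japanese→Chinese": {"IT field NMT": 35.35, "Google": 28.43},
}


def bleu_gains(scores):
    """Absolute BLEU gain of the IT field NMT over Google, per direction."""
    return {
        direction: round(s["IT field NMT"] - s["Google"], 2)
        for direction, s in scores.items()
    }


for direction, gain in bleu_gains(scores).items():
    print(f"{direction}: +{gain} BLEU")
```

The gain is about 3.3 points for Chinese→Japanese and about 6.9 points for Japanese→Chinese.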

So where exactly did the IT field NMT outperform general-purpose Google Translate?
Here is an example:

Source text:                 the right and weight of ownership
Reference translation:       Transition Weight
IT field NMT translation:    Transition Weight
Google translation:          transition weight


For the Chinese term glossed above as "the right and weight of ownership" (here meaning "weight"), the IT field NMT correctly translates it into Japanese as "weight" in its technical sense, whereas Google mistakenly outputs the term's general-purpose meaning, "the right and weight of ownership."

In this way, an IT field NMT can mitigate the trouble that generic NMTs have with translating technical terms.

Field-Specific NMT May Outperform General-Purpose NMT Even with Insufficient Bilingual Data

In this series, we examined methods for collecting bilingual data to build a field-specific NMT engine. The other data collection methods mentioned in the text will be covered in a future article.
Performance verification also showed that a field-specific NMT can outperform a state-of-the-art general-purpose NMT even when the available bilingual data is limited.
Once field-specific NMT engines are in place, we can finally offer a translation service that meets our clients' expectations. The translation process works as follows:
1. Automatically determine the field of the input text
2. Pass the text to the corresponding field-specific NMT engine
3. Output the translation produced by that field's engine
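The three steps above can be sketched as a small routing layer. This is a minimal illustration under loud assumptions: the article does not say how field determination is implemented, so the keyword-counting classifier, the FIELD_KEYWORDS lists, the "general" fallback, and the engine stubs are all hypothetical stand-ins (a real system would more likely use a trained text classifier and call actual NMT models).

```python
from typing import Callable, Dict, Set

# Hypothetical keyword lists per field, used only for this illustration
FIELD_KEYWORDS: Dict[str, Set[str]] = {
    "IT": {"server", "database", "network", "software", "algorithm"},
    "medical": {"patient", "diagnosis", "clinical", "dose", "symptom"},
}


def detect_field(text: str, default: str = "general") -> str:
    """Step 1: pick the field whose keywords occur most often in the input."""
    words = text.lower().split()
    best_field, best_hits = default, 0
    for field, keywords in FIELD_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        if hits > best_hits:
            best_field, best_hits = field, hits
    return best_field


def translate(text: str, engines: Dict[str, Callable[[str], str]]) -> str:
    """Steps 2-3: route the text to the matching engine and return its output."""
    field = detect_field(text)
    engine = engines.get(field, engines["general"])  # fall back to a generic engine
    return engine(text)


# Stub engines standing in for real field-specific NMT models
engines = {
    "IT": lambda s: f"[IT engine] {s}",
    "medical": lambda s: f"[medical engine] {s}",
    "general": lambda s: f"[general engine] {s}",
}
print(translate("Restart the database server", engines))
```

Keeping routing separate from the engines means a new field only needs a new entry in the classifier and the engine table, without touching the existing models.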
What did you think?
I hope this has given you a sense of what field-specific NMT is all about.
We will be sharing more NMT-related content in the future, so stay tuned!