Interview with a Developer: Machine Learning and Evaluation for the "kode-AI Translation Cloud API"! (Part 1)

 Building a Field-Specific Neural Machine Translation Engine from the Ground Up

Hello everyone! It's been a while (;^ω^) This is the Kohden Development Office blog.

On this blog, Satake, who is in charge of sales, interviews our developers in person to introduce various development and product information in the form of a dialogue.

 

Here is today's interviewee!

Shibata is currently studying Chinese, Korean, and Russian.

(Shibata also took part in our previous interview, about My Interpreter Assistant.)

 

Satake: It is amazing that you are learning three languages!

Shibata: Thank you very much! Learning languages is fun!

Satake: This time, please tell us all about the machine learning behind the "kode-AI Translation Cloud API," which is expected to be used for multilingual announcements in train stations, commercial facilities, and similar places ♪

Shibata: Sure, I'd be glad to.

 

1. What is Neural Engine Machine Learning?

 

Satake: First of all, please give our readers a brief explanation of what this machine learning and evaluation is all about.

Shibata: Sure. We train the neural engine on 10,000 pairs of Japanese and English sentences (a bilingual corpus), then compare its translations with those produced before training, in order to verify and measure how much the training helped.

 

(What is a bilingual corpus? If you are wondering, check out this blog post!)

We asked the developer about machine processing in the creation of the Japanese-Korean dictionary.

Satake: What is machine learning?

Shibata: Machine learning is a process of learning and remembering things like "For this kind of Japanese, translate it into English like this."

A simple example would be learning that a particular Japanese sentence corresponds to "May I have your name?" in English.
When we learn a foreign language, we learn which foreign-language expression corresponds to each Japanese one in just this way. In other words, it is training.

Satake: I see! That's easy to understand. By the way, how is the "kode-AI Translation Cloud API" mentioned earlier different from cloud translation that uses a regular translation engine?

Shibata: "kode-AI Translation" is a neural machine translation using AI (artificial intelligence).

It has a reputation for high translation accuracy, especially for English to Japanese.

Click here for more details about "kode-AI Translation"

Satake: Wow! With AI translation at the world's highest standard, we can expect high translation accuracy!

Next, what specific steps did you take to evaluate the results of the training?

 

2. Introducing the procedure for evaluating machine learning results

 

Shibata: This time, we used the following procedure to compare translation results before and after machine learning.

 

(1) Manually translate 10,000 Japanese source texts into English to create a Japanese-English bilingual corpus of 10,000 pairs
(2) 9,000 pairs from (1) + 1,000 pairs from Kodensha's category-specific Japanese-English bilingual corpus
= 10,000 pairs in total are used as machine learning data
(3) Set aside the remaining 1,000 pairs from (1) as evaluation data and do not include them in the machine learning data
(4) Run Japanese-English machine translation on the evaluation data from (3) before machine learning (Before)
(5) Perform machine learning on the 10,000 pairs of data from (2)
(6) Run Japanese-English machine translation on the evaluation data from (3) after machine learning (After)
(7) Evaluate the translation quality of the results of (4) and (6) automatically (*)
*Evaluated by similarity (= BLEU score) to the English translations from (1)
(8) Have a translator manually evaluate (Before vs. After) 100 of the translation results from (4) and (6)
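
For readers who like to see things in code, the data split in steps (1) to (3) and the automatic scoring in step (7) could be sketched roughly as follows in Python. This is only an illustration, not the code used for the actual evaluation: the function names, the placeholder sentence pairs, the random hold-out with a fixed seed, and the simplified BLEU calculation (whitespace tokens, one reference per sentence, no smoothing) are all assumptions made for this example. A real evaluation would more likely rely on an established BLEU tool.

```python
import math
import random
from collections import Counter


def split_corpus(pairs, n_eval, seed=0):
    """Shuffle (source, reference) pairs and hold out n_eval of them for evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]  # (training part, evaluation part)


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (assumption: not the production metric)."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any n-gram order with no match gives a score of 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Steps (1)-(3): placeholder data standing in for the real corpora (hypothetical).
own_pairs = [(f"ja-{i}", f"en-{i}") for i in range(10_000)]
category_pairs = [(f"cat-ja-{i}", f"cat-en-{i}") for i in range(1_000)]
train_part, eval_pairs = split_corpus(own_pairs, n_eval=1_000)
train_pairs = train_part + category_pairs  # 9,000 + 1,000 pairs of training data

# Step (7): score made-up "Before" and "After" outputs against the same reference.
reference = ["may i have your name please"]
before_score = simple_bleu(["please tell me your name"], reference)
after_score = simple_bleu(["may i have your name"], reference)
print(len(train_pairs), len(eval_pairs), before_score < after_score)
```

The key point of the design is that the 1,000 evaluation pairs never enter the training data, so the Before and After scores are measured on sentences the engine has not learned from.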

 

Satake: That's a lot of steps! And honestly, steps (1) through (6) are a bit hard for me to follow!

 

Now, how much did the machine learning improve the translation accuracy?

I'll be posting the results in my next blog entry!