Case Study

Enabling 15% Improvement in the Number of Perfect Translations for ING Hubs Poland

Working as a collaborative partner, our language data for MT training helped ING Hubs Poland, a leading multinational banking and financial services corporation, run an MT experiment to evaluate the efficiency of its automated translation processes. Training with TAUS datasets increased the number of translations rated perfect by human testers by 15%. At a 95% confidence level, output from the engine trained with TAUS datasets was also rated better than untrained output in the Anti Money Laundering (AML) and Human Resources (HR) domains.


The Client

ING Hubs Poland

A multinational bank and financial services corporation

Global operations in 47 countries

The Challenge

As one of the world’s top multinational banking corporations, our client strives to deliver communications that accurately reflect the efficiency and effectiveness of its solutions. ING Hubs Poland wanted to boost its standard automated translation solutions with dedicated data models (language packs), in cooperation with TAUS. The goals were to assess the potential gain in translation quality and efficiency, and to scale up based on the findings of an MT blind test. ING Hubs Poland turned to TAUS, an experienced and trusted partner, to provide the strictly in-domain language datasets required for the testing process.

The Solution

After assessing the client’s specific requirements and goals, TAUS provided MT training data for the Dutch-English and French-English language pairs in the AML and HR domains, within a banking context.

It was particularly important for ING Hubs Poland to run the training with its current MT provider, SYSTRAN, a leading provider of machine translation solutions. Because TAUS language data solutions can easily be applied in any MT training environment, training the SYSTRAN engines with TAUS datasets ran very smoothly.

The Results

ING Hubs Poland ran a blind test in which 19 human testers rated 2,202 translations (44,000 words) in two language pairs in the AML and HR domains. For each source text, testers saw two translated versions: untrained MT output and MT output from the engine after training with TAUS datasets. The testers did not know which version was which and graded each translation using a star rating system.

*Interval difference between untrained SYSTRAN output and output after training with TAUS datasets in the HR domain.

*Interval difference between untrained SYSTRAN output and output after training with TAUS datasets in the AML domain.

*The difference between untrained output and output after training with TAUS datasets for short translations is 1.4% (0.07/5).

*The difference between untrained output and output after training with TAUS datasets for medium to long translations is 3.8% (0.19/5).
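The percentages above express the gap in mean star rating as a share of the rating scale. A minimal Python sketch of that arithmetic follows, assuming the 5 in each ratio is the maximum of the five-star scale; the function name `rating_gain_percent` is hypothetical and only illustrates the calculation.

```python
# Hypothetical helper: express a difference in mean star ratings as a
# percentage of the rating scale (assumed maximum of 5 stars).
def rating_gain_percent(mean_diff: float, scale_max: float = 5.0) -> float:
    """Return mean_diff as a percentage of scale_max."""
    return mean_diff / scale_max * 100


# Mean rating differences reported in the case study.
short_gain = rating_gain_percent(0.07)  # short translations
long_gain = rating_gain_percent(0.19)   # medium to long translations

print(f"short: {short_gain:.1f}%")        # short: 1.4%
print(f"medium/long: {long_gain:.1f}%")   # medium/long: 3.8%
```
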

At a 95% confidence level, translations generated by the MT engine trained with TAUS datasets were rated better than the untrained output.

*The number of translations rated as perfect by the testers increased by 15% with the engine trained with the TAUS datasets.

Project Highlights

Training with TAUS datasets improves the number of perfect translations by 15%

At a 95% confidence level, translations generated by the MT engine trained with TAUS datasets were rated better than the untrained output

The difference between untrained output and output after training with TAUS datasets for short translations is 1.4% (0.07/5).

Let's connect

Talk to our Data Experts, who can help you find the right type of data for your next project. Niche domains or rare languages? We offer a large suite of services to generate your dataset.
