TAUS Matching Data Whitepaper

A new technique to optimize data selection for machine translation training

This white paper describes the challenges of selecting data for the training of Machine Translation systems, especially in light of the transition from Statistical MT to Neural MT.

It provides an overview of:

The Problem with Language Data
History of Language Data
Web Crawling
Embedded Sharing
Institutionalized Data Collection
TAUS Matching Data as a contemporary solution

For a full briefing, you can download the full version. If you are short on time, you can get the infographic version instead.
