A new technique to optimize data selection for machine translation training
This white paper describes the challenges of selecting data for training Machine Translation (MT) systems, especially in light of the transition from Statistical MT to Neural MT.
It provides an overview of:
- The Problem with Language Data
- History of Language Data
- Web Crawling
- Embedded Sharing
- Institutionalized Data Collection
- TAUS Matching Data as a contemporary solution
For a full briefing, you can download the full version. If you are short on time, you can get the infographic version instead.