Finding the Top Performing Data Preparation Tools
24 September 2020, 5 pm CEST
At the end of October 2020, the Data Marketplace, the first platform for language data acquisition, will go live. To encourage data sellers and buyers to join, the platform will offer advanced services that make the data on the marketplace easier to explore and MT-ready. These services include data cleaning, clustering, and anonymization, and will ensure that the datasets available in the marketplace are of high quality, easily searchable, and usable.
These services are built on existing tools. An extensive evaluation was performed to decide which tool to adopt for each service. In this webinar we present the results of that evaluation and discuss the pros and cons of the tools:
- Data Cleaning and Anonymization Tools, Luisa Bentivogli and Marco Turchi (FBK)
- Data Clustering Tools, Amir Kamran (TAUS)