Meta’s No Language Left Behind (NLLB) project aims to make it easier for people to read Facebook and Instagram posts in 200 languages, many of them lesser-spoken languages from around the world. The project’s artificial intelligence model was trained on data from the OPUS repository together with data mined from sources such as Wikipedia. This single multilingual model can translate directly between any pair of its supported languages without using English as an intermediate step.
Meta says the NLLB team has improved translation quality by an average of 44 per cent compared with its first model, released in 2020. The team measures quality against a benchmark of human-translated sentences and uses lists of “toxic” words and phrases to detect and filter out harmful content. Meta estimates that, once the system is fully deployed, it will deliver more than 25 billion translations every day across Facebook News Feed, Instagram and its other platforms.
However, experts such as William Lamb, professor of Gaelic ethnology and linguistics at the University of Edinburgh, argue that Meta should consult native speakers. Lamb, a specialist in Scottish Gaelic, says the current Scottish Gaelic translations are poor, which he attributes to the crowdsourced data Meta used, and recommends working with native Gaelic speakers to improve their accuracy.
Alberto Bugarín-Diz, professor of AI at the University of Santiago de Compostela in Spain, believes that linguists and Big Tech companies should work together to refine the data sets used to train translation models. He suggests that specialists revise and update the source texts and annotate them with metadata to raise translation quality. Bugarín-Diz also stresses that data gathered online must be of high quality and collected in compliance with legal requirements.
While Lamb doubts the reliability of Meta’s translations, Bugarín-Diz takes a more optimistic view. He argues that people should use Meta’s translations, because that use gives the company an incentive to invest time and resources in improving them. Even so, given the potential for errors, he stresses that users need to understand the weaknesses of AI technology before relying on it for translation.