Facebook and Instagram users will soon be able to view posts in more than 200 less widely spoken languages, thanks to Meta’s No Language Left Behind (NLLB) project. The project, which covers languages such as Scottish Gaelic, Galician, and Icelandic, uses artificial intelligence (AI) trained on data from the Opus repository to build multilingual language models. Despite these advances, experts say that consulting native speakers and language specialists will be crucial to improving translation accuracy.
To train its translation models, the NLLB team combines data from the Opus repository with text mined from sources such as Wikipedia. The team evaluates translation quality against human-translated benchmark sentences and open-source datasets. According to Meta, the project has already improved translation accuracy by 44%, and an estimated 25 billion translations a day are expected on platforms like Facebook and Instagram once it is fully implemented.
However, experts such as William Lamb, a professor of Gaelic ethnology, stress that consulting native speakers is essential to improving translation quality. Lamb says that translations in low-resource languages like Scottish Gaelic are not yet satisfactory because of the crowdsourced data Meta relies on. He suggests that Meta work with organizations like the BBC, which produce high-quality content in endangered languages, both to improve translation accuracy and to help preserve these languages.
Alberto Bugarín-Diz, a professor of AI, recommends that linguists and Big Tech companies collaborate to refine the datasets used for translation. He says specialists need to revise the texts, correct errors, and update metadata to improve translation quality for lesser-spoken languages. He also suggests that Meta and other AI companies select high-quality online data carefully while complying with legal requirements, so that accuracy improves without infringing intellectual property rights.
While Lamb has reservations about the current state of Meta’s translation abilities, Bugarín-Diz believes that people need to use these tools to give companies an incentive to invest in improving them. He acknowledges the imperfections but stresses that users should understand the limitations of AI technology and use it effectively. As AI continues to evolve, collaboration between experts and technology companies will be essential to making translations of lesser-spoken languages on social media platforms more accurate and more accessible.