Saudi Arabia is leading the development of Arabic language models, according to a new study released in the first quarter of 2025. The Saudi Data and Artificial Intelligence Authority (SDAIA) conducted the research with the King Salman Global Academy for Arabic Language (KSGAAL). It reports significant progress in strengthening the presence of Arabic in the digital sphere. The initiative aims to make the language more competitive globally and to encourage innovation.
The study covers developments through early 2025 and identifies Saudi Arabia as the frontrunner among countries building these specialized AI tools. Its findings emphasize the role these models play in expanding Arabic's digital footprint and in accelerating the adoption of artificial intelligence across sectors. The study set out to map the needs of the Arabic-language AI ecosystem.
The Rise of Arabic Language Models
Arabic language AI has developed along a clear trajectory. Before 2000, systems relied mainly on hand-crafted rules. Statistical and neural network models followed. Since 2022, the field has been transformed by large language models (LLMs) and generative applications. This latest period has produced a proliferation of Arabic models for conversation, content creation, and specialized technological, educational, and research uses.
Current Landscape and Dominance of Text
As of the first quarter of 2025, the study documented more than 53 distinct Arabic language models under development. Saudi Arabia leads overall, but the report also points to growing global interest in technologies that support Arabic. Investment, however, remains heavily concentrated on text-based models. Currently, 81 percent of Arabic language models handle text only. Multimodal models, which also process audio and visual data, account for just 7 percent.
This imbalance points to an area for future growth, because audio and visual content are increasingly important online. Building models that can handle the nuances of spoken Arabic and the visual representation of the language is becoming more urgent.
Performance Benchmarks and Areas for Improvement
The models' performance was assessed with BALSAM, a benchmark created by KSGAAL that evaluates Arabic language AI against global standards across a range of linguistic tasks. The results show that internationally developed models generally outperform Arabic models in most areas, especially on tasks that demand complex linguistic skills.
The research also identified specific strengths in Arabic models. They held a slight advantage in text summarization and performed comparably to international models in creative writing and reading comprehension. This suggests that targeted development can produce competitive results in specific areas.
Roadmap for Leadership and Future Development
To consolidate its lead in large Arabic language models, the study sets out a specific roadmap. A primary component is building high-quality Arabic datasets that span diverse dialects and subject areas. A shortage of comprehensive data is a critical constraint on developing effective AI.
The roadmap also calls for models of varying capabilities and sizes to meet different application requirements. Standardized Arabic benchmarks such as BALSAM are crucial for evaluating model performance objectively and driving continuous improvement.
Perhaps most importantly, the study calls for encouraging adoption of these models by public and private institutions and making them accessible to the wider community. Widespread use will shorten feedback loops and reveal further areas for development. It will also help grow the region's artificial intelligence ecosystem.
The collaboration between SDAIA and KSGAAL reflects Saudi Arabia's commitment to combining its linguistic and cultural heritage with technological advances. By securing a strong Arabic presence in the global AI landscape, the Kingdom aims to become a leading regional center for Arabic language technology and to support the creation of digital Arabic content.
Looking ahead, further investment is expected in multimodal models and in publicly available, high-quality Arabic datasets. The next phase will likely focus on closing the performance gap between Arabic and international models, particularly in core linguistic tasks. Long-term success will depend on consistent funding, sustained research, and collaboration among academic institutions, government agencies, and private companies.

