Abu Dhabi is racing to make homegrown generative AI tools in Arabic

G42’s Inception will start meeting with ADNOC, Etihad and First Abu Dhabi Bank this week to build applications for Jais, its new large language model

ABU DHABI, United Arab Emirates – Some of the UAE’s biggest firms will meet this week with artificial intelligence group G42 to explore ways to deploy its new large language model (LLM) that makes the latest advances in computer technology more accessible in Arabic.

Abu Dhabi National Oil Company, Etihad Airways, First Abu Dhabi Bank and telco e& have all partnered with the government-backed G42 subsidiary Inception to build commercial applications for Jais, a bilingual LLM it launched last week after a summer of fine-tuning.

Jais was built on English and Arabic-language data in collaboration with Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and California AI company Cerebras. Its makers say it is the best-performing Arabic-language LLM available, with the potential to be a foundational tool serving the more than 400 million Arabic speakers worldwide.

“The next step now quite literally is to sit down with a number of these different organizations” to determine what’s next for Jais, Timothy Baldwin, acting provost and department chair of natural language processing at MBZUAI, told The Circuit. Baldwin, who led the academic research team of about 20 post-doc students and professors, said that at the outset, the LLM may be used to build customer service, contract and workflow optimization tools. 

LLMs are neural networks trained on vast collections of text, typically drawn from the web, including Wikipedia pages and articles, as is the case with market leaders like OpenAI’s ChatGPT and Google’s Bard. These foundational models identify patterns and predict what text should come next, which allows people to prompt them to converse or to compose all manner of text, such as contracts, research papers or computer code.

Named after the highest mountain in the UAE, Jais was trained in Abu Dhabi on a subset of Cerebras’ Condor Galaxy 1 AI supercomputer by a team from MBZUAI and Inception. The project follows a $100 million deal, signed in June, for Cerebras to provide up to nine supercomputers to G42, an agreement widely seen as a challenge to chip market leader Nvidia.

Arabic is one of the most spoken languages worldwide, but its online presence is relatively small: Arabic accounts for only about 1% of online content, according to data presented by the companies.

Despite the limited supply of Arabic training data, Jais outperforms existing Arabic models and is competitive with English models of similar size, even though it was trained on significantly less English data, according to the group. The researchers said in a white paper that this shows that the model’s English component learned from the Arabic data and vice versa, potentially opening a new, multilingual focus for LLMs. Jais has also been designed to have a nuanced understanding of the region, in contrast to the more U.S.-centric models that dominate the market, Baldwin said.

The UAE has set AI high on the national agenda as it seeks to reduce its dependence on the U.S., China and Israel for technology resources and build up local capabilities. In 2017, the country laid out a plan for how the then-nascent technology would be a contributor to the economy, and named a state minister to oversee AI development and policy. A year later, G42 was formed in Abu Dhabi, chaired by the UAE’s national security advisor, Tahnoun bin Zayed Al Nahyan. In 2020, MBZUAI, the world’s first dedicated AI research university, also opened its doors in the capital. 

The UAE is also home to another open-source LLM, known as Falcon, developed at the state-owned Technology Innovation Institute in Masdar City, Abu Dhabi.
