AI for the world, or just the West? How researchers are tackling Big Tech's global gaps

Mar 26, 2025
[Image: digital globe concept. Rob Dobi/Getty Images]

Since the launch of OpenAI's ChatGPT in 2022, artificial intelligence (AI) has become deeply entrenched in our lives. But despite being touted as global tools democratizing access to technology, popular AI products are set up to serve primarily American and European interests, from the use cases they're applied to, to the languages they speak. 

Several African researchers outside tech's US nucleus are trying to challenge that status quo and, with it, the bigger power dynamics at play in the AI industry. 

A global AI power imbalance 

The Distributed AI Research Institute (DAIR) is an international group of researchers and technologists focused on what it calls "independent and community-rooted AI research free from Big Tech's pervasive influence." I spoke to DAIR members creating Africa-centric AI solutions that serve particular societal needs. Ultimately, they demonstrate use cases for AI that prioritize the historically dispossessed instead of multinational corporations or solely Western users.

Nyalleng Moorosi is a senior researcher at DAIR based in Lesotho and a founding member of Deep Learning Indaba, an organization that aims to strengthen AI and machine learning in Africa. Her background in machine learning and teaching in South African public schools informed her philosophies around equity in the tech space. 

As an educator at the University of Fort Hare -- one of the country's few universities that accepted Black South Africans during apartheid -- Moorosi witnessed many students struggle with poverty while in school. "It was mind-boggling to imagine doing the things that I did through[out] undergrad and post-grad [burdened by] so much insecurity," she noted. 

After teaching, Moorosi was recruited by Google, where she was one of the first employees at the Google Africa AI research lab in Ghana. As a software engineer, Moorosi developed methodologies and technologies to help ensure AI systems are built responsibly.

"I joined Google because they [were] building an office in Africa, and I wanted to [be in] Africa," Moorosi said. "I didn't want to just go to Google. I wanted to go to Google Africa." 

But after a friend and colleague, Timnit Gebru -- DAIR's founder and a former co-lead of Google's ethical AI team -- contacted her inquiring about the lack of African representation within Google Africa, Moorosi began to question whether Google was the right fit for the type of equity work she wanted to do in machine learning. 

Big Tech companies have appeared to censor those seeking to uncover tech-induced societal harms and challenge mainstream AI practices. That's why Moorosi and Gebru wanted to centralize power within the communities the tech industry has historically excluded, by keeping -- and funding -- local experts on the ground. 

DAIR's AI study 

In 2018, Moorosi, Gebru, and DAIR fellow Raesetje Sefala began collecting satellite imagery to track changes in the built environment of South African townships -- working-class neighborhoods historically populated by Black residents. Interested in how these neighborhoods had changed since apartheid ended, DAIR began compiling a dataset to determine whether occupants' lives had improved over time. 

South African townships are underdeveloped urban neighborhoods located on the outskirts of cities. Township inhabitants tend to have a poorer quality of life than those in wealthier suburbs. However, because the government-issued census used to allocate public spending favored more affluent areas, township data became effectively invisible. The result is spatial apartheid, which disproportionately excludes Black people living in townships from accessing crucial public resources, such as adequate health services, education, and green spaces. 

This data problem affected DAIR's study because the researchers initially relied on pre-existing datasets -- mainly from South African AI models that struggled to capture the intricacies of the country's urban landscapes or to differentiate townships from suburbs. So instead, the researchers used the millions of satellite images of South African provinces, along with the geospatial data they collected, to train machine-learning models and build an AI system that labeled building clusters as wealthy, non-wealthy, or nonresidential, such as vacant land or industrial areas.
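
Conceptually, this is a supervised image-classification task: cut the imagery into tiles and train a model to assign each tile one of those labels. Below is a minimal sketch of that general idea in PyTorch; it is not DAIR's actual pipeline, and the tile size, architecture, and label names are illustrative assumptions.

```python
# Minimal sketch of classifying satellite tiles into wealthy / non-wealthy /
# nonresidential clusters. Illustrative only -- not DAIR's actual pipeline;
# tile size, architecture, and labels are assumptions.
import torch
import torch.nn as nn

CLASSES = ["wealthy", "non_wealthy", "nonresidential"]

class TileClassifier(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):  # x: (batch, 3, 64, 64) RGB satellite tiles
        return self.head(self.features(x).flatten(1))

model = TileClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch: in practice, tiles would be cut from georeferenced imagery
# and labels derived from ground-truth surveys of each area.
tiles = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, len(CLASSES), (8,))

loss = nn.CrossEntropyLoss()(model(tiles), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one illustrative training step
```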

[Image: Raesetje Sefala, Timnit Gebru, Luzango Mfupe, and Nyalleng Moorosi]

However, when DAIR tried to publish these findings, they received commentary from predominantly white Western academic institutions that the study was a geographic one, not machine-learning research. According to Moorosi, they were essentially told the study wasn't AI. 

As Moorosi explained, even though the team used computer vision methods, academic institutions did not accept their spatial apartheid project as part of the field of machine learning: "We use the same metrics, algorithms, and communication methods, [including] plots and everything. It's so crazy because many toy datasets were being used then, [but] we had this dataset about actual things, and it was too niche." 

But not niche for Africans, she added: "This tracking of how historical segregation affects how we live is present in many ex-British colonies. It's in Nairobi. It's in Lagos," she explained. "In the colonies, it was standard that the white people lived there and the black people lived there. And the distribution of resources was different between there and there.

"So, it feels niche because these people are not Africans, and they do not experience how colonization in Africa shaped [the] world [in which] we live," she said. Moorosi pointed to how the content -- not the quality -- of DAIR's AI study seemed to undermine its visibility in a Western-dominated industry. 

Providing for underserved communities 

Asmelash Teka Hadgu, co-founder and CTO of Lesan AI and research fellow at DAIR, further emphasized this point. He described the intent behind Lesan, a language translation and transcription tool primarily for Indigenous African languages. 

Hadgu said his approach to AI differs from that of US-based tech giants because Lesan AI focuses on low-resource languages like Amharic, Tigrinya, and other dialects. Because Hadgu speaks both Amharic and Tigrinya, he built a robust dataset by focusing on the most descriptive parts of his languages, using "repurposed" newspaper and radio content available in Ethiopian local communities, as he explained in our interview. 
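
Turning such "repurposed" content into training data typically means collecting sentence pairs and filtering out noisy ones before training a translation model. The sketch below illustrates one common heuristic, length-ratio filtering; it is a generic illustration under stated assumptions, not Lesan's actual pipeline.

```python
# Minimal sketch of cleaning harvested (source, translation) sentence pairs
# into a parallel corpus for machine translation. Illustrative assumptions:
# pairs are already roughly aligned, and whitespace tokenization is adequate.
def clean_pairs(pairs, min_tokens=3, max_len_ratio=2.0):
    """Keep sentence pairs whose lengths look plausible for MT training."""
    kept = []
    for src, tgt in pairs:
        s_len, t_len = len(src.split()), len(tgt.split())
        if min(s_len, t_len) < min_tokens:
            continue  # drop fragments too short to be useful
        if max(s_len, t_len) / min(s_len, t_len) > max_len_ratio:
            continue  # drop likely misalignments
        kept.append((src.strip(), tgt.strip()))
    return kept

corpus = clean_pairs([
    ("A full sentence from a newspaper article.",
     "Its translation in the target language here."),
    ("Fragment.", "Too short."),  # filtered out
])
print(len(corpus))  # -> 1
```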

In the African context, popular language models from tech giants like OpenAI and Anthropic do not adequately represent hundreds of millions of people. For example, the performance of OpenAI's ChatGPT on a dataset of 670 languages shows that African languages are the least supported, according to Wei-Rui Chen's paper, Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability. 

"OpenAI's ChatGPT is utterly broken, not slightly wrong, but creating gibberish in languages such as Amharic and Tigrini," said Hagdu. "Yet, they're still doubling down on that old way of thinking that centers on finding solutions for English first. And [assuming] other languages will catch up."

By building high-quality datasets for low-resource languages, Lesan aims "to serve millions of accurate translations for thousands of people and open up the web's content [to] these communities" because of the limited online content currently available in these languages, Hadgu explained. 

"They're not add-ons," he said. "We don't spend 95% of our resources on a handful of languages and then work on what they term as long-tail languages." Here, long-tail languages refer to languages that are lesser-known, niche, or localized less frequently, regardless of how many people speak those languages. 

When Western AI companies attempt to represent low-resource languages within their AI systems, their processes are ill-equipped for the challenge of adequate translation. That's largely because low-resource languages aren't digitally available for data scraping in the way Western languages like English are, especially given that the internet's content is still overwhelmingly in English. 

Moreover, the data often used to train AI models is heavily skewed toward the Western world. In a study conducted by the Data Provenance Initiative, over 50 researchers investigated where the data used to build AI models comes from. They analyzed over 4,000 public datasets spanning over 600 languages, 67 countries, and three decades; about 90% of the data came from Europe and North America, with only 4% coming from Africa.

Hadgu said that Facebook's No Language Left Behind project "worked on hundreds of languages, [yet] the African languages included are based on what I call 'convenience.' [They] scrape the web for whatever resources they can find for these languages and then use automated methods to filter, align, and create the systems." 

Companies offer basically zero resources for African languages, he said: "You would be surprised (or not) to find that people would rather fund millions of dollars on the next startup for an English LLM. Whereas, low-resource languages, such as Amharic and Tigrinya, languages spoken by millions of people," are rarely considered for large-scale AI funding. 

Bloomberg reported in November that the French telecommunications firm Orange SA had partnered with OpenAI and Meta Platforms Inc. to begin training AI programs on African languages, such as Wolof, Pulaar, and Bambara, to "address a shortage of models for the continent's thousands of dialects." 

However, many West and Sub-Saharan African languages depend on distinct tonal systems to convey the meaning of words, as well as on oral traditions dating back to the precolonial era. Many African oral languages are slowly disappearing as the population of native speakers declines and colonial languages like French and English become ever more widely spoken. This shift makes it difficult for LLMs developed by Western tech companies to fully represent African languages, because the models don't grasp their cultural specificities.  

For Hadgu, elders and community members were critical to building his machine-learning systems, ensuring he correctly represented the local context of the communities. 

Meanwhile, even when Big Tech companies enlist smaller AI technologists and startups to develop datasets to train language-specific models, they take advantage of open-sourced work to capture ideas, data, and resources from smaller teams. Georg Zoeller of the Centre for AI Leadership in Singapore recently explained: "By open-sourcing the basic tools for AI, hyperscalers have enabled startups to build products in the field and used it to replace internal teams as the primary source of product R&D."

Dr. Paul Azunre, co-founder of Ghana NLP (natural language processing), told me how easily big companies poach from startups in the Global South without compensating them for their work.  

"Once Facebook came to us after they put out a model, which was open source and was built on our data. Then, they were doing an open call for proposal[s]. They came to us and said, 'Why don't you put in [a] proposal for funding?' And we said, 'Well, you're already using our work,'" Azunre explained. "'So what else do we need to prove to you? Just pay us.'" 

Ghana NLP was founded in response to Ghanaian languages being excluded from software products like Google Translate and speech recognition tools. Seeking to fill that gap, the startup focuses on speech recognition, text-to-speech, and speech-to-text translation in the local languages of Twi, Ewe, Yoruba, Fante, and Ga, and is expanding to include languages from neighboring countries, including Nigeria, Burkina Faso, Kenya, and Tanzania. 

"As a developer who tries to make self-sustaining products, I am sympathetic to why certain products or projects are prioritized in a certain way," Azunre said. "We are going to put out Twi first because in Ghana we have 30 million Twi speakers... but the difference between what we are doing and [tech giants] is for us, the guiding principle is the locals are top of mind."

He continued: "There is no other option. There is no build the thing and then take it to Silicon Valley, and then it sits there, generating jobs there, but it's translating our culture and [extracting our data]." Moreover, "the jobs have to be in the communities where you are extracting the knowledge from."

While Azunre is a proponent of open source, he warned against Big Tech capturing datasets to build solutions without allowing local communities to retain control over their data -- a principle known as community data sovereignty. He also argued that creating local data sources and training Ghanaians builds a robust AI ecosystem that empowers communities facing digital inequality and ensures Africa's linguistic and cultural specificities are not missing from AI solutions. 

What's next for AI in Africa 

As tech governance researcher Chinasa T. Okolo explained, many African governments are considering establishing frameworks for AI governance that combat multinational corporations' influence over the AI landscape on the African continent. Seven African countries (Benin, Egypt, Ghana, Mauritius, Rwanda, Senegal, and Tunisia) have drafted national AI strategies, but none have implemented a formal AI regulation strategy.

The South African government recently released a National AI Policy Framework to ensure equitable access to AI technologies, especially in underserved and rural communities. In addition, 36 African countries have established formal data protection regulations -- opening up space for more regulatory AI frameworks, according to Okolo. 

Lately, Western-based AI companies have been pursuing similar region-specific LLMs for Arabic-speaking countries across the MENA region, such as Mistral's new AI model that specializes in Arabic and is tailored to grasp the cultural nuances sometimes overlooked in larger, more general-purpose models. Meta also revealed it's expanding Meta AI across the MENA region to provide language support for Arabic-speaking users on its apps.

But a growing number of AI technologists and researchers are drawing parallels between the legacies of colonial extraction and the trends of AI development globally, as well as the hype behind today's generative AI systems. As MIT Tech Review's Karen Hao explained: "While it would diminish the depth of past traumas to say the AI industry is repeating [the exact modalities of colonial] violence today, it is now using other, more insidious means to enrich the wealthy and powerful at the great expense of the poor."
