Policy Brief

The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development

This technology brief explores the potential of synthetic data to accelerate the attainment of the SDGs through AI in the Global South.

Publication Date
4 Sep 2023
Authors
Tshilidzi Marwala, Eleonore Fournier-Tombs, Serge Stinckwich

Using synthetic, or artificially generated, data to train AI algorithms is a burgeoning practice with significant potential. It can address data scarcity, privacy, and bias issues, but it also raises concerns about data quality, security, and ethical implications. These stakes are heightened in the Global South, where data scarcity is much more severe than in the Global North. Synthetic data can therefore help address the problem of missing data, leading, in the best case, to better representation of populations in datasets and more equitable outcomes. However, we cannot consider synthetic data to be better than, or even equivalent to, actual data from the physical world. In fact, using synthetic data carries many risks, including cybersecurity risks, bias propagation, and increased model error. This technology brief proposes recommendations for the responsible use of synthetic data in AI training, along with associated guidelines to regulate that use.

Download the technology brief

The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development (available in Chinese, English and Japanese)
