Exploring Opportunities and Risks of Using Synthetic Data in the Training of AI Models

In a new policy brief, UNU experts analyse the potential of synthetic data to advance sustainable development, especially in the Global South.

A new policy brief explores how synthetic data could accelerate the achievement of the Sustainable Development Goals (SDGs) through Artificial Intelligence (AI) in the Global South, while mitigating the significant risks that such data pose. The brief is authored by Prof. Tshilidzi Marwala, Rector of UNU, Dr. Eleonore Fournier-Tombs, Head of Anticipatory Action and Innovation at UNU Centre for Policy Research (UNU-CPR), and Dr. Serge Stinckwich, Head of Research at UNU Macau.

As the three experts point out, synthetic data – information created by computer simulations or algorithms that reproduce some structural and statistical properties of real-world data – “offer numerous opportunities, such as rebalancing biased datasets, protecting data privacy, and reducing the cost of data collection”. But the use of synthetic data, they note, also poses “many risks”, such as issues related to “data quality, cybersecurity, misuse, bias propagation, IP infringement, data pollution and data contamination”. 

In the policy brief, titled “The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development”, Prof. Tshilidzi Marwala, Dr. Eleonore Fournier-Tombs and Dr. Serge Stinckwich address both this potential and these concerns. They propose “an early step in standardising the use of synthetic data” by outlining technical standards for software developers to adopt and by defining recommendations for policymakers.

You can download the full policy brief here.