Digital platforms have delivered numerous benefits, including supporting communities during crises, amplifying marginalized voices, and facilitating global movements. They enable the United Nations (UN) to engage with people worldwide in the pursuit of peace, dignity, and human rights and in the achievement of the Sustainable Development Goals (SDGs). However, these platforms are also increasingly exploited to undermine scientific discourse, propagate disinformation, and incite hatred, affecting billions of individuals. As UN Secretary-General António Guterres has warned, the unchecked spread of hatred and misinformation in the digital realm is causing significant global harm: digital platforms are being manipulated to distort scientific understanding and spread falsehoods and hatred to vast audiences. This pressing global threat necessitates coordinated international action to ensure a safer and more inclusive digital environment while steadfastly protecting human rights.
The UN is developing a Code of Conduct for information integrity on digital platforms, seeking to provide a concerted global response to information threats that is firmly rooted in human rights, including the rights to freedom of expression and opinion and access to information. At UNU Macau, our interdisciplinary approach allows us to examine the intricate relationship between AI and information integrity through a multifaceted lens, considering both technological advancements and the ethical imperatives set out by UN values and the SDGs.
Information Integrity and AI
Information integrity encompasses the accuracy, consistency, and reliability of information. It faces significant threats from disinformation, misinformation, and hate speech. Although universally accepted definitions of these terms are lacking, UN entities have established working definitions. The Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression defines disinformation as “false information that is disseminated intentionally to cause serious social harm”, and misinformation as “dissemination of false information unknowingly”. A UN General Assembly resolution on countering disinformation emphasizes that “all forms of disinformation can negatively impact the enjoyment of human rights and fundamental freedoms, as well as the attainment of the SDGs” and encourages conducting human rights due diligence of “the role of algorithms and ranking systems in amplifying disinformation.”
As the proliferation of AI-generated content (AIGC) poses significant risks of misinformation and disinformation, maintaining information integrity in the age of AI is paramount for upholding human rights, ensuring sustainable development, and preventing the spread of false information. The General Assembly's resolution on AI emphasizes the need for ethical guidelines and robust governance frameworks to ensure AI technologies uphold human rights and contribute to sustainable development, while recognizing that the “improper or malicious design, development, deployment and use” of AI systems could “undermine information integrity and access to information”. UNESCO's Recommendation on the Ethics of AI highlights transparency, accountability, and fairness in AI systems, calling for measures to prevent AI's misuse in spreading disinformation and hate speech. The Secretary-General's High-Level Advisory Body on AI, in its report "Governing AI for Humanity", likewise advocates international cooperation and regulatory frameworks to harness AI for the common good while mitigating the risks of AI-driven disinformation.
Improving Interpretability and Explainability of Generative AI Models
The emergence of generative AI (GenAI) represents the latest evolution in AI, bringing both groundbreaking capabilities and new risks to the forefront. GenAI models, including those used for text generation, image synthesis, video creation, and multi-modal tasks, make predictions such as the 'next word' or 'next frame' and convert information from one modality to another. These models, composed of vast numbers of artificial neurons, are trained on extensive datasets, capturing intricate statistical patterns and structures within the data. During inference, they use contextual information and prompts to generate outputs based on the statistical distributions learned during training. Despite their sophistication, this process involves a degree of randomness: models sample from the learned probability distributions, typically favoring the most likely continuations, and this sampling can affect the integrity of the content produced. Understanding how these models encode training samples and generate responses is therefore crucial for enhancing the integrity and reliability of their outputs.
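To make the sampling step concrete, the minimal sketch below (in Python, with illustrative logits rather than a real model) shows how a next token can be drawn from a learned distribution and how the temperature parameter controls the degree of randomness that ends up in the generated content.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, rng=None):
    """Sample one token id from a model's output logits.

    Lower temperatures concentrate probability mass on the most likely
    tokens; higher temperatures increase randomness.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    scaled -= scaled.max()                          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()   # softmax
    return rng.choice(len(probs), p=probs)

# Toy example: logits over a 5-token vocabulary produced by some model.
logits = [2.1, 0.3, -1.0, 1.7, 0.0]
print(sample_next_token(logits, temperature=0.2))   # almost always token 0
print(sample_next_token(logits, temperature=1.5))   # noticeably more random
```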
If we compare the connections between neurons to structures in the human brain and the prediction generation process to human thought, research into the interpretability and explainability of models can be categorized into neuroscience-based and psychology-based approaches (Hutson, 2024). While interpretability generally refers to understanding how a model works internally—grasping its decision-making process based on its parameters and features—explainability focuses more on how humans can comprehend why a model generates a specific answer for a given input (Linardatos et al., 2020).
Ablation studies of model architectures, such as analyzing the effects of removing individual components or even single neurons, help researchers understand which layers and substructures of language models respond to specific grammatical structures or language tasks. This can aid in designing smaller yet competitive models that reduce computational costs, and it provides a basis for opening up the deep learning black box.
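As a simplified illustration of this kind of ablation, and not the methodology of any particular study, the sketch below zeroes out individual hidden units in a toy PyTorch feed-forward block, a stand-in for part of a language model, and measures how much the block's output changes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in for one feed-forward block of a language model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
x = torch.randn(8, 16)                      # a batch of hidden states

with torch.no_grad():
    baseline = model(x)

def ablate_unit(block, unit):
    """Zero the incoming weights and bias of one hidden unit in the first layer."""
    with torch.no_grad():
        block[0].weight[unit].zero_()
        block[0].bias[unit].zero_()

# Measure how much each hidden unit contributes to the block's output.
for unit in range(3):                       # probe the first few units
    probe = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
    probe.load_state_dict(model.state_dict())   # fresh copy each time
    ablate_unit(probe, unit)
    with torch.no_grad():
        delta = (probe(x) - baseline).abs().mean().item()
    print(f"unit {unit}: mean output change = {delta:.4f}")
```

Units whose removal barely changes the output are candidates for pruning, which is how such studies inform the design of smaller, cheaper models.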
Human users and decision-makers often prefer that AI systems themselves provide reasonable explanations for their predictions and generated results, rather than having to delve into neuron connectivity and activation. Consequently, psychology-based studies have focused on how large language models can improve accuracy and correctness by providing a "reasoning process" for their responses.
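A minimal sketch of this idea is shown below: the prompt asks the model to lay out its reasoning before giving a final answer, with the model call itself left as a hypothetical `ask_llm` wrapper around whatever chat-completion provider is available.

```python
# Eliciting an explicit reasoning process ("chain of thought") from an LLM.
# `ask_llm` is a hypothetical stand-in for a chat-completion API, not a real one.

def build_prompt(question: str) -> str:
    return (
        "Answer the question below. First explain your reasoning step by step, "
        "then give the final answer on a separate line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )

def ask_llm(prompt: str) -> str:            # hypothetical API wrapper
    raise NotImplementedError("plug in your model provider here")

question = "A report cites 120 cases in 2022 and 150 in 2023. What is the percentage increase?"
print(build_prompt(question))
# response = ask_llm(build_prompt(question))
# Keeping the reasoning visible lets users and reviewers check *why* the model
# produced a given answer, not just what it produced.
```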
Training Robust and Responsible AI Systems
The first requirement for enhancing the authenticity of generated content is training more robust models. In the pre-training phase, the selection and use of data are crucial for model performance. For large language models (LLMs), alignment training in the fine-tuning stages is a key step in making models follow human instructions, transfer capabilities to various tasks, and output responsible content.
The Role of Synthetic Data and UNU’s Recommendations on It
While the performance of AI models relies heavily on training data, the limited amount of high-quality data available on the Internet, and its slow growth relative to model needs, hinder the training of large models. Synthetic data, generated to mimic real-world data, plays a pivotal role in addressing data scarcity, especially in the Global South. It can enhance privacy and reduce bias, allowing for the development of AI models that are both inclusive and compliant with data protection regulations. A recent UNU policy guideline, Recommendations on the Use of Synthetic Data to Train AI Models, was launched during the UNU Macau AI Conference 2024. This guideline emphasizes the enormous potential of using synthetic data for AI training and highlights key risks such as data quality, cybersecurity, misuse, bias propagation, IP infringement, data pollution, and data contamination. Its recommendations provide both technical and policy-specific guidance for the responsible use of synthetic data, including using diverse data sources and synthesis methods, disclosing synthetic data, respecting intellectual property, and reporting quality metrics.
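As a toy illustration of synthesis and disclosure, and not of any particular generator, the sketch below draws synthetic tabular records from simple per-column distributions fitted to a small "real" sample and tags every record as synthetic, in the spirit of the guideline's disclosure recommendation; production systems would use far richer generative models that capture joint structure across columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# A tiny "real" sample (in practice this would be scarce or sensitive data).
real = pd.DataFrame({
    "age": rng.integers(18, 80, size=200),
    "monthly_income": rng.lognormal(mean=7.5, sigma=0.4, size=200),
})

def synthesize(real_df: pd.DataFrame, n: int, rng) -> pd.DataFrame:
    """Draw synthetic rows from per-column distributions fitted to the real
    sample. This independent-column version is only for illustration."""
    synth = pd.DataFrame({
        "age": rng.normal(real_df["age"].mean(), real_df["age"].std(), n)
                  .clip(18, 80).round().astype(int),
        "monthly_income": rng.lognormal(
            np.log(real_df["monthly_income"]).mean(),
            np.log(real_df["monthly_income"]).std(), n),
    })
    synth["is_synthetic"] = True   # disclose synthetic origin, per the guideline
    return synth

synthetic = synthesize(real, n=1000, rng=rng)
print(synthetic.describe())
```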
Alignment with UN Values
During the fine-tuning stages of training AI models, we must also consider the alignment of AI with core ethical principles. AI alignment refers to the process of ensuring that artificial intelligence systems' goals, behaviors, and decisions are consistent with human values and intentions. This concept is crucial because, as AI systems become more powerful and autonomous, they may exhibit behaviors that are not aligned with human expectations, potentially causing harm. When discussing AI alignment, the general consensus is to align AI with human values. Moving beyond that, as a part of the UN family, we advocate for the alignment of AI with UN values—rooted in the UN Charter and the Universal Declaration of Human Rights, emphasizing peace, justice, equality, human rights, dignity, and sustainability. Techniques for fine-tuning large model alignment, such as Reinforcement Learning from Human Feedback (RLHF), can be adapted to incorporate these values. By training AI systems using feedback from diverse and representative human groups, we can ensure that AI not only aligns with general human values but also upholds the specific values championed by the UN.
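To illustrate where such feedback enters the pipeline, the sketch below shows a simplified reward-model training step on pairs of responses that reviewers have compared; the embeddings are random placeholders, and a full RLHF pipeline would then use this reward model to optimize the language model itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Each training example: embeddings of a "chosen" and a "rejected" response,
# where reviewers judged the chosen one to better reflect the target values.
dim = 64
chosen = torch.randn(256, dim)      # placeholder response embeddings
rejected = torch.randn(256, dim)

reward_model = nn.Linear(dim, 1)    # a minimal scalar reward head
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # Bradley-Terry style pairwise loss: push rewards to rank chosen > rejected.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final pairwise loss: {loss.item():.3f}")
# In full RLHF pipelines this reward model then guides policy optimization
# (e.g., with PPO); collecting comparisons from diverse, representative
# reviewers is how specific value choices enter the system.
```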
Enhancing Accuracy and Reliability of AIGC
Today, GenAI models, particularly LLMs and large vision-language models (LVLMs), also confront a critical integrity issue known as ‘hallucination’. This phenomenon refers to models producing information that appears plausible but is not grounded in reality, thereby compromising the accuracy and reliability of the generated content. Recent studies have identified several causes of hallucination:
1. Inadequate Model Design: for example, trained models' over-reliance on attention (the mechanism in Transformer models that allows the model to focus on different parts of the input sequence) from premature, earlier layers during decoding (Huang et al., 2023).
2. Biased Training Data: Models trained on biased data are prone to generating hallucinations due to over-dependence on attestation bias, word frequency statistics, or even recalling incorrect answers (McKenna et al., 2023).
3. Imperfect Training and Alignment Processes: Statistical analyses have shown that pre-trained language models have an inherent lower bound on hallucination probability, independent of architecture and training data, making hallucinations unavoidable to some extent (Kalai et al., 2023).
Given the significant challenges posed by hallucinations, global developers can jointly build frameworks of effective mitigations to enhance the accuracy and reliability of GenAI. These span technical approaches, such as retrieval-augmented generation, traceability mechanisms, and fact-checking systems, as well as socio-technical measures such as governance frameworks and ethical guidelines.
Retrieval-Augmented Generation (RAG) with Reliable Information Sources
To alleviate the hallucination problem of LLMs and enhance the accuracy and credibility of model outputs, it is intuitively effective to equip model outputs with explicit “references”, as in a rigorous academic paper. Retrieval-Augmented Generation (RAG) is one such technique we recommend: it enables LLMs to retrieve from external knowledge bases before answering a question and to attach citations to the generated content. The additional, verified context reduces hallucinated output while making information and data sources traceable and inspectable. RAG is already widely used in AI-powered search tools such as Microsoft Copilot and Perplexity.
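The retrieval-and-prompting half of RAG can be sketched in a few lines, here with simple TF-IDF retrieval over a toy corpus and the downstream model call left to the reader; real systems index vetted document collections and use stronger, embedding-based retrievers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy knowledge base; a real deployment would index vetted documents
# (e.g., UN reports and peer-reviewed sources).
documents = [
    "The Sustainable Development Goals are 17 goals adopted by the UN in 2015.",
    "The Universal Declaration of Human Rights was adopted in 1948.",
    "The UN Charter was signed in San Francisco on 26 June 1945.",
]

def retrieve(query: str, k: int = 2):
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer().fit(documents + [query])
    doc_vecs = vectorizer.transform(documents)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def build_rag_prompt(query: str) -> str:
    """Assemble a prompt that grounds the model in numbered, citable sources."""
    context = "\n".join(f"[{n+1}] {text}" for n, text in enumerate(retrieve(query)))
    return (
        "Answer using ONLY the sources below and cite them as [1], [2].\n"
        f"{context}\n\nQuestion: {query}"
    )

print(build_rag_prompt("When was the UN Charter signed?"))
# The assembled prompt is then passed to the language model, so its answer
# is grounded in retrievable, citable sources rather than free recall.
```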
The reliability of RAG heavily depends on the quality of the retrieved data. To avoid introducing false information, it is crucial to responsibly select, verify, and integrate external sources. One of our innovative projects, named “FactifAI,” embodies our commitment to information integrity and aims to enhance media and information literacy from the perspective of content generation. FactifAI addresses the production and dissemination of dis- and misinformation, clarifies scientific facts in news events, and promotes the public delivery of neutral and scientific content. Building a reliable database for FactifAI involves sourcing from official UN documents, academic papers, and other manually screened and validated sources. By leveraging LLMs, multimodal models, and generative tools for images, videos, data visualization, and knowledge graphs, along with the information-gathering capabilities of RAG, we can generate comprehensible reports, explanations, timelines, and multimedia content based on trending news and user queries. This system also supports chatbot-like interactions for in-depth exploration and offers editorial customization tools for media practitioners. Ultimately, this ensures that the provided information is accurate, impartial, and scientifically grounded, maintaining a high standard of information integrity.
Traceability Through Watermarking and Blockchain
Another crucial aspect of information integrity for AI-generated content is traceability, ensuring the origin and authenticity of content can be verified. Watermarking and traceable record systems like blockchain technology are valuable tools in this regard. Watermarking involves embedding a unique, often invisible, identifier into digital content, while blockchain offers a decentralized and immutable ledger to verify the authenticity of information. For AI-generated content, this means that information produced by an AI system can carry a watermark linking back to its source. By recording the provenance of data and its subsequent modifications, blockchain or other distributed ledger systems also provide a tamper-proof and transparent trail that enhances information integrity.
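A minimal sketch of the ledger idea, without the decentralization and consensus of a real blockchain, is a hash chain of provenance records: each record's hash covers the previous one, so any later tampering is detectable.

```python
import hashlib, json, time

def record_entry(chain, content_id, watermark_id, action):
    """Append a provenance record whose hash covers the previous record,
    so modifying any earlier record breaks the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "content_id": content_id,
        "watermark_id": watermark_id,   # identifier embedded in the content
        "action": action,               # e.g., "generated", "edited"
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

def verify(chain) -> bool:
    """Recompute every hash and link; returns False if any record was altered."""
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        prev_ok = entry["prev_hash"] == (chain[i - 1]["hash"] if i else "0" * 64)
        if entry["hash"] != expected or not prev_ok:
            return False
    return True

ledger = []
record_entry(ledger, "img-001", "wm-7f3a", "generated")
record_entry(ledger, "img-001", "wm-7f3a", "edited")
print(verify(ledger))                    # True
ledger[0]["action"] = "tampered"
print(verify(ledger))                    # False
```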
Fact-checking and Verification Systems
Fact-checking and verification mechanisms are equally crucial in ensuring information integrity. Integrating real-time fact-checking within AI systems can significantly reduce the spread of false information. LLMs themselves have the potential to empower fact-checkers by streamlining the workflow of automated claim matching (Choi et al., 2024). By cross-referencing their responses with verified databases, LLMs can reduce the likelihood of disseminating false information. Fact-checking systems can also operate in real time to monitor and verify the outputs of LLM-based agents, ensuring that generated content adheres to factual standards before being presented to users.
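As a simplified sketch of automated claim matching, assuming the open-source sentence-transformers library and a pretrained embedding model, the snippet below finds the closest verified claim to a generated sentence; a production fact-checking pipeline would add verdicts, source documents, and human review.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A tiny stand-in for a database of claims already verified by fact-checkers.
verified_claims = [
    "Vaccines do not cause autism.",
    "The Paris Agreement was adopted in 2015.",
    "Global average temperatures have risen since pre-industrial times.",
]
claim_embeddings = model.encode(verified_claims, convert_to_tensor=True)

def match_claim(generated_sentence: str, threshold: float = 0.6):
    """Return the closest verified claim if it is similar enough."""
    query = model.encode(generated_sentence, convert_to_tensor=True)
    scores = util.cos_sim(query, claim_embeddings)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return verified_claims[best], float(scores[best])
    return None, float(scores[best])     # no sufficiently close match

print(match_claim("The Paris climate accord was agreed in 2015."))
```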
At UNU Macau, we are exploring the potential of LLM-based agents. By integrating safety-oriented agentic workflows, LLM-based systems can follow safety guidelines, plan more robustly and safely, and produce more trustworthy outputs and evaluations (Hua et al., 2024). Given that hallucinations in generative AI models are unavoidable to some extent, and that studies on interpretability and explainability are still ongoing, human oversight and determination, as highlighted in UNESCO's Recommendation, remain essential. While the emergent capabilities of GenAI have empowered human decision-makers, final decisions, especially those involving life and death, must ultimately remain a human responsibility.
Fostering AI-Enhanced Information Integrity for a Sustainable Future
Integrating AI into the protection of information integrity, and aligning it with UN values, offers a promising pathway to a more truthful and inclusive digital environment. By fostering robust AI models that are transparent, accountable, and ethically aligned with human rights, we can combat the pervasive threats of disinformation and misinformation. This journey demands global cooperation, innovative and interdisciplinary research, and a steadfast commitment to the SDGs. As we move forward, the combined efforts of international organizations, academia, and industry can harness AI's potential, through international standards, robust regulatory frameworks, and support for ethical AI development, to enhance information integrity and thereby support a future where digital platforms contribute positively to the global good.
Suggested citation: Chu Chu and Liu Jia An, "Incorporating the UN Values: Artificial Intelligence and Information Integrity for the SDGs," UNU Macau (blog), 10 September 2024, https://unu.edu/macau/blog-post/incorporating-un-values-artificial-intelligence-and-information-integrity-sdgs.