Data poisoning is a relatively new type of cyber-attack aimed at misleading AI systems. AI models are developed by processing huge amounts of data, and the quality of that data directly determines the quality of the resulting model. Data poisoning is the deliberate supply of wrong or misleading data in order to degrade or manipulate an AI model. The risk is becoming particularly acute with the rise of large language models (LLMs) such as ChatGPT.
Researchers from the Swiss Federal Institute of Technology (ETH) in Zurich, Google, NVIDIA and Robust Intelligence recently published a preprint investigating the feasibility of data poisoning attacks against the machine learning (ML) models used in artificial intelligence (AI). By injecting corrupted data into an existing training data set, they were able to influence the behaviour of models trained on it, impairing the functionality of the resulting AI systems.
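To make the idea concrete, here is a minimal, hypothetical sketch of one simple poisoning technique, label flipping. It is not the attack studied in the paper, just an illustration: a toy nearest-centroid classifier is trained on clean one-dimensional data, then retrained after an attacker injects a single deliberately mislabelled point, which is enough to change a prediction.

```python
# Toy illustration of label-flipping data poisoning (hypothetical example,
# not the method from the cited paper).

def train_centroids(points, labels):
    """Compute the mean (centroid) of each class, 0 and 1."""
    c0 = [p for p, l in zip(points, labels) if l == 0]
    c1 = [p for p, l in zip(points, labels) if l == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def predict(centroids, x):
    """Assign x to the class whose centroid is nearer."""
    return 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1

# Clean training set: class 0 clusters near 0, class 1 clusters near 10.
points = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]

clean = train_centroids(points, labels)   # centroids (1.0, 9.0)
print(predict(clean, 3.0))                # -> 0: correctly near class 0

# Poisoning: the attacker injects one point far to the left but labels it
# class 1, dragging the class-1 centroid toward the class-0 region.
poisoned_points = points + [-20.0]
poisoned_labels = labels + [1]            # mislabelled on purpose

dirty = train_centroids(poisoned_points, poisoned_labels)
print(predict(dirty, 3.0))                # -> 1: the same input is now misclassified
```

Even this single poisoned sample shifts the learned decision boundary; real attacks against web-scale training sets exploit the same principle at far greater scale and subtlety.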
As AI systems grow larger and more complex, detecting data poisoning attacks will become increasingly difficult. The greatest risks lie in politically charged topics, where manipulated training data can skew a model's outputs without being easily noticed.