Artificial Intelligence (AI) is a growing trend in industrial and consumer-grade devices. As more companies include some form of AI in their products – from smartphones to toys – AI also becomes more of a vector for abuse by criminals. The manipulation of AI decision processes is a particularly concerning threat. Many AI products include learning systems that adapt to the input they receive. These AI can change their behavior over time by learning how to respond to new input. Something as simple as a smart lightbulb or a connected gas meter could be trained by a malicious user to make the “wrong” decision. Such a wrong decision could, for example, inflate a competing company's bill to an extreme level, or lead the AI to “decide” to run machinery in an unsafe way.
The challenge we face when investigating AI is knowing why an AI model made a decision. There are generally three ways in which an AI can learn: it can be trained by the manufacturer, trained by the user, or trained by input data. The last is where an adversary could potentially manipulate the AI into producing incorrect responses.
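To make the input-data attack concrete, the following is a minimal sketch (not from the paper; all class and parameter names are illustrative) of how an adaptive system can be poisoned. A toy gas-meter monitor learns its “normal” baseline as an exponential moving average of the readings it observes; an adversary who feeds it a staircase of readings that each stay just under the anomaly threshold drags the learned baseline upward, until a reading that the clean model would flag is silently accepted.

```python
class AdaptiveMeterMonitor:
    """Toy online learner: flags a reading as anomalous if it exceeds
    the learned baseline by more than `tolerance` (a fraction).
    Illustrative only -- not an implementation from the paper."""

    def __init__(self, baseline, alpha=0.1, tolerance=0.5):
        self.baseline = baseline    # learned notion of a "normal" reading
        self.alpha = alpha          # learning rate of the moving average
        self.tolerance = tolerance  # fraction above baseline that is flagged

    def is_anomalous(self, reading):
        return reading > self.baseline * (1 + self.tolerance)

    def observe(self, reading):
        # Adapt to every input -- the very property an adversary exploits.
        self.baseline = (1 - self.alpha) * self.baseline + self.alpha * reading


# A freshly trained monitor correctly flags a 3x spike.
clean = AdaptiveMeterMonitor(baseline=100.0)
print(clean.is_anomalous(300.0))  # True

# Poisoning: each adversarial reading sits at 1.4x the current baseline,
# below the 1.5x flag threshold, so none of them are ever flagged --
# yet each one nudges the baseline upward.
poisoned = AdaptiveMeterMonitor(baseline=100.0)
for _ in range(30):
    reading = poisoned.baseline * 1.4
    assert not poisoned.is_anomalous(reading)  # attack stays under the radar
    poisoned.observe(reading)

# The same 3x spike now goes unflagged: the "wrong" decision was learned.
print(poisoned.is_anomalous(300.0))  # False
```

For an investigator, the difficulty is that the poisoned model's final state looks like any other trained state; recovering which sequence of inputs drifted the baseline is exactly the attribution problem this research targets.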
When an incorrect response is detected, an investigator will need to know whether the fault lies with the manufacturer, the user, or a third party. They will need to understand how the AI “learned” the incorrect behavior.
The goal of this research is to determine how an AI arrived at an incorrect decision, and which learning input – or combination of inputs – led to that incorrect decision.
This research was supported by the MISP (Ministry of Science & ICT), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute for Information & Communications Technology Promotion), grant number 2018-0-00216.