You’ve probably read about the risks from adversarial trickery that causes AI to misbehave.
As AI becomes increasingly pervasive in day-to-day lives, from automated diagnosis of medical images through to vision processing in self-driving cars, the risk of this kind of trickery may appear scary. But how concerned should we be about this new cyber threat? The answer depends on several factors.
The AI at risk are deep learned algorithms (models), typically used to extract meaning from complex data such as images. Adversarial inputs (often referred to as ‘adversarial examples’) exploit the fact that deep learned models can’t cater for all possible inputs. Deep learning generalises characteristics of the data used to train it, so when it encounters input that’s in some way uncharacteristic of that data, unexpected things can happen. Usually, assuming the AI has been trained well for its purpose, the data presented to it is similar enough to the training data for the model to produce a sensible result. However, if someone deliberately fabricates data to confuse the AI, the chances of unintended outcomes increase. Unfortunately, attempts to find a method to reliably detect adversarial inputs have, so far, proven elusive.
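To make the idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one well-known way of fabricating adversarial inputs. For simplicity the ‘model’ here is a toy logistic classifier rather than a deep network (an assumption for illustration only); the principle is the same: nudge each input feature slightly in the direction that increases the model’s loss, so a small, barely visible perturbation flips the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """Perturb input x (true label y in {0, 1}) so that a logistic
    model p = sigmoid(w.x + b) becomes less confident in y.
    Each feature moves by at most epsilon."""
    p = sigmoid(w @ x + b)
    grad = (p - y) * w              # gradient of cross-entropy loss w.r.t. x
    return x + epsilon * np.sign(grad)

# Toy model: classifies x as 1 when the sum of its features is positive.
w = np.ones(4)
b = 0.0
x = np.array([0.2, 0.1, 0.3, 0.2])              # confidently classified as 1
x_adv = fgsm(x, y=1, w=w, b=b, epsilon=0.3)

print(sigmoid(w @ x + b))       # confident on the clean input
print(sigmoid(w @ x_adv + b))   # confidence collapses after a small perturbation
```

Note that the perturbation is bounded per feature, which is why adversarial images can look unchanged to a human while completely confusing the model.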
Adversarial inputs are often discussed in the context of evasion, such as adversarial glasses to avoid facial recognition. However, there are other attack motivations, such as using adversarial data to cause confusion, or to discredit an organisation. For example, innocent online images that contained adversarial content could be misclassified as ‘prohibited’ by a social media content filter. Unlike the evasion attack, this attack doesn’t need every adversarial example to work. If enough adversarial data were spread through social media sharing, misclassification of even a small proportion of the images might be sufficient to cause disruption. So, although evasion may be the most familiar motivation for attacks, we should also be aware of other reasons for fooling AI.
Often ‘an AI’ is discussed as if it were a single entity, but machine learned models are typically just components of a larger solution. For example, AI in an autonomous car might take in visual data to maintain its course and avoid obstacles. However, as with any software component in a complex, safety-critical system, if there is a risk of error or trickery, we might expect sanity checks (such as proximity sensors to corroborate visual information) and fail safes (such as emergency driver override). Autonomous vehicles of the future will undoubtedly incorporate more intelligent infrastructure to further reduce environmental uncertainty. We already have centralised real-time data about road networks, traffic conditions, and weather; perhaps this will be augmented with local information from the road infrastructure itself, with inter-vehicle communication becoming standard. It is the complete system that must be considered in the context of cyber vulnerabilities, not the AI component in isolation.
Although AI may be fooled, there may be no risk if its input data is trusted. Consider autonomous robots in a warehouse with visual sensors to navigate and select goods for delivery. It may be possible to fool a robot by tampering with the warehouse environment, but those people with access are likely to be trusted staff. Similarly, digital medical images in a hospital should be accessed only by trusted professionals.
Adversarial examples usually rely on a discrepancy between how humans and AI perceive information. As AI components are incorporated into real-world systems, there is a growing appreciation of the importance of explainability, and a realisation that in many circumstances models must return results based on criteria that we, as humans, would expect and can justify.