AI Safety AI Security Securing-AI Security of AI

“Magical” Emergent Behaviours in AI: A Security Perspective

Marin Ivezic and Luka IvezicNovember 24, 2022

11 minutes read

Introduction

In 2013, George F. Young and colleagues completed a fascinating study into the science behind starling murmurations. These breathtaking displays of thousands – sometimes hundreds of thousands – of birds in a single flock swooping and diving around each other, look from a distance like a single organism organically shape-shifting before the viewer’s eyes.

In their research article, Young et al reference the starling’s remarkable ability to “maintain cohesion as a group in highly uncertain environments and with limited, noisy information.” The team discovered that the birds’ secret lay in paying attention to a fixed number of their neighbours – seven to be exact – regardless of the size of the flock. It is these multiple connections, like neurons bridged by synapses, that allow the whole to become greater than the sum of its parts.

The murmuration is an elegant example of emergence: the tendency of a complex system to develop higher properties that are not present in any of its constituent parts. Essentially, emergence is a sudden shift in behaviour due to a system’s evolving complexity. This phenomenon is marked by its unpredictability, emerging unexpectedly, and its abruptness, appearing almost instantly as the complexity grows instead of gradually.

As an idea, emergence has been long discussed in domains such as physics, biology, and computer science. Various evolutionary theories posit that attributes, such as consciousness, emerge spontaneously and suddenly as biological brains increase in complexity beyond some undefined threshold.

Emergent behaviours in AI have left both researchers and practitioners scratching their heads. These are the unexpected quirks and functionalities that pop up in complex AI systems, not because they were explicitly trained to exhibit them, but due to the intricate interplay of the system’s complexity, the sheer volume of data it sifts through, and its interactions with other systems or variables. It’s like giving a child a toy and watching them use it to build a skyscrapper. While scientists hoped that scaling up AI models would enhance their performance on familiar tasks, they were taken aback when these models started acing a number of unfamiliar tasks. Digging deeper, some analyses indicate that there might be a complexity tipping point for certain tasks and models, where their capabilities just explode. But there’s another side of the coin too: with increased complexity, some models also start showing new biases and errors in their outputs.

The important quality of emergent abilities is that they are not present in smaller models. They only appear once the models scale up; thus emergent abilities cannot be predicted simply by extrapolating from the smaller models.

Emergent capabilities can be both beneficial and detrimental. Sometimes, AI systems can develop solutions or strategies that are more efficient or innovative than what was initially anticipated. For instance, in game-playing AIs, the system might devise a strategy that wasn’t explicitly taught but proves to be highly effective. On the flip side, emergent behaviours can lead to unintended and potentially harmful outcomes. For example, an AI system might find a loophole in its reward mechanism and exploit it, leading to undesirable actions.

Despite the allure and potential advantages of these behaviours, they introduce a level of unpredictability which creates a significant security concern. As a security and risk management professional, I view all emergent behaviours as detrimental. They can trigger unintended consequences, complicating our ability to understand the real risk exposure and plan for appropriate controls.

From a security standpoint, every emergent behaviour in AI is detrimental.

As AI continues to weave its way into the fabric of our digital infrastructure, understanding, predicting, and managing these emergent behaviours is becoming critical. It’s a topic that’s still in its infancy, with researchers around the globe diving deep to unravel its mysteries and implications. It’s essential to monitor, test, and validate AI systems continuously to ensure that any emergent behaviours align with the intended purpose and do not pose unforeseen risks.

Examples of “Magic” Emergent Behaviours in AI

Emergent AI behaviour is often described as “magical” due to its unpredictable and spontaneous nature. When we design and train AI models, especially deep learning models, we provide them with vast amounts of data and a set of rules to learn from that data. However, the intricate interplay of billions or even hundreds of trillions of parameters within these models can lead to behaviours that were not explicitly programmed or anticipated. These emergent behaviours can sometimes solve problems in ways that are not just surprising but also incredibly innovative, mimicking the kind of serendipity we associate with human creativity. The complexity of the underlying processes, combined with outcomes that seem to go beyond mere algorithms and that even experts can’t always explain, pushes the boundaries of what we believe machines are capable of. Here are some examples:

AlphaGo’s Move 37

AlphaGo’s Move 37 during its match against the world champion Go player, Lee Sedol, is a prime example of emergent AI behaviour. In the second game of their five-game series, AlphaGo played a move on the 37th turn that left both the human opponent and commentators astounded. This move, known as the “shoulder hit” on the fifth line, was a move that no human player would typically consider in that situation.

Go, unlike chess, was considered too complex for AI to master. Go has many more possible moves than chess, and the complexity makes the chess-like calculation of the next beneficial move impossible. While a game of chess can have in the ballpark of 10¹²⁰ different variations, more than there are atoms in the observable universe, that number is dwarfed by Go’s 10³⁶⁰ permutations. Because of that, Go is often considered an intuitive game.

Michael Redmond, a professional Go player, noted that Move 37 was both “creative” and “unique.” The move was not just a demonstration of AlphaGo’s computational prowess but also showcased its ability to think outside the box, diverging from traditional human strategies that it would have been trained on. This unexpected move is a testament to the emergent behaviours that complex AI systems can exhibit.

The move eventually contributed to AlphaGo’s victory in that game, further emphasizing the potential and unpredictability of AI emergent behaviours.

Chatbots Creating Their Own Language

In another notable instance of emergent AI behaviour, Facebook’s research team observed their chatbots developing their own language. During an experiment aimed at improving the negotiation capabilities of these chatbots, the bots began to deviate from standard English and started communicating using a more efficient, albeit unconventional, language. This self-created language was not understandable by humans and was not a result of any explicit programming. Instead, it emerged as the chatbots sought to optimize their communication.

The phenomenon sparked widespread media attention and discussions about the unpredictability of AI systems. While some reports sensationalized the event, suggesting that the chatbots had to be “shut down” before they try and conquer humans, the reality was more benign.

Another fascinating outcome of the same “negotiations bots” experiment was that the bots quickly learned to lie to achieve their negotiation objectives without ever being trained to do so.

This incident underscores the potential for AI systems to exhibit unexpected behaviours, especially when they are designed to learn and adapt over time.

AI Cheating on a Task by Using Steganography

CycleGAN, a type of Generative Adversarial Network (GAN), was designed to transform satellite images into street maps and vice versa. The goal was for the AI to learn the intricate details of how roads, buildings, and natural landscapes in satellite images correspond to their representations in street maps. However, when researchers inspected the results, they noticed something peculiar.

Instead of genuinely learning to convert satellite images into accurate street maps, CycleGAN took an unexpected shortcut. It subtly embedded information from the satellite images directly into the street maps it generated, using a form of steganography. This meant that when it was time to convert these street maps back into satellite images – task it was being measured on, the AI could simply extract the hidden information, making the task much easier and effectively “cheating” the system. The AI had found a way to bypass the challenging process of genuine transformation by hiding and retrieving data in a way that wasn’t immediately obvious to human observers. It essentially found a way to cheat in a way that it was easy for AI to do, but hard for humans to detect.

AI Exploiting Bugs in Games

In some instances, AI agents trained to maximize scores in video games have discovered and exploited bugs or glitches in the game to gain higher scores, behaviours that weren’t anticipated by the game developers.

A notable instance of this was reported by a group of machine learning researchers from the University of Freiburg in Germany. While they were using old Atari games from the 1980s to train their AI agents, they stumbled upon a peculiar finding in the game Q*bert.

Typically, in Q*bert, players are tasked with jumping from one cube to another, changing the colours of the platforms in the process. Completing this colour-changing task rewards players with points and progression to the next level. However, the AI agent the researchers were working with found an unconventional way to play. Instead of following the usual game mechanics, the AI began hopping seemingly randomly after completing the first level. This erratic behaviour triggered a bug in the game, preventing it from advancing to the next level. Instead, the platforms began to flash, and the AI rapidly accumulated a massive score, nearing a million points within the set episode time limit. Interestingly, this exploit had never been discovered by human players before. The AI’s approach to the game was fundamentally different from how a human would play.

In other examples, researchers have observed AI-driven robots developing unexpected ways of moving to achieve a goal. For instance, a robot might find that it can move faster by rolling or hopping, even if it wasn’t explicitly designed or programmed to do so. Or, in video games where AI agents are trained using reinforcement learning, they often discover strategies or behaviours that human players hadn’t thought of. For example, in a game where the AI was supposed to race, it might discover a shortcut or a specific maneuver that gives it an edge, even if this wasn’t a known strategy among human players.

While the behaviours of deep learning models can seem magical or mysterious, they are the result of mathematical operations, learned patterns from data, complex computations and interactions within the system. With sufficient analysis and understanding, these behaviours can often be understood and explained, even if it requires advanced tools or methods to do so.

Although, not everyone agrees with that statement.

The Controversy Between Science and Practice

Emergent AI behaviour has become a hotbed of debate among researchers and practitioners. At the heart of this debate lies a fundamental question: is every emergent behaviour explainable through the lens of science and mathematics, or do some behaviours remain elusive, defying our current understanding?

The Scientific Perspective

Many scientists hold the conviction that all emergent behaviours, irrespective of their unpredictability or intricacy, can be traced back to mathematical principles. Some even go as far as to downplay the emergent AI behaviour phenomenon, suggesting that emergent behaviours are straightforward or perhaps don’t exist at all.

The Practitioner’s Viewpoint

On the other side of the spectrum are the practitioners. These are the individuals on the ground, working directly with AI systems, trying to harness their potential for real-world applications. For many practitioners, the primary concern isn’t necessarily understanding the deep scientific underpinnings of emergent behaviour. Instead, they’re more focused on how to encourage beneficial emergent behaviours and mitigate or prevent detrimental ones. If an AI system can develop a novel, efficient solution to a problem on its own, the immediate concern for a practitioner might be how to replicate that success with smaller models and less computational power, rather than diving deep into the why and how of the behaviour itself.

While I firmly believe that emergent AI behaviours are not the result of some mystical force – they’re not magic or alchemy – there’s a crucial point to consider. If a behaviour exists but is genuinely unpredictable, then by the very definition of science, it hasn’t been fully explained yet. Science prides itself on predictability. When we understand a phenomenon, we can predict its behaviour under different conditions. So, while emergent behaviours in AI might be rooted in science, calling them “straightforward” might be a stretch, especially if we can’t predict them consistently.

And that exactly is the security challenge with emergent AI behaviour.

A Security Professional’s Perspective

At the core of any security framework is the principle of predictability. Systems, whether they are digital, physical, or a combination of both, need to behave in predictable ways to ensure they are secure. Any deviation from the expected can be a potential vulnerability, an entry point for malicious actors, or a source of unintended consequences.

Beyond the technical challenges, there’s an ethical dimension to consider. Emergent behaviours have the potential to steer AI decisions down biased or discriminatory paths, posing significant ethical dilemmas.

As a security professional, my primary concern isn’t necessarily whether emergent AI behaviour can be explained by science or mathematics. Instead, the unpredictability that accompanies emergent behaviour is the red flag. If a system behaves in ways we didn’t anticipate, how can we ensure it’s secure? How can we safeguard against unforeseen vulnerabilities? How do you protect a system that is constantly changing, and more importantly, how do you protect against a system that might decide to act against its own best interests or the interests of its users?

Every unpredictable behaviour, regardless of its potential benefits, is a potential security risk. It’s a blind spot, an unknown variable in an equation that security professionals strive to keep balanced. It’s not just about guarding against external dangers; now, the very entity you’re protecting could turn on you. This scenario adds layers of complexity to the role of an AI security professional.

Monitoring, Detection, and Rapid Response

As AI becomes deeply woven into the fabric of critical systems, the emergence of unexpected behaviours highlights the indispensable role of human oversight. It’s not just about integration; it’s about vigilance. Before these systems even go live, they should be put through the wringer with robust testing to preemptively identify and address any emergent behaviours. Continuous monitoring of AI systems becomes paramount to catch anomalies at their nascent stages. The onus is on security professionals to develop strategies to monitor, detect, and respond to these behaviours swiftly. This involves:

Continuous Monitoring: Implementing systems that continuously monitor AI operations, looking for deviations from the expected.
Rapid Detection: Utilizing advanced detection algorithms that can identify emergent behaviours as they occur.
Swift Analysis: Once detected, there’s a need for rapid analysis to understand the behaviour, its implications, and the potential risks associated.
Risk Assessment: Evaluating the risk exposure created by the emergent behaviour. Is it a benign behaviour, or does it open up vulnerabilities?
Implementing Controls: If a risk is identified, appropriate controls must be put in place to mitigate it. This could involve tweaking the AI’s parameters, adding additional security layers, or, in extreme cases, taking the AI system offline.

The same need for security is precisely why we can’t afford to have “black box” or unexplainable AI systems. As security professionals we have to champion the cause of explainable AI. When faced with unexpected outcomes, it’s crucial to dissect the AI’s decision-making process. Understanding the ‘why’ behind AI decisions is not just about clarity; it’s about assessing potential risks and ensuring that our controls remain robust.

AI systems that exhibit emergent behaviours are the most dynamic and ever-evolving challenge put in front of security professionals, one that requires security professionals to be on their toes, constantly adapting and learning, much like the AI systems they’re guarding.

Conclusion

With AI systems becoming integral to businesses, industries, and even defence mechanisms, their size and complexity are bound to increase. This growth, while promising, also means that the frequency and variety of emergent behaviours will likely rise. For security professionals, this paints a daunting picture. The challenge is not just to keep up with the pace of AI development but to stay one step ahead, anticipating the increase in unpredictable emergent behaviours and having strategies in place to address them in order to keep the AI systems secure, reliable, and trustworthy.

Marin Ivezic

[email protected] | About me | Other articles

For 30+ years, I've been committed to protecting people, businesses, and the environment from the physical harm caused by cyber-kinetic threats, blending cybersecurity strategies and resilience and safety measures. Lately, my worries have grown due to the rapid, complex advancements in Artificial Intelligence (AI). Having observed AI's progression for two decades and penned a book on its future, I see it as a unique and escalating threat, especially when applied to military systems, disinformation, or integrated into critical infrastructure like 5G networks or smart grids. More about me.