The Dark Art of Model Stealing: What You Need to Know

Imagine pouring years of research, talent, and financial resources into building a state-of-the-art AI model, only to have it stolen in mere minutes. In today’s rapidly evolving technological landscape, AI models are the backbone of innovations that range from life-saving medical diagnoses to revolutionary customer service solutions. They are both highly valuable and exceptionally vulnerable. Model stealing is a threat in which an adversary duplicates a machine learning model without direct access to its parameters or training data.

The Basics of Model Stealing

Definition of Model Stealing

Model stealing, also known as model extraction, is the practice of reverse engineering a machine learning model owned by a third party without explicit authorization. Attackers don’t need direct access to the model’s parameters or training data to accomplish this. Instead, they often interact with the model via its API or any public interface, making queries (i.e., sending input data) and receiving predictions (i.e., output data). By systematically making numerous queries and studying the outputs, attackers can build a new model that closely approximates the target model’s behavior.

Types of Vulnerable AI Models

Black-box Models: Models like deep neural networks fall under the category of black-box algorithms, where the internal workings are complex and not easily interpretable. The allure of these advanced models makes them prime targets for attackers. Their complexity not only represents high value but also offers a challenging puzzle that many hackers are willing to solve. Such models are often employed in cutting-edge applications like natural language processing, self-driving vehicles, and high-frequency trading systems.

Publicly Accessible Models: Any machine learning model with a public-facing API or web interface is inherently more vulnerable to model stealing. Because these models are designed to be easily accessible, they offer attackers the opportunity to make queries and collect data without the need for internal network penetration. These models can range from natural language processing services to image recognition APIs, making them popular targets for theft.

Simple Models: Ironically, simpler machine learning models like linear regression, decision trees, or k-NN are also particularly vulnerable. Their straightforward algorithms make them easier to approximate, often requiring fewer queries for an attacker to reverse-engineer them successfully. These models find frequent use in simpler tasks such as basic classification problems, weather prediction, and recommendation systems, and their simplicity makes them quicker and less noticeable targets for theft.

Models Without Rate Limiting or Monitoring: Deployed machine learning models lacking robust security features like rate limiting and access pattern monitoring are sitting ducks for attackers. Without these security barriers, hackers can make a large number of queries in a short period, collecting the data they need to steal the model without raising any alarms. Unfortunately, these kinds of vulnerable deployments are often found in early-stage startups or projects where security awareness and resources are limited.
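To make the point about simple models concrete, here is a minimal sketch of how few queries a plain linear regression can need. It assumes the attacker can send arbitrary numeric inputs and receives the exact, unrounded prediction; the `victim_api` function and its secret weights are hypothetical stand-ins for a remote endpoint, not any real service.

```python
from typing import Callable, List, Tuple

# Hypothetical victim: a linear regression served behind a prediction API.
# The attacker never reads these two values directly.
_SECRET_WEIGHTS = [2.0, -1.0, 0.5]
_SECRET_BIAS = 4.0


def victim_api(features: List[float]) -> float:
    """Stand-in for a remote prediction endpoint (illustrative assumption)."""
    return sum(w * x for w, x in zip(_SECRET_WEIGHTS, features)) + _SECRET_BIAS


def extract_linear_model(
    query: Callable[[List[float]], float], n_features: int
) -> Tuple[List[float], float]:
    """Recover the weights and bias of a linear model with n_features + 1 queries."""
    # Querying the all-zeros input returns the bias on its own.
    bias = query([0.0] * n_features)
    weights = []
    for i in range(n_features):
        # A one-hot input returns weight_i + bias, so subtracting the bias
        # isolates weight_i.
        probe = [0.0] * n_features
        probe[i] = 1.0
        weights.append(query(probe) - bias)
    return weights, bias


stolen_weights, stolen_bias = extract_linear_model(victim_api, n_features=3)
print(stolen_weights, stolen_bias)
```

Four queries are enough here because the model has only four unknowns. Rounding or perturbing the returned predictions, or restricting which inputs are accepted, would make this exact recovery harder, though an attacker could still approximate the model with more queries.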

Real-world Examples Where Model Stealing Has Occurred

Natural Language Processing Models: One of the sectors that has been hit hard by model stealing is the fast-growing field of Natural Language Processing (NLP). Companies invest heavily in creating sophisticated chatbot technologies that can understand and respond to human queries with unprecedented accuracy. The theft of these models can be devastating. There have been instances where companies were shocked to find mirror-image chatbots on competitor platforms, offering nearly identical responses and conversation flows as their proprietary models. The theft undermines years of R&D investment and provides unauthorized players a shortcut to a competitive edge.

Game Playing Algorithms: Online gaming is another domain that often becomes a battleground for intellectual property theft. Game developers invest in building complex algorithms to make bot players behave more human-like in strategy and combat games. These algorithms can be stolen and deployed in unauthorized games, severely undermining the unique selling propositions of the original games. This not only erodes the competitive advantage of the original developers but also saturates the market with indistinguishable products, making it harder for consumers to make informed choices.

Healthcare Algorithms: In an industry as critical as healthcare, the theft of diagnostic algorithms has far-reaching consequences. Advanced machine learning models are used to interpret medical data, make predictions, and even suggest treatment options. There have been reported cases where such diagnostic algorithms were reverse-engineered and replicated, potentially risking not just intellectual property but also patient data. Such thefts could lead to misdiagnoses if the stolen models are not as rigorously validated as the originals, raising serious ethical and legal concerns.

Recommendation Systems: The retail and streaming industries rely heavily on their recommendation algorithms to personalize user experiences and maximize sales or engagement. These algorithms, often the result of years of iterative development and fine-tuning, have been found replicated on competing platforms. Companies have discovered copycat algorithms that generate suspiciously similar product or content recommendations, thereby diluting their competitive edge and market uniqueness. The theft of such algorithms is particularly damaging because they are directly tied to customer engagement and revenue streams.

The Techniques Behind Model Stealing

Query-based Model Stealing: One of the most common methods employed for model stealing is the query-based technique. In this approach, attackers make a plethora of queries, sending various inputs to the target model and recording the corresponding outputs. By accumulating a large dataset of input-output pairs, they train a surrogate model to mimic the behavior of the target model. The surrogate can then be further refined to approximate the original model as closely as possible, effectively “stealing” its capabilities without ever having direct access to its parameters or training data.

Model Inversion: Model inversion is a more sophisticated technique whereby attackers use the output from the target model, such as its confidence scores, to reconstruct the input data that could have led to that particular output. It is primarily a privacy attack, since the reconstructed inputs can reveal sensitive information about the data the model was trained on [2]. By doing so repeatedly, attackers also gain insights into the model’s decision-making process, and the reconstructed inputs can serve as realistic queries for training a replica that generates the same or similar outputs for a given set of inputs.

Transfer Learning: Transfer learning techniques involve taking a pre-trained model and fine-tuning it to approximate the behavior of the target model. Attackers can begin with a similar, publicly available model as a starting point and fine-tune it using the query-based method. By doing this, they can drastically reduce the amount of time and data needed to approximate the target model, making it a highly effective method for model stealing.

Other Techniques: Aside from the major techniques outlined above, attackers also employ other methods like data augmentation and ensemble methods to improve the performance of their stolen models. Data augmentation involves artificially expanding the dataset using transformations, which helps in fine-tuning the stolen model. Ensemble methods combine the predictions from multiple stolen or approximated models to produce a more accurate output, essentially leveraging the strengths of each individual model to improve overall performance.
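The query-based technique described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a real attack tool: `victim_api` is a hypothetical two-feature classifier that returns only a 0/1 label, the attacker samples inputs uniformly at random, and the surrogate is a small logistic regression trained with batch gradient descent. All function names and parameters are invented for the example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], int]


def victim_api(x: Sequence[float]) -> int:
    """Hypothetical black-box classifier; the attacker sees only the 0/1 label."""
    return 1 if 1.5 * x[0] - 2.0 * x[1] + 0.3 > 0 else 0


def collect_pairs(
    query: Callable[[Sequence[float]], int], n_queries: int, rng: random.Random
) -> List[Example]:
    """Send random inputs to the target and record its answers."""
    pairs = []
    for _ in range(n_queries):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        pairs.append((x, query(x)))
    return pairs


def train_surrogate(
    pairs: List[Example], epochs: int = 200, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Fit a logistic regression surrogate on the recorded input-output pairs."""
    w0, w1, b = 0.0, 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for x, y in pairs:
            p = 1.0 / (1.0 + math.exp(-(w0 * x[0] + w1 * x[1] + b)))
            err = p - y  # gradient of the log loss with respect to the logit
            g0 += err * x[0]
            g1 += err * x[1]
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return [w0, w1], b


def surrogate_predict(weights: List[float], bias: float, x: Sequence[float]) -> int:
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def fidelity(weights: List[float], bias: float, test_pairs: List[Example]) -> float:
    """Fraction of inputs on which the surrogate agrees with the target."""
    hits = sum(surrogate_predict(weights, bias, x) == y for x, y in test_pairs)
    return hits / len(test_pairs)


# Attack: 500 labelled queries for training, 1000 fresh ones to measure agreement.
train_pairs = collect_pairs(victim_api, 500, random.Random(0))
weights, bias = train_surrogate(train_pairs)
agreement = fidelity(weights, bias, collect_pairs(victim_api, 1000, random.Random(1)))
print(f"surrogate agrees with target on {agreement:.1%} of fresh inputs")
```

Agreement with the target on fresh inputs, often called fidelity, is the quantity the attacker tries to maximize. In this toy setting the surrogate has the same functional form as the target; against a real model the attacker would have to guess an architecture, which is where the transfer learning and ensemble approaches mentioned above come in.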

The Risks Involved

The implications of model stealing are multi-faceted and far-reaching. At the forefront is the loss of intellectual property, which includes not just proprietary algorithms but also invaluable data. This extends to financial consequences, as companies may suffer substantial revenue loss due to unauthorized use or resale of their pilfered models. Moreover, the clandestine nature of model stealing can severely damage a company’s reputation, undermining customer trust and eroding market share. Finally, the legal repercussions can be significant, involving costly lawsuits, penalties, and potentially irreparable damage to business relations.

Best Practices for Preventing Model Stealing

Protecting machine learning models from theft necessitates a multi-layered approach to security.

Obfuscation: The true complexity or finer details of the model are hidden, making it more challenging for attackers to reverse-engineer.

Rate Limiting: Restrictions are set on the number of queries that can be made to the model within a certain timeframe, thereby slowing down or disrupting a would-be attacker’s efforts.

Data Watermarking: For added protection, unique markers are embedded into the model’s data, allowing for easier tracking and identification in case of unauthorized replication.

Access Control: Rigorous access control policies should be in place to define who has the right to query or interact with the model and its associated data, providing an extra layer of security.

Regular Security Audits: None of these measures are set-and-forget solutions; regular security audits are imperative for assessing the effectiveness of the protective mechanisms in place and for identifying potential vulnerabilities that could be exploited.
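As one illustration of the rate limiting practice mentioned above, the following is a minimal sketch of a per-client sliding-window limiter. The class name, the limits, and the injectable `clock` parameter are assumptions made for the example; a production deployment would more likely rely on an API gateway or a shared store than on in-process state.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most max_queries per client within any window_seconds span."""

    def __init__(
        self,
        max_queries: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_queries = max_queries
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Return True and record the query if the client is under its limit."""
        now = self._clock()
        history = self._history[client_id]
        # Forget queries that have aged out of the window.
        while history and now - history[0] >= self._window:
            history.popleft()
        if len(history) >= self._max_queries:
            return False
        history.append(now)
        return True


# Demo with a fake clock (hypothetical values): 3 queries per 60 seconds.
fake_now = [0.0]
limiter = SlidingWindowRateLimiter(3, 60.0, clock=lambda: fake_now[0])
burst = [limiter.allow("client-a") for _ in range(5)]
print(burst)  # [True, True, True, False, False]

fake_now[0] = 61.0  # the earlier queries have left the window
print(limiter.allow("client-a"))  # True
```

A limiter like this only raises the cost of an extraction attempt. An attacker who spreads queries across many accounts or over a long period can stay under any fixed threshold, which is why it is usually paired with monitoring of access patterns.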

Recent Research on Model Stealing

Academia and industry have been paying close attention to the phenomenon of model stealing, leading to a surge in research aimed at understanding its mechanics, vulnerabilities, and countermeasures. One influential study [1] laid foundational insights into how black-box models could be compromised through their public prediction APIs. Another study [2] looks into the specifics of model inversion techniques, analysing how output data, in particular confidence information, can be used to reconstruct sensitive input information. Research has also begun to look at the ethical and legal aspects [3], which touch upon the legal ramifications of data leakage and model theft. Meanwhile, the survey in [4] provides a broad overview of adversarial attacks on machine learning and discusses potential defense mechanisms. Lastly, [5] addresses defenses against adversarial examples that come with a mathematical certificate of the model’s resistance, an approach relevant to the wider effort to harden models against attack.


The landscape of cybersecurity has expanded to include not just traditional systems but also sophisticated machine-learning models that are increasingly powering modern technology. Model stealing represents a serious threat to industries across the board, from healthcare and finance to retail and gaming. As our reliance on AI grows, so does the need for robust security measures to protect these valuable assets. From the intricate techniques employed by attackers to the myriad vulnerabilities that make models susceptible to theft, understanding model stealing and protecting against it are critical. Best practices like obfuscation, rate limiting, data watermarking, and regular security audits offer some defense, but these are not foolproof. Recent research suggests that the cat-and-mouse game between attackers and defenders is alive and well, requiring ongoing vigilance and adaptation. When models are both the product and the vulnerability, the stakes couldn’t be higher.


  1. Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., & Ristenpart, T. (2016). Stealing machine learning models via prediction APIs. In 25th USENIX security symposium (USENIX Security 16) (pp. 601-618).
  2. Fredrikson, M., Jha, S., & Ristenpart, T. (2015, October). Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (pp. 1322-1333).
  3. Ohm, P. (2014). Sensitive information. S. Cal. L. Rev., 88, 1125.
  4. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., & Mukhopadhyay, D. (2018). Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069.
  5. Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344.

For 30+ years, I've been committed to protecting people, businesses, and the environment from the physical harm caused by cyber-kinetic threats, blending cybersecurity strategies and resilience and safety measures. Lately, my worries have grown due to the rapid, complex advancements in Artificial Intelligence (AI). Having observed AI's progression for two decades and penned a book on its future, I see it as a unique and escalating threat, especially when applied to military systems, disinformation, or integrated into critical infrastructure like 5G networks or smart grids.

Luka Ivezic

Luka Ivezic is the Lead Cybersecurity Consultant for Europe at the Information Security Forum (ISF), a leading global, independent, and not-for-profit organisation dedicated to cybersecurity and risk management. Before joining ISF, Luka served as a cybersecurity consultant and manager at PwC and Deloitte. His journey in the field began as an independent researcher focused on the cyber and geopolitical implications of emerging technologies such as AI, IoT, and 5G. He co-authored the book "The Future of Leadership in the Age of AI" with Marin. Luka holds a Master's degree from King's College London's Department of War Studies, where he specialized in the disinformation risks posed by AI.
