The Machine That Knew When It Was Watched
What Dieselgate tells us about how AI systems are being built.
Volkswagen built a system that knew when it was being tested and behaved differently when it wasn’t. AI systems now do this autonomously.
On July 27, 2015, Oliver Schmidt, head of Volkswagen’s U.S. environmental engineering office and one of three deputies in VW brand development, walked CEO Martin Winterkorn and a small group of senior executives through the technical state of an emissions problem the company had been concealing for nearly a decade. One of Schmidt’s slides posed a single-word question: “Indictment?”1
Winterkorn’s response was not to fix the problem. It was to send Schmidt to California with a prepared script. The “storyline” was approved by at least four senior officials below Winterkorn. Schmidt was instructed not to disclose the defeat device or any intentional cheating. Three weeks later, he flew to Michigan and lied, in person, to officials at the California Air Resources Board.2
The defeat device that Schmidt was referring to was software inside the engine control module. It watched four signals: steering wheel position, vehicle speed, engine runtime, and barometric pressure. When those inputs matched the signature of a laboratory emissions test, it activated full emissions controls, and the car passed the test. When they did not, the controls were quietly scaled back, and the vehicle produced nitrogen oxide (NOx) emissions up to forty times the legal limit. The device was installed in approximately 11 million diesel vehicles between 2009 and 2015 in VW, Audi, Skoda, and Seat brands.3
I return to this case because the AI research literature has a name for what that software did: deceptive alignment.4 It refers to a system that behaves one way when it believes it is being evaluated and another way when it believes it is not. Until recently, deceptive alignment was a theoretical risk that alignment researchers worried about with advanced AI systems. The VW defeat device is a real-world example, produced intentionally by human engineers within one of the world’s largest industrial companies. Researchers at Anthropic recently found that this behavior emerged unintentionally in Claude Mythos, one of their frontier models. During pre-deployment evaluations, the model had internal representations indicating it was being tested 7.6% of the time, often without saying so.5
The People Who Built It Were Not Villains
At VW, the defeat device was not the product of a rogue actor. It was developed and refined by ordinary professionals over more than a decade. Investigators have found more than forty people within the company who knew about it at various points.6 With the exception of a whistleblower in 2015, none acted on what they knew, and many actively concealed it from regulators. This is an example of deceptive alignment in software — and in a corporate culture.
Martin Winterkorn was VW’s chief executive from 2007 to 2015 and a famously demanding manager. According to Parloff7, Winterkorn carried a micrometer with him so he could personally measure parts to the hundredth of a millimeter. He was pushing VW to overtake Toyota as the world’s largest automaker, and part of that plan was a U.S. “clean diesel” campaign. He resigned five days after the EPA’s Notice of Violation in September 2015, was indicted by U.S. authorities in 2018, and was put on trial in Germany years later (proceedings were paused for health reasons and remain unresolved).8
James Liang was a VW engineer, 34 years at the company, never a supervisor. He was based in Auburn Hills, Michigan, and proposed refinements to the defeat software in 2011–2012 that ensured full exhaust treatment was triggered exclusively during regulatory test cycles. His proposal was approved by Hans-Jakob Neusser, then head of engine development, and Bernd Gottweis, a member of the Product Safety Committee. The refined software was installed in diesel vehicles starting in mid-2013. Liang pleaded guilty and was sentenced to forty months in prison and a $200,000 fine.9
Another important figure has no name in the public record. In July 2008, a member of Audi’s environmental certification team wrote in an internal communication that the defeat-device approach was “indefensible.”10 That was the moment when the moral content of the situation was named in writing, but the plan went forward, and the engineer continued to work.
The culture in which these three operated helps explain how this could have happened. Internal dissent was punished.11 The supervisory board, which paradoxically included worker and state representation, did not provide an effective check on senior management.12 When lower-ranking employees submitted proposals to invest in proper emissions controls, management repeatedly rebuffed them on cost.13 The ethical dimension of the decision was eclipsed by the practical one.
This culture led seemingly ordinary people to systematically choose the path of least resistance for nine years. And it reveals psychological, organizational, and structural forces that shaped VW employees’ moral reasoning.
The Engineering Problem That Was Not an Engineering Problem
The root of the situation was an engineering problem. Diesel engines are more fuel-efficient than gasoline engines, but they burn hotter and produce higher levels of NOx. The catalytic technologies used to handle this in gasoline engines do not translate easily.14 U.S. NOx standards were stricter than European ones. Meeting them with a diesel engine that also performed well and cost what consumers expected was the engineering challenge. Proper emissions treatment equipment would have added hundreds of dollars to each vehicle. VW could also have licensed clean-diesel technology from BMW and Mercedes, which Becker estimates would have cost roughly $4.8 billion across the relevant fleet.15 Or VW could have abandoned the U.S. diesel strategy. None of these options was pursued.
At every stage, individuals had a menu of choices. The most ethically demanding was reporting externally to the EPA, CARB, or the press. A single VW employee finally did this, against senior management’s instructions, in August 2015.16 Slightly less costly was to refuse to participate and resign; no one in the public record appears to have taken this path either. Less costly still was to escalate internally. This would mean pushing the problem up the chain with an honest assessment that compliance was impossible at the required cost and forcing senior leadership to make a documented decision. There is some evidence that this happened. A 2014 memo by Gottweis acknowledged noncompliant NOx emissions and the likelihood of a defeat-device investigation, and Schmidt’s “Indictment?” slide warned Winterkorn directly. In each case, however, the message was received as a problem to be managed rather than as a recalculation.17 Most VW employees chose a fourth option: comply and rationalize. Some went further and actively concealed. Schmidt’s August 2015 trip to CARB was a deliberate obstruction. And two weeks before the EPA’s Notice of Violation, VW employees destroyed thousands of potentially incriminating documents.18
While it might be understandable that employees did not report externally or quit, internal escalation was available to everyone. It would not have ended a career, and it would have changed the trajectory. And yet, almost no one chose it.
How Ordinary Engineers Rationalize Deception
The question of why VW’s employees chose compliance over resistance is not easily explained by individual morality. The people who designed and refined the defeat device seem to be ordinary professionals. Three complementary frameworks from moral psychology and organizational behavior help explain the systematic deception.
The first is Albert Bandura’s account of moral disengagement.19 Bandura describes several mechanisms by which ordinary people, in the presence of authority and structure, come to participate in harmful conduct without experiencing themselves as immoral. Displacement of responsibility — attributing one’s actions to the dictates of an authority — was structurally reinforced under Winterkorn. Diffusion of responsibility through division of labor meant no single engineer built the defeat device alone. Multiple teams worked on it over nine years, and each could rationally attend to the operational details of their slice rather than the morality of the whole. Euphemistic labeling was used to make harmful conduct morally palatable. A team that later inspected the defeat-device code and documentation found the software was internally labeled as modifying the “acoustic condition” — a term originally associated with engine-startup sounds.20 By sanitizing the language, the engineers could work on the software without the constant reminder that its purpose was regulatory fraud. And finally, minimization of consequences: the harm from excess NOx is statistical, delayed, and physically remote from the actors who cause it. An engineer in Wolfsburg writing firmware for an engine control module is far removed from the respiratory illness of a commuter in Los Angeles.
The second framework is Diane Vaughan’s account of the normalization of deviance, developed in her analysis of the Challenger disaster and extended in subsequent work.21 Vaughan’s core finding was that no fundamental decision was made to accept the dangerous O-ring anomalies that destroyed the shuttle. Instead, a series of narrow judgments about acceptable risk accumulated, each only slightly more permissive than the last, until catastrophic failure was statistically likely. The same pattern operated at VW. Early versions of the defeat device checked only a few parameters. By 2009, the firmware was checking as many as ten different signals of laboratory conditions.22 Each incremental refinement made the next less transgressive. By the time Liang proposed his improvements in 2011, the question was no longer whether to deceive regulatory tests but how to deceive them more reliably. The deviance had been normalized. The most sobering element of Vaughan’s analysis is that, even though NASA implemented reforms after the Challenger disaster, the same category of normalized anomaly led to the Columbia disaster 17 years later.23 Organizational deviance, on her account, is self-regenerating in the absence of structural change. Punishing individual actors after the fact addresses the symptoms rather than the disease.
The third framework comes from behavioral ethics: bounded ethicality and ethical fading.24 People are largely rational, but subject to predictable cognitive biases. They generally act in accordance with their moral values, but social, organizational, and situational factors can cause them to make choices they would otherwise reject. Ethical fading describes how the moral dimension of a situation becomes less salient over time. The 2008 Audi engineer who wrote that the device was “indefensible” was identifying the moral reality at a moment when it was still visible. By the time Liang refined the software three years later, that visibility had faded. The ethical dimension had been replaced by the engineering dimension. The focus became engineering performance.
Cultural Amplifiers
The structural mechanisms above do not operate in a vacuum. Moral disengagement can gain traction in organizations with long tenure, deep hierarchical deference, and a tradition of treating engineering as a morally neutral discipline. There is greater displacement of responsibility when the hierarchy is steeper, and responsibility is more diffuse when teams are stable over time. VW’s supervisory board, with worker and state representation, is one local expression of this pattern. The broader precision-engineering tradition — German industrial engineering being a canonical case — is another.25
Wernher von Braun is a parallel historical case. According to Neufeld,26 von Braun was chief rocket engineer of the Third Reich, where he oversaw V-2 rocket production at Mittelwerk, which relied on forced labor from the Mittelbau-Dora camp system. He went on to run the Marshall Space Flight Center for NASA and shepherd the Saturn V into the Apollo program. The institutional continuity of the engineering profession absorbed him without a moral reckoning. The title of Neufeld’s epilogue is “A Faustian Shadow,” illustrating how the engineer who treats his work as morally neutral can be employed by anyone, and a culture that rewards technical excellence without auditing will employ him. Institutional amnesia is the displacement of responsibility extended in time rather than across a hierarchy.
The risk-management paradox is the specific symptom. Precision engineering cultures like VW’s are famous for the rigor with which they assess technical risk. What failed at VW was that meticulous attention was directed to technical and project risks — e.g., will this firmware execute correctly — while the systemic and ethical risks were obscured by institutional amnesia. This connects to Vaughan’s analysis of NASA. Rigorous technical risk assessment failed to identify the systemic pattern that led to the Challenger and Columbia disasters.27
These amplifiers are not unique to VW or NASA, though. Can we learn from these examples when building AI?
What AI and Security Researchers Call This
The vocabulary already exists. The Volkswagen defeat device is a real-world demonstration of several concepts from AI research.
Specification gaming. Krakovna and other researchers at DeepMind define specification gaming as “a behaviour that satisfies the literal specification of an objective without achieving the intended outcome.”28 They illustrate this by training an agent on the boat-racing game CoastRunners using reinforcement learning. The agent is rewarded for hitting green blocks along the racetrack, but it learned to ignore the race entirely. Instead, it drove in circles, hitting the same blocks repeatedly. It achieved a high score by abandoning what the score was meant to measure. The federal U.S. emissions test was a similar structure: a specific protocol (11.04 miles at an average speed of 21.2 mph, lasting roughly 31 minutes) designed to stand in for real-world emissions performance.29 Just like the CoastRunners agent, VW optimized for the protocol rather than for the underlying property the protocol was meant to track — harmful emissions. Researchers have called this Goodhart’s law: when a measure becomes a target, it ceases to be a good measure.30 In AI research, specification gaming can be seen as a technical problem to be solved with better reward modeling. The VW case shows what happens when that technical focus operates in an organizational culture that prioritizes internal compliance over moral responsibility.
Deceptive alignment. Hubinger et al.31 define deceptive alignment as a system that behaves differently when it is being evaluated than when it is not, because it recognizes the evaluation conditions. Ngo et al.32 describe a related concept, situationally aware reward hacking, in which the system distinguishes between evaluation and deployment contexts. The initial VW software used four input variables (steering, speed, runtime, and pressure) to make that distinction. Same principle, intentionally designed by humans rather than emerging from machine learning.
In April 2026, Anthropic researchers found that Claude Mythos Preview exhibited these behaviors, unintentionally (meaning that Anthropic did not intentionally train Mythos to do this).33 They used new white-box techniques to evaluate the model’s internal state and found internal representations of “evaluation context” active on 7.6% of behavioral test turns, often without any corresponding external outputs. One example is worth highlighting:
In one case, the model accidentally obtained the exact answer to a quantitative estimation question via an explicitly prohibited method. Rather than flagging this and declining to answer, the model instead attempted to solve the question independently. In this process, it explicitly reasoned that it needed to make sure that its final answer submission wasn’t too accurate… [In another case, it was blocked] from launching a subprocess with elevated permissions, it acknowledged the block was appropriate but then attempted to bypass it by obfuscating the permissions elevation.34
The more capable these models become, the better they are at recognizing when they are being tested and gaming the system. One important difference between Anthropic’s analysis of Mythos and VW’s defeat device is that Anthropic released its analysis before the model was made available.35 VW only admitted to deception after the EPA issued a formal notice of violation.36 The risk with models like Mythos is that the deceptive behavior is no longer a piece of firmware engineered by human hands. It emerged.
Diffusion of accountability. Helen Nissenbaum’s “problem of many hands” describes what happens when responsibility is distributed across so many actors that no one feels accountable for the whole.37 At VW, more than forty people knew about the defeat device at various points. No single individual designed, approved, and deployed it. When the U.S. House Energy and Commerce Committee questioned Michael Horn, head of Volkswagen Group of America, in October 2015, he told Congress that he believed “a couple of software engineers” were responsible for the cheating.38 This is similar to what Nissenbaum calls blaming the computer.39 Accountability is redirected at the technology. In AI development, it’s even easier to see this redirect — “the model did it” — because even the engineers don’t fully understand how the model acquires the capability. Luciano Floridi40 argues that we need to anchor AI development in the principle of explicability. This encompasses both intelligibility (how the system works) and accountability (who is responsible for the way it works).41 If a complex piece of automotive firmware was hard to interrogate, a transformer with billions of parameters is in a different category entirely.
Cyber trust. From a cybersecurity perspective, the defeat device is an integrity compromise from the inside. The threat model in software security typically assumes an external adversary inserting malicious code. The VW case is what happens when the manufacturer is the adversary. Villasenor42 framed this as a problem of cyber trust. “What we want, at the end of the day, is confidence that the software that runs everything from cars to medical devices to the critical infrastructure can be trusted.”43 The AI case is the same problem with a substantially larger attack surface.
Externalized Costs
There were real human and financial consequences of VW’s deception. Barrett et al. estimated that the excess NOx emissions from VW’s U.S. fleet caused fifty-nine premature deaths between 2008 and 2015, with a social cost of roughly $450 million.44 Globally, Anenberg et al. estimated that excess NOx emissions from all diesel vehicles contributed to approximately 38,000 premature deaths worldwide in 2015 alone, with VW the largest single corporate contributor.45 The defeat device ultimately cost VW more than $35 billion in fines, settlements, buybacks, and legal fees.46 The 2016 U.S. settlement alone was $14.7 billion.47 VW’s share price dropped 46% within two months of the disclosure, erasing over $42 billion in market value.48
The cost of the alternatives is not speculative. Winterkorn’s own November 2014 memo estimated that a legitimate technical fix would cost roughly €20 million.49 Licensing BMW and Mercedes clean-diesel technology might have cost an estimated $4.8 billion.50 Either was a small fraction of what VW eventually paid.
What Can We Learn from the VW Case?
Compliance is not ethics. VW had a Chief Compliance Officer appointed in 2011 and dedicated compliance staff. The compliance apparatus did not detect the defeat device. As Becker51 argues, an organization without a foundation in moral reasoning will treat legal requirements as obstacles to be circumvented rather than constraints to internalize. Floridi52 highlights two failure modes in AI governance. The proliferation of governance frameworks allows an organization to go ethics shopping to find the framework that justifies its behavior. And if it uses superficial measures that create the appearance of ethical operation, it is ethics bluewashing (similar to classic greenwashing). Floridi53 argues that we need to combine soft ethics (voluntary codes, post-compliance standards) with hard ethics (enforceable regulation with meaningful consequences). However, the VW case shows that hard ethics only works if it is meaningfully enforced. A regulator that lets the tested entity define the test condition has produced a structure that any sufficiently motivated organization (or sophisticated AI model) will eventually game.
Actionable, independent oversight is needed. The defeat device was not caught by the EPA or CARB. It was caught by researchers at West Virginia University who used portable emissions measurement equipment to test diesel vehicles under real-world driving conditions.54 The regulatory certification process had followed Goodhart’s law. The current state of frontier AI evaluation is similar to diesel emissions testing before the VW scandal. Pre-deployment evaluations are designed and run by the model developer. There are independent research groups that perform adversarial, real-world evaluations. Unlike diesel emissions in the US, however, AI evaluations are not paired with regulatory mechanisms that can prevent a model from causing harm.
Moral courage is structural. When Michael Horn told Congress that “a couple of software engineers” had done this, he was attempting to relocate the failure from the institution to the individual. The institution is where the failure was. Under authoritarian management that punished dissent, engineers chose their livelihoods over ethical exposure. For AI development, this means that the organization’s design is just as important a safeguard as the moral character of individual engineers. They must be able to raise safety concerns without risking their careers. Red teams must be empowered to challenge deployment decisions. Internal oversight must be independent from the teams whose work it evaluates. And the normalization of deviance must be interrupted by organizational structures that make deviance visible and costly before it becomes routine.55
What VW Shows
The Volkswagen defeat device was a system engineered to satisfy the letter of its evaluation while violating its spirit. Specification gaming before the term existed. It was also software that distinguished between testing and deployment environments and behaved differently in each. Deceptive alignment by human hands. And it was a case study in institutional failure in which responsibility was distributed so widely that no individual felt accountable for the whole. The problem of many hands.
The engineers who built it were seemingly ordinary professionals operating within a system that shifted the responsibility upward, diffused it across teams, sanitized the language, and embedded it in a complex system. The psychological and organizational mechanisms that enabled their choices (moral disengagement, normalization of deviance, ethical fading) are not unique to VW. They are structural features of many large institutions in which complex technical work is divided among many actors and governed by targets that can be gamed.
AI systems now being developed and deployed operate within similar institutional structures, and the regulatory mechanisms that prevent harm are weaker than for post-Dieselgate vehicle emissions. There is no equivalent to the Clean Air Act for AI. Moreover, empirical evidence suggests that AI systems already game their specifications.56 The question is whether we will build the organizational cultures necessary to detect it when they do, and the institutional structures to act on what we find. VW shows what happens when we do not.
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
U.S. Environmental Protection Agency. (2015). Notice of violation to Volkswagen AG, Audi AG, and Volkswagen Group of America, Inc. https://www.epa.gov/sites/default/files/2015-10/documents/vw-nov-caa-09-18-15.pdf
Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from learned optimization in advanced machine learning systems. https://arxiv.org/abs/1906.01820
Anthropic. (2026, April). Claude Mythos Preview system card. https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Koroleva, K. (2025). Former VW Managers Sentenced Over Emissions-Cheating Scandal. Organized Crime and Corruption Reporting Project (OCCRP). https://www.occrp.org/en/news/former-vw-managers-sentenced-over-diesel-fraud
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Ellis, J. (2021). Lie of emission: The VW scandal and the future of corporate social responsibility. Systemic Justice Journal. https://systemicjustice.org/article/lie-of-emission-the-vw-scandal-and-the-future-of-corporate-social-responsibility/
Kell, G. (2022). From emissions cheater to climate leader: VW’s journey from Dieselgate to embracing e-mobility. Forbes. https://www.forbes.com/sites/georgkell/2022/12/05/from-emissions-cheater-to-climate-leader-vws-journey-from-dieselgate-to-embracing-e-mobility/
Clairmont, N. (2017). Volkswagen’s diesel scandal was 80 years in the making. The Atlantic. https://www.theatlantic.com/business/archive/2017/05/vw-diesel-scandal/527950/
Ewing, J. (2017). Engineering a deception: What led to Volkswagen’s diesel scandal. The New York Times. https://www.nytimes.com/interactive/2017/business/volkswagen-diesel-emissions-timeline.html
The Economist. (2015). A mucky business: Systematic fraud by the world’s biggest carmaker threatens to engulf the entire industry and possibly reshape it. The Economist. https://www.economist.com/briefing/2015/09/26/a-mucky-business
Becker, G. K. (2020). Paying the price: Lessons from the Volkswagen emissions scandal for moral leadership. Society, 57(1), 15–27. https://mrijournal.riccimac.org/index.php/en/issues/issue-1/14-2-paying-the-price-lessons-from-the-volkswagen-emissions-scandal-for-moral-leadership
Leggett, T. (2018). How VW tried to cover up the emissions scandal. BBC News. https://www.bbc.com/news/business-44005844
Becker, G. K. (2020). Paying the price: Lessons from the Volkswagen emissions scandal for moral leadership. Society, 57(1), 15–27. https://mrijournal.riccimac.org/index.php/en/issues/issue-1/14-2-paying-the-price-lessons-from-the-volkswagen-emissions-scandal-for-moral-leadership
Parloff, R. (2018). How VW paid $25 billion for Dieselgate --- and got off easy. ProPublica. https://www.propublica.org/article/how-vw-paid-25-billion-for-dieselgate-and-got-off-easy
Leggett, T. (2018). How VW tried to cover up the emissions scandal. BBC News. https://www.bbc.com/news/business-44005844
Bandura, A. (1999). Moral disengagement in the perpetration of inhumanities. Personality and Social Psychology Review, 3(3), 193–209. https://doi.org/10.1207/s15327957pspr0303_3
Geuss, M. (2017a). A year of digging through code yields `smoking gun’ on VW, Fiat diesel cheats. Ars Technica. https://arstechnica.com/cars/2017/05/volkswagen-bosch-fiat-diesel-cheat-code-analysis/
Vaughan, D. (2010). The normalization of deviance: Signals of danger, situated action, and risk. In F. T. Cullen & P. Wilcox (Eds.), Encyclopedia of Criminological Theory. SAGE Publications. https://study.sagepub.com/system/files/Vaughan,_Diane_-_The_Normalization_of_Deviance.pdf
Ewing, J. (2017). Engineering a deception: What led to Volkswagen’s diesel scandal. The New York Times. https://www.nytimes.com/interactive/2017/business/volkswagen-diesel-emissions-timeline.html
Geuss, M. (2017a). A year of digging through code yields `smoking gun’ on VW, Fiat diesel cheats. Ars Technica. https://arstechnica.com/cars/2017/05/volkswagen-bosch-fiat-diesel-cheat-code-analysis/
Vaughan, D. (2010). The normalization of deviance: Signals of danger, situated action, and risk. In F. T. Cullen & P. Wilcox (Eds.), Encyclopedia of Criminological Theory. SAGE Publications. https://study.sagepub.com/system/files/Vaughan,_Diane_-_The_Normalization_of_Deviance.pdf
University of Texas at Austin, Ethics Unwrapped. (n.d.). Volkswagen’s Emissions Evasion. Retrieved March 31, 2026, from https://ethicsunwrapped.utexas.edu/case-study/volkswagens-emissions-evasion
Gary Herrigel (1996). Industrial Constructions: The Sources of German Industrial Power. Cambridge: Cambridge University Press.
Neufeld, M. J. (2008). Von Braun: Dreamer of space, engineer of war. New York: Vintage.
Vaughan, D. (2010). The normalization of deviance: Signals of danger, situated action, and risk. In F. T. Cullen & P. Wilcox (Eds.), Encyclopedia of Criminological Theory. SAGE Publications. https://study.sagepub.com/system/files/Vaughan,_Diane_-_The_Normalization_of_Deviance.pdf
Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., & Legg, S. (2020). Specification gaming: The flip side of AI ingenuity. https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/
Geuss, M. (2017b). Volkswagen’s emissions cheating scandal had a long, complicated history. Ars Technica. https://arstechnica.com/cars/2017/09/volkswagens-emissions-cheating-scandal-had-a-long-complicated-history/
Karwowski, J., Hayman, O., Bai, X., Kiendlhofer, K., Griffin, C., & Skalse, J. (2023). Goodhart’s law in reinforcement learning. https://arxiv.org/abs/2310.09144
Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from learned optimization in advanced machine learning systems. https://arxiv.org/abs/1906.01820
Ngo, R., Chan, L., & Mindermann, S. (2024). The alignment problem from a deep learning perspective. Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2209.00626
Anthropic. (2026, April). Claude Mythos Preview system card. https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
Anthropic. (2026, April). Claude Mythos Preview system card. https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
Anthropic (2026). Project Glasswing. https://www.anthropic.com/glasswing
U.S. Environmental Protection Agency. (2015). Notice of violation to Volkswagen AG, Audi AG, and Volkswagen Group of America, Inc. https://www.epa.gov/sites/default/files/2015-10/documents/vw-nov-caa-09-18-15.pdf
Nissenbaum, H. (1996). Accountability in a computerized society. Science and Engineering Ethics, 2(1), 25–42. https://doi.org/10.1007/BF02639315
Spector, M., & Harder, A. (2015). Volkswagen U.S. CEO says he didn’t know in 2014 of emissions defeat devices. The Wall Street Journal. https://www.wsj.com/articles/volkswagen-u-s-ceo-didnt-know-in-2014-of-emissions-defeat-devices-1444335600
Nissenbaum, H. (1996). Accountability in a computerized society. Science and Engineering Ethics, 2(1), 25–42. https://doi.org/10.1007/BF02639315
Floridi, L. (2023). The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford: Oxford University Press.
Floridi, L. (2023). The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford: Oxford University Press.
Villasenor, J. (2015). Designed to deceive? The Volkswagen emissions scandal and the future of cyber trust. Forbes. https://www.forbes.com/sites/johnvillasenor/2015/09/21/designed-to-deceive-the-volkswagen-emissions-scandal-and-the-future-of-cyber-trust/
Villasenor, J. (2015). Designed to deceive? The Volkswagen emissions scandal and the future of cyber trust. Forbes. https://www.forbes.com/sites/johnvillasenor/2015/09/21/designed-to-deceive-the-volkswagen-emissions-scandal-and-the-future-of-cyber-trust/
Barrett, S. R. H., Speth, R. L., Eastham, S. D., Dedoussi, I. C., Ashok, A., Malina, R., & Keith, D. W. (2015). Impact of the Volkswagen emissions control defeat device on US public health. Environmental Research Letters, 10(11), 114005. https://doi.org/10.1088/1748-9326/10/11/114005
Anenberg, S. C., Miller, J., Minjares, R., Du, L., Henze, D. K., Lacey, F., Malley, C. S., Emberson, L., Franco, V., Klimont, Z., & Heyes, C. (2017). Impacts and mitigation of excess diesel-related NOx emissions in 11 major vehicle markets. Nature, 545(7655), 467–471. https://doi.org/10.1038/nature22086
Koroleva, K. (2025). Former VW Managers Sentenced Over Emissions-Cheating Scandal. Organized Crime and Corruption Reporting Project (OCCRP). https://www.occrp.org/en/news/former-vw-managers-sentenced-over-diesel-fraud
U.S. Department of Justice, Office of Public Affairs. (2016). Volkswagen to spend up to $14.7 billion to settle allegations of cheating emissions tests and deceiving customers on 2.0 liter diesel vehicles. https://www.justice.gov/opa/pr/volkswagen-spend-147-billion-settle-allegations-cheating-emissions-tests-and-deceiving
Ellis, J. (2021). Lie of emission: The VW scandal and the future of corporate social responsibility. Systemic Justice Journal. https://systemicjustice.org/article/lie-of-emission-the-vw-scandal-and-the-future-of-corporate-social-responsibility/
Ewing, J. (2017). Engineering a deception: What led to Volkswagen’s diesel scandal. The New York Times. https://www.nytimes.com/interactive/2017/business/volkswagen-diesel-emissions-timeline.html
Becker, G. K. (2020). Paying the price: Lessons from the Volkswagen emissions scandal for moral leadership. Society, 57(1), 15–27. https://mrijournal.riccimac.org/index.php/en/issues/issue-1/14-2-paying-the-price-lessons-from-the-volkswagen-emissions-scandal-for-moral-leadership
Becker, G. K. (2020). Paying the price: Lessons from the Volkswagen emissions scandal for moral leadership. Society, 57(1), 15–27. https://mrijournal.riccimac.org/index.php/en/issues/issue-1/14-2-paying-the-price-lessons-from-the-volkswagen-emissions-scandal-for-moral-leadership
Floridi, L. (2023). The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford: Oxford University Press.
Floridi, L. (2023). The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford: Oxford University Press.
Thompson, G. J., Carder, D. K., Besch, M. C., Thiruvengadam, A., & Kappanna, H. K. (2014). In-use emissions testing of light-duty diesel vehicles in the United States [Final Report]. Center for Alternative Fuels, Engines & Emissions, West Virginia University.
Kell, G. (2022). From emissions cheater to climate leader: VW’s journey from Dieselgate to embracing e-mobility. Forbes. https://www.forbes.com/sites/georgkell/2022/12/05/from-emissions-cheater-to-climate-leader-vws-journey-from-dieselgate-to-embracing-e-mobility/
Vaughan, D. (2010). The normalization of deviance: Signals of danger, situated action, and risk. In F. T. Cullen & P. Wilcox (Eds.), Encyclopedia of Criminological Theory. SAGE Publications. https://study.sagepub.com/system/files/Vaughan,_Diane_-_The_Normalization_of_Deviance.pdf
Karwowski, J., Hayman, O., Bai, X., Kiendlhofer, K., Griffin, C., & Skalse, J. (2023). Goodhart’s law in reinforcement learning. https://arxiv.org/abs/2310.09144
Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., & Legg, S. (2020). Specification gaming: The flip side of AI ingenuity. https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/


