Below the Threshold

Why I'm building Argus, a system that detects compromised robots in a simulated warehouse fleet.

Jun 07, 2026

Amazon’s fleet has crossed one million robots. The largest single fulfillment center runs roughly seven thousand of them in the same building, threading dynamic paths around each other and around human workers.1 That is not a network of laptops. It is a population of cyber-physical agents whose decisions translate, continuously, into kinetic motion across millions of square feet.

The standard approach for keeping that population safe was built for fleets a tenth of this size. Interval testing, random sampling, threshold alarms, and post-incident forensics all assume that a human operator can plausibly inspect the suspicious cases. That assumption breaks at a million. But even when the fleet is small enough to inspect, the most dangerous compromises are the ones engineered to stay below whatever threshold the alarm is calibrated to catch.

This is the first post in a long-form series about a portfolio project I am calling Argus. It is a system that detects compromised robots in a simulated warehouse fleet by combining an unsupervised detector with an AI auditing agent. This post explains why I picked this project, what it is, what it is not, and what I plan to do with the code while I build it.

The Failure Mode

The failure mode I am building Argus for is not a ransomware lock or a denial-of-service attack. It is a robot that continues to do its job while, unknowingly or deceptively, performing a different one. A planner that biases path selection through a sensitive area or shaves a few centimeters off a safety margin. A control loop with subtly tampered gains that introduce small oscillations under specific load conditions. A state estimator whose feedback has been injected with a bias that the controller dutifully tries to compensate for, throwing the robot off its intended trajectory in ways that aggregate over hours rather than seconds.
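A toy simulation shows why a per-sample threshold alarm misses this kind of bias. It is a minimal sketch, not Argus code: the `simulate` function, the one-dimensional residual, and every number in it (bias, noise bound, threshold, step count) are assumptions picked for illustration.

```python
import random


def simulate(bias, steps, noise_bound=0.002, threshold=0.005, seed=0):
    """Count threshold alarms and accumulate the offset caused by a constant bias.

    All parameters are hypothetical; units are arbitrary.
    """
    rng = random.Random(seed)
    alarms = 0
    drift = 0.0
    for _ in range(steps):
        # Residual the monitor sees: injected bias plus bounded sensor noise.
        residual = bias + rng.uniform(-noise_bound, noise_bound)
        if abs(residual) > threshold:
            alarms += 1
        # The bias integrates step after step, even though each sample looks benign.
        drift += bias
    return alarms, drift


alarms, drift = simulate(bias=0.0005, steps=3600)
print(f"alarms raised: {alarms}, accumulated offset: {drift:.2f}")
```

With these made-up numbers every residual stays inside the alarm band, so the alarm never fires, yet the accumulated offset reaches about 1.8 units after 3,600 steps.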

The technical name for one family of these attacks is False Data Injection. Recent work2 on resilient manipulator control has shown that an adversary can craft attacks that move entirely within a null space: for example, large internal joint reconfigurations that produce no change in the end-effector’s task-space pose, and therefore no residual that a standard chi-square detector will see. Other work shows that the feedback linearization used to make non-linear robots controllable also creates a structural “integrator vulnerability” that lets a properly designed injection steer the chassis without leaking residual information to the monitor.3 The same pattern shows up in path planning, where compromised routing degrades throughput by fractions of a percent or steers the robot past unauthorized zones without violating any safety constraint that a rule-based system was given to check.4
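The null-space idea is easier to see with a small kinematic example. The sketch below is my own illustration, not the attack construction from the cited papers: it assumes a hypothetical planar three-link arm with made-up link lengths and joint angles, and it works only at the velocity level. Because the position Jacobian is 2×3, the cross product of its two rows is a joint-velocity direction that produces no end-effector velocity.

```python
import math


def planar_jacobian(q, lengths):
    """2x3 position Jacobian of a planar 3-link arm (hypothetical example robot)."""
    # Absolute orientation of each link is the running sum of joint angles.
    a = [q[0], q[0] + q[1], q[0] + q[1] + q[2]]
    row_x, row_y = [], []
    for j in range(3):
        # Joint j moves every link from j outward.
        row_x.append(-sum(lengths[k] * math.sin(a[k]) for k in range(j, 3)))
        row_y.append(sum(lengths[k] * math.cos(a[k]) for k in range(j, 3)))
    return [row_x, row_y]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


q = [0.3, 0.8, -0.5]        # joint angles in radians (assumed)
lengths = [1.0, 0.8, 0.5]   # link lengths in meters (assumed)

J = planar_jacobian(q, lengths)
null_dir = cross(J[0], J[1])       # orthogonal to both rows, so J @ null_dir = 0
ee_velocity = matvec(J, null_dir)  # end-effector velocity caused by that joint motion
joint_speed = math.sqrt(sum(x * x for x in null_dir))

print("joint velocity direction:", null_dir)
print("end-effector velocity:", ee_velocity)
```

The joints move at a clearly nonzero rate while the end-effector velocity is zero up to floating-point error, so a monitor that only watches task-space motion has nothing to flag.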

These are not theoretical. Researchers working on commercial systems have documented vulnerabilities in industrial robot controllers that let an unauthenticated attacker on a fleet network achieve remote code execution.5 Work on commercial quadruped robots has demonstrated wormable Bluetooth exploits that propagate root access laterally across units in minutes.6 The compromise vectors exist. What is missing is the auditing approach that can find a compromise after the perimeter has already been crossed and the attacker has chosen to stay quiet.

A Menu of Project Options

Before settling on Argus, I considered four projects in the robotics-and-cybersecurity space. I want to discuss them here because each is the seed of a separate strand of work, and the project I eventually chose combines elements from each.

The first project was a ROS 2 vulnerability testbed: build a simulated warehouse running ROS 2 with realistic DDS-based communication, then systematically probe its attack surface. This is a natural extension of the warehouse cybersecurity research I have published before. The limitation with this project is that it does not move beyond cataloging. It produces a map of what is broken, but not a mechanism for finding compromises after the map is drawn.

The second project was an AI-powered anomaly detection system for fleets, essentially an observatory for telemetry. This would potentially produce a commercially viable product. It also pulls in time-series modeling, edge inference, and fleet-level reasoning, all of which are skills I want to deepen. However, pure anomaly detection at the fleet level is not a new concept, and entering that conversation as a new voice requires either a new technique or a new framing.

The third project was focused on adversarial attacks on the sensor-based ML models that physical AI systems depend on: computer vision for pick-and-place, IMU-based state estimation, and LiDAR-based mapping. The goal would be to demonstrate how each fails under adversarial input, then build a defense. This project would be publishable and build a bridge to the adversarial ML literature. It is also likely to live as a sequence of papers rather than as a single coherent system.

The fourth project was an open-source ROS 2 security toolkit. I would build a tool that scans a deployment, maps the communication graph, identifies unencrypted channels, and generates a security report. The toolkit would be useful, but it is really more of a tool than a project. It would work best as a complement to a project, rather than as the build itself.

The Project I Decided On

Argus combines elements from each of the other projects I considered. The testbed is a simulated warehouse running a heterogeneous fleet of robots with deliberately implanted compromises. The fleet-anomaly system is an unsupervised detector that learns the manifold of normal kinematic and telemetry behavior and scores each robot for deviation. The implanted compromises are designed adversarial behaviors, shaped to stay under standard detection thresholds.
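To give a feel for what "scoring each robot for deviation" can mean, here is a deliberately simple stand-in. It is not the Argus detector and it does not learn a manifold: it just scores each robot by its largest robust z-score (median and MAD) relative to the rest of the fleet. The robot IDs, feature names, and values are all invented for the example.

```python
from statistics import median


def robust_scores(fleet):
    """Score each robot by its largest robust z-score across telemetry features."""
    features = list(next(iter(fleet.values())).keys())
    stats = {}
    for f in features:
        values = [telemetry[f] for telemetry in fleet.values()]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        stats[f] = (med, mad or 1e-9)  # guard against a zero MAD
    return {
        robot: max(
            # 1.4826 rescales the MAD to match a standard deviation for Gaussian data.
            abs(t[f] - stats[f][0]) / (1.4826 * stats[f][1])
            for f in features
        )
        for robot, t in fleet.items()
    }


# Hypothetical per-robot telemetry summaries.
fleet = {
    "amr-01": {"path_excess_m": 0.10, "min_clearance_m": 0.50},
    "amr-02": {"path_excess_m": 0.12, "min_clearance_m": 0.52},
    "amr-03": {"path_excess_m": 0.09, "min_clearance_m": 0.49},
    "amr-04": {"path_excess_m": 0.11, "min_clearance_m": 0.51},
    "amr-05": {"path_excess_m": 0.13, "min_clearance_m": 0.38},
}

scores = robust_scores(fleet)
most_suspicious = max(scores, key=scores.get)
print(most_suspicious, round(scores[most_suspicious], 2))
```

In this invented fleet the robot with the shaved clearance stands out clearly. An implant shaped to stay under standard thresholds would be far less obvious, which is why a score like this can only be the first filter.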

The architectural pattern that ties these pieces together did not come from the robotics literature. It came from Anthropic’s recent work on Natural Language Autoencoders.7 The NLA tool itself, which produces natural-language explanations of LLM activations, addresses a different problem from the one I am working on. What I am borrowing is the auditor architecture the paper uses to evaluate the tool: deliberately implant misalignment in a population of models, equip an auditor with special access plus a set of probes, and measure how often the auditor catches the implant. In Anthropic’s benchmarks, agents equipped with the NLA-style auditor pattern surfaced hidden motivations in 12–15% of cases, compared to less than 3% for unequipped baselines. That is roughly a 4–5× improvement attributable to the structure of the auditor, not the underlying model.

Translated into the robotics domain, the same three roles map cleanly. The target system is a robot whose planner or control loop may be compromised. The interpretability layer is the unsupervised detector that scores deviations from nominal behavior across the fleet. The auditor is an agent that consumes detector scores, decides which robots to investigate, designs targeted physical probes (a specific corner-case trajectory, a kinematic perturbation, an injected sensor anomaly), and decides whether the results constitute compromise or benign noise. The detector is the fast filter that makes the system tractable at fleet scale. The auditor is the slow reasoner that does detailed testing. This pattern has precedent in the robotics literature for two-stage anomaly detection. For example, Sinha et al. (2024)8 propose a closely related fast-classifier/slow-reasoner architecture for general anomaly handling in robotic systems.
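The fast-filter, slow-reasoner split can be sketched as a short triage loop. This is a minimal illustration under heavy assumptions, not the Argus auditor: the `Verdict` type, the `audit_fleet` function, the probe callback, and the fixed tolerance rule are all hypothetical, and a real auditing agent would design its probes rather than call a single canned one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    robot_id: str
    detector_score: float
    probe_error: float
    compromised: bool


def audit_fleet(
    scores: Dict[str, float],
    run_probe: Callable[[str], float],
    budget: int,
    probe_tolerance: float,
) -> List[Verdict]:
    """Probe only the highest-scoring robots and judge each by its probe error."""
    # Fast filter: rank by detector score and keep what the audit budget allows.
    suspects = sorted(scores, key=scores.get, reverse=True)[:budget]
    verdicts = []
    for robot_id in suspects:
        # Slow step: run a targeted probe and measure deviation from the expected response.
        error = run_probe(robot_id)
        verdicts.append(
            Verdict(robot_id, scores[robot_id], error, error > probe_tolerance)
        )
    return verdicts


# Invented detector scores and canned probe results for the example.
scores = {"amr-01": 0.7, "amr-02": 1.3, "amr-05": 8.1, "arm-03": 2.9}
fake_probe_errors = {"amr-05": 0.09, "arm-03": 0.01}

verdicts = audit_fleet(
    scores,
    run_probe=lambda robot_id: fake_probe_errors.get(robot_id, 0.0),
    budget=2,
    probe_tolerance=0.05,
)
for v in verdicts:
    print(v)
```

The point of the structure is the budget: the detector scores every robot cheaply, and the expensive probing is spent only on the handful that rank highest.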

I am not focused on identifying vulnerabilities in any particular manufacturer’s systems. The fleet that I will start with is a simulated one. The robot types are generic abstractions: an AMR-style drive unit and a manipulator with a model-predictive controller. Any code that I release will use these generic abstractions rather than reverse-engineered proprietary controllers. If I do happen to find any real vulnerabilities in the tools I use, I will follow standard responsible-disclosure norms.

Argus is a portfolio project. It is my own build, developed independently. I don’t have a pitch deck or commercialization plan. I am going to work on the project in a private repository initially and release it publicly later. This will give the codebase time to mature before opening it up. Code snippets will appear here in the meantime. My goal is to learn and develop my skills. The purpose of these posts is to document that process.

Next up: a deep dive on the architecture. How the implant, the detector, and the auditing agent fit together. What I learned from the NLA paper. The experimental design.

Schmelzer, R. (2025, July 7). Amazon’s millionth warehouse robot is here, and it’s getting smarter. Forbes. https://www.forbes.com/sites/ronschmelzer/2025/07/07/amazons-millionth-warehouse-robot-is-here-and-its-getting-smarter/

Amazon. (2025). How Amazon’s robotics are reshaping our fulfillment network. https://www.aboutamazon.com/news/operations/amazon-robotics-robots-fulfillment-center

Larsson, C. (2025). False Data Injection Using Null Space. Mälardalen University. https://www.diva-portal.org/smash/get/diva2:1968676/FULLTEXT01.pdf

Gualandi, G., & Papadopoulos, A. V. (2026). From Passive Monitoring to Active Defence: Resilient Control of Manipulators Under Cyberattacks. arXiv. https://doi.org/10.48550/arxiv.2603.13003

Maggi, F. (2017). Rogue Robots: Testing the Limits of an Industrial Robot’s Security. Trend Micro Forward-Looking Threat Research. https://blackhat.com/docs/us-17/thursday/us-17-Quarta-Breaking-The-Laws-Of-Robotics-Attacking-Industrial-Robots-wp.pdf

Kovacs, E. (2026). Critical vulnerability exposes industrial robot fleets to hacking. SecurityWeek. Accessed May 20, 2026. https://www.securityweek.com/critical-vulnerability-exposes-industrial-robot-fleets-to-hacking/

Insikt Group. (2026). Hacking Embodied AI. Recorded Future. https://www.recordedfuture.com/research/hacking-embodied-ai

Fraser-Taliente et al. (2026). Natural language autoencoders produce unsupervised interpretability of LLM activations. Anthropic. https://transformer-circuits.pub/2026/nla/

Sinha, R., et al. (2024). Real-time anomaly detection and reactive planning with large language models. Robotics: Science and Systems (RSS) Proceedings. https://roboticsproceedings.org/rss20/p114.pdf

Kellen Betts
