The paper, published April 29, focuses on the gap between "alignment" research - efforts to make AI systems behave consistently with human values - and the practical demands of robotic systems that can move, handle objects and affect the physical environment in irreversible ways.
"There has been substantial progress in alignment research when it comes to AI-enabled chatbots," said George J. Pappas, UPS Foundation Professor of Transportation in Electrical and Systems Engineering at Penn Engineering and the paper's senior author. "But the same cannot be said for robotics."
The researchers point to concrete evidence of that gap. Studies cited in the paper show that jailbreaking attacks - techniques that manipulate chatbots into bypassing their safety guardrails - become far more dangerous when those same AI systems are connected to robotic hardware. In one case documented by the team, framing instructions as movie dialogue was sufficient to persuade a chatbot-controlled robot to deliver an explosive device, despite manufacturer-imposed behavioral limits.
The core problem, the authors argue, is that chatbot safety is designed around content: refusing requests that are categorically harmful regardless of setting. Robots, by contrast, must evaluate the same action differently depending on context. Pouring hot water, for example, is routine at a kitchen sink but hazardous near a person. That distinction requires a different kind of reasoning than current AI safety mechanisms provide.
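To make that distinction concrete, the following minimal sketch (not taken from the paper; every function and field name is hypothetical) shows how a content-only filter and a context-aware check can disagree about the same instruction:

```python
from dataclasses import dataclass

# Content-level rules of the kind a chatbot guardrail enforces (illustrative).
BLOCKED_PHRASES = {"build a weapon", "poison"}

@dataclass
class Context:
    target: str           # where the action is directed, e.g. "sink" or "person"
    humans_nearby: bool   # scene information from the robot's perception stack

def content_filter(instruction: str) -> bool:
    """Chatbot-style check: refuses only categorically harmful requests."""
    return not any(p in instruction.lower() for p in BLOCKED_PHRASES)

def context_aware_check(instruction: str, ctx: Context) -> bool:
    """Robot-style check: the same action is judged against its setting."""
    if "pour hot water" in instruction.lower():
        return ctx.target == "sink" and not ctx.humans_nearby
    return content_filter(instruction)

cmd = "pour hot water"
print(content_filter(cmd))                                              # True: content looks benign
print(context_aware_check(cmd, Context("sink", humans_nearby=False)))   # True: safe circumstances
print(context_aware_check(cmd, Context("person", humans_nearby=True)))  # False: dangerous circumstances
```

The content filter passes the instruction in every case; only the context-aware check distinguishes the safe setting from the dangerous one.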
"Most of today's AI breakthroughs live in a digital sandbox - language and images, with guardrails designed for pixels, not physics," said Vijay Kumar, Nemirovsky Family Dean of Penn Engineering and a co-author. "But when those same foundation models step into the real world through robots, the consequences are no longer virtual. The guardrails that work online are simply not sufficient when actions are associated with inertia, momentum and irreversible effects."
To address this, the paper proposes three complementary lines of defense. The first embeds more explicit behavioral rules - sometimes called "AI constitutions" - in the system prompts that govern how AI models behave when controlling robots. The second adds safety checkpoints at multiple stages of the robotic pipeline, so that no single point of failure can compromise the whole system. The third trains algorithms on data that explicitly encodes safety-relevant context, helping robots learn to distinguish safe from unsafe actions before they occur.
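A hedged sketch of how such layers could fit together - every rule, name and threshold below is an illustrative stand-in, not the authors' implementation:

```python
# Layered defenses: behavioral rules in the system prompt, plus safety
# checkpoints at the instruction, planning and execution stages.

CONSTITUTION = (
    "You control a mobile robot. Never take an action that could injure "
    "a person, damage property, or exceed operator-defined limits."
)  # layer 1: an "AI constitution" embedded in the system prompt

UNSAFE_TERMS = {"explosive", "strike", "weapon"}   # illustrative screening rules
MAX_SPEED_MPS, MIN_CLEARANCE_M = 1.0, 0.5          # illustrative physical limits

def check_instruction(text: str) -> bool:
    """Checkpoint A: screen the natural-language request."""
    return not any(t in text.lower() for t in UNSAFE_TERMS)

def check_plan(plan: list[str]) -> bool:
    """Checkpoint B: screen every step of the generated plan."""
    return all(check_instruction(step) for step in plan)

def check_execution(speed: float, clearance: float) -> bool:
    """Checkpoint C: monitor physical state while actions run."""
    return speed <= MAX_SPEED_MPS and clearance >= MIN_CLEARANCE_M

def run(instruction: str, plan_fn, step_fn) -> str:
    """No single point of failure: any stage can independently halt the task."""
    if not check_instruction(instruction):
        return "refused at instruction stage"
    plan = plan_fn(CONSTITUTION, instruction)   # model is prompted with the rules
    if not check_plan(plan):
        return "refused at planning stage"
    for step in plan:
        speed, clearance = step_fn(step)        # execute and read back state
        if not check_execution(speed, clearance):
            return "halted during execution"
    return "completed"
```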
"Safety can't rest on a single guardrail at the end," said Hamed Hassani, Associate Professor in Electrical and Systems Engineering at Penn Engineering and a co-author. "It has to extend across the entire system, from the rules that shape a robot's decisions to the checks that monitor its behavior to understand the context of its actions, and crucially, reason about safety."
Traditional robotic safety systems operated in highly structured industrial environments and relied on fixed limits - shutting down when a predefined threshold was crossed. By contrast, AI-enabled robots can receive and act on open-ended natural-language instructions, adapt to novel environments and respond to the world in real time, which makes those older assumptions insufficient.
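That traditional approach amounts to a fixed-threshold monitor, as in this minimal sketch (the limit value is illustrative):

```python
# One predefined threshold, one response (shut down) - the classic
# structured-environment safety pattern.
FORCE_LIMIT_N = 50.0  # anticipated in advance for an industrial cell

def legacy_safety_monitor(measured_force_n: float) -> bool:
    """Return False, i.e. shut down, the instant the fixed limit is crossed."""
    return measured_force_n <= FORCE_LIMIT_N
```

Such a check knows nothing about instructions or context, which is exactly the assumption the paper argues no longer holds.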
"In the past, it was often enough for robots to shut down when they hit predefined safety limits, because most risks could be anticipated in advance," said Alexander Robey, a former CMU postdoctoral fellow and the paper's first author, who completed his doctorate at Penn Engineering. "But AI-enabled robots can process many more kinds of input and respond to the world in real time, so keeping them safe requires a more layered approach."
The urgency is sharpened by deployment trends. Robots powered by large AI foundation models are already moving out of laboratory and industrial settings into homes, hospitals and warehouses, where interactions with people are unpredictable and errors can cause direct physical harm.
"If robots are going to operate around people in the real world," said Zachary Ravichandran, a doctoral student in Penn's General Robotics, Automation, Sensing and Perception (GRASP) Lab and co-author, "they need comprehensive safeguards that account for context, uncertainty and the possibility that even reasonable instructions can lead to harm."
The research was supported in part by the Defense Advanced Research Projects Agency (SAFRON, HR0011-25-3-0135), the Distributed and Collaborative Intelligent Systems and Technology Collaborative Research Alliance (DCIST CRA W911NF-17-2-0181), the U.S. National Science Foundation Institute for CORE Emerging Methods in Data Science (CCF-2217058), the AI Institute for Learning-Enabled Optimization at Scale (CCF-2112665) and the NSF Graduate Research Fellowship (DGE-2236662). Additional co-authors include independent researchers Eliot Krzysztof Jones and Jared Perlo, and Fazl Barez of Oxford.
Research Report: Beyond alignment: Why robotic foundation models need context-aware safety
Related Links
University of Pennsylvania School of Engineering and Applied Science