Training Robots to Plan and React Like Humans

Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are advancing robotic perception by using artificial intelligence (AI) to equip autonomous agents to make sense of unstructured environments and plan like humans.

It’s an effort with significant implications for the nation’s warfighters and first responders — particularly in complex or challenging off-road environments.

“Robots with strong, human-like perception, coupled with the ability to reason about tasks they’re given, is a broad capability we could apply wherever there’s dangerous or dirty work for humans,” said David Patrone, acting program manager for Robotics and Autonomy at APL. “Humans could take a managerial role instead of going into dangerous situations themselves.”

The project, known as Full Scene Extraction, involves training robots to gather information about their surroundings to build a contextual understanding of their environment. The goal is for autonomous agents to independently understand the space they’re in, plan potential paths and execute sequential tasks accordingly.

For instance, an agent equipped with Full Scene Extraction could be instructed to hide while navigating a tree-lined path. Intuitively, the robot would understand that moving behind a tree or under a bush accomplishes this task.

“The ability for us humans to understand and perceive our environment is something we take for granted, but it’s something that robots have historically struggled with,” said Corban Rivera, a senior AI and robotics researcher at APL. “When things move quickly or when they’re in a dynamic scene, robots have trouble navigating. Our goal is to close that gap between human instinct and robotic reaction.”

Applying AI

Just out of the box, today’s robots require extensive training and human guidance — usually with a controller — to begin completing simple tasks. With Full Scene Extraction, however, APL researchers are working toward a paradigm where embodied agents can perceive and reason simultaneously, leveraging foundation models — large-scale AI models trained on vast datasets to perform a wide range of tasks — to process complex environments and execute commands in plain English, very much like communicating with a human.

By integrating large-scale perception models with advanced-reasoning capabilities, researchers aim to enable robots to not only see and understand their surroundings but also adapt dynamically and make informed decisions in real time.

“We want to enable the agent to complete a command, make progress against the command or come back with follow-up questions to sort out any ambiguity,” Rivera said. “Full Scene Extraction is leveraging agentic artificial intelligence to achieve these human-like perception and reasoning abilities for autonomous agents. It’s a significant step forward in the robotics field.”
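The three outcomes Rivera describes, completing a command, making progress against it, or returning with a follow-up question, can be pictured as a simple dispatch. The sketch below is purely illustrative: the class names, the keyword heuristic standing in for a language model, and the whole interface are assumptions for explanation, not APL's actual framework.

```python
# Illustrative sketch of the three agent outcomes described above.
# A real agentic system would consult a language model; here a toy
# heuristic stands in so the example is self-contained and runnable.

from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    COMPLETE = auto()             # command fully carried out
    IN_PROGRESS = auto()          # partial progress, keep working
    NEEDS_CLARIFICATION = auto()  # come back with a follow-up question

@dataclass
class AgentResponse:
    outcome: Outcome
    message: str

def handle_command(command: str, known_objects: set[str]) -> AgentResponse:
    """Toy dispatcher: decide which of the three outcomes applies."""
    words = set(command.lower().split())
    if "it" in words or "that" in words:
        # Ambiguous referent -> sort out the ambiguity with the human
        return AgentResponse(Outcome.NEEDS_CLARIFICATION,
                             "Which object do you mean?")
    if words & known_objects:
        # The target is already perceived in the scene
        return AgentResponse(Outcome.COMPLETE, "Task finished.")
    # Target not yet perceived -> report progress while exploring
    return AgentResponse(Outcome.IN_PROGRESS, "Searching the scene...")

print(handle_command("hide behind the tree", {"tree", "bush"}).outcome.name)
```

In a real system the heuristic branch would be replaced by a model call, but the three-way return contract is the point of the sketch.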

The team is tapping into advances in large language models and visual language models to help a robot understand its environment.

“The Full Scene Extraction framework has a robot reason on its own through all the tiny, in-between steps of a task,” said Rohita Mocharla, a computer vision engineer at APL. “Previously, we needed to program each step for a robot to be successful. Agentic AI allows the robot to plan out these steps.”

In situations where robots once needed several weeks in a simulated environment to learn and make progress toward a task, Rivera and Mocharla said a robot can now accomplish that task on its first try. It’s proof that the AI algorithms being applied with Full Scene Extraction are advancing the robot’s skills.

“That’s a bar that has never existed before,” said Rivera. “I’m amazed and inspired by what’s possible today that wasn’t even possible two years ago.”

Building Blocks for Innovation

Full Scene Extraction builds on earlier APL work in human-robot teaming and robotic perception, notably Concept Agent. Concept Agent, which was also funded by the U.S. Army Combat Capabilities Development Command Army Research Laboratory, is an autonomous AI agent framework that uses a large language model to reason sequentially, enhancing a robot’s ability to create task execution plans, evaluate progress and replan to complete a command.
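The plan, evaluate-progress, replan cycle attributed to Concept Agent can be sketched as a short loop. Everything below is a stand-in under stated assumptions: the `plan` and `execute` functions are invented placeholders for a language-model planner and a robot executor, not Concept Agent's real interfaces.

```python
# Hedged sketch of a plan / evaluate / replan loop in the spirit of
# the cycle described above. All function names are hypothetical.

def plan(command: str) -> list[str]:
    """Stand-in planner; a real agent would ask an LLM for these steps."""
    return ["survey the scene",
            "navigate toward the objective",
            f"verify that '{command}' is satisfied"]

def execute(step: str, world: dict) -> bool:
    """Toy executor: a step fails if the world marks it as blocked."""
    return step not in world.get("blocked", set())

def run_agent(command: str, world: dict, max_replans: int = 3) -> bool:
    steps = plan(command)
    for _ in range(max_replans):
        # Evaluate progress: find the first step that cannot be executed
        failed = next((s for s in steps if not execute(s, world)), None)
        if failed is None:
            return True  # every step succeeded: command complete
        # Replan: fold the failure back into the request. A real agent
        # would reason about the failure with a language model here.
        steps = plan(f"{command} (previous attempt failed at: {failed})")
    return False
```

The design point is the loop shape, plan once, check each step, and regenerate the plan with the failure as added context, rather than hand-programming every in-between step.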

According to Rivera, who helped develop Concept Agent, the Full Scene Extraction team is building on this previous work to develop a more advanced human-robot teaming capability. While Concept Agent focused on extending the open-world reasoning capabilities of robots, Full Scene Extraction advances perception by infusing specific military concepts of interest into open-world models using parameter-efficient training. Full Scene Extraction’s perception accuracy, coupled with Concept Agent’s open-world reasoning, enables robots to be more successful at autonomous task execution.

Operational Impact

While the capabilities of Full Scene Extraction are still in development, the technology has a range of potential applications — particularly for warfighters and first responders. Search and rescue, casualty extraction, building clearing, tree line detection, and humanitarian relief and recovery are among the potential uses — but the APL team sees even more possibilities.

Researchers also plan to examine how sensor data gathered in austere environments will affect autonomous agents.

“For example, what happens when lidar or radar are introduced to an autonomous agent?” Rivera asked. “How does that data improve or complicate perception? This is a space where APL can certainly contribute.”
