
From Tool to Teammate: Opening the Aperture on Human-Robot Teaming

Imagine a world where medics and soldiers are partnered with robots that can not only assist with complex tasks — such as transporting casualties to safety or maneuvering quickly through cities or rough terrain — but can also advise on problems and adapt to new information without human intervention.

To achieve this future, robots need a set of complex skills, including the ability to understand natural language and their surrounding environment in real time, to create and execute plans, and to evaluate their progress and replan as needed.

Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are using generative artificial intelligence (GenAI) and cutting-edge scene-mapping technology to elevate robots from simple tools to full teammates capable of providing aid in disaster and battlefield scenarios. The team’s work is funded by the Army Research Laboratory through the Army Artificial Intelligence Innovation Institute (A2I2) and through APL’s Independent Research and Development (IRAD) program.

“This research takes advantage of cutting-edge AI technology for a significant step forward in robotics,” said Chris Korpela, the Robotics and Autonomy program manager within APL’s Research and Exploratory Development Mission Area. “Fully autonomous robots will provide new and exciting capabilities in austere environments.”

The Current State of Robotics

Today, it’s possible to buy fairly advanced robots on the open market, although they cost about the same as a luxury car. Out of the box, these robots do not operate on their own and must receive commands via a controller. There is no option to control them with spoken language, as voice assistants on cell phones and tablets allow. Humans must perform the basic tasks of understanding the robots’ surroundings, creating an execution plan for the tasks the robot will perform, evaluating progress and replanning as needed, which severely limits how meaningfully people can team with a robot.

“While they’re useful for many scenarios, current robots are closer to a remote-controlled car than an autonomous vehicle,” said Corban Rivera, a senior AI researcher at APL and principal investigator for this research. “They can be used as a great tool to enable certain operations, but humans can’t take their hands off the wheel, so to speak.”

Opening Robotic Eyes

An international research team with members from APL, Johns Hopkins University, University of Toronto, Université de Montréal, Massachusetts Institute of Technology, Army Research Laboratory and University of Massachusetts Amherst created a technology that enhances robots’ perception and understanding of their surrounding environment, enabling a more effective human-robot partnership. This technology — ConceptGraphs — enables robots to have a near-human understanding of a 3D environment.

Using the technology, robots create 3D scene graphs that compactly and efficiently represent an environment. Large language models (LLMs) and large vision-language models trained on image-caption pairs assign tags to objects in the scene. These tags help robots understand the meanings and uses of objects as well as the relationships between them.
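
The structure ConceptGraphs builds can be pictured as a graph whose nodes are tagged objects and whose edges are relationships between them. The sketch below is only an illustration of that idea, a hypothetical data model in Python; the class names, fields and relations are assumptions, not ConceptGraphs’ actual code.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """One object detected in the 3D scene (all fields are illustrative)."""
    node_id: int
    tag: str                              # open-vocabulary label, e.g. "basketball"
    caption: str                          # short description from a vision-language model
    position: tuple[float, float, float]  # estimated 3D location in meters

@dataclass
class SceneGraph:
    """A compact map of the environment: tagged objects plus their relationships."""
    objects: dict[int, ObjectNode] = field(default_factory=dict)
    relations: list[tuple[int, str, int]] = field(default_factory=list)  # (id, "on top of", id)

    def add_object(self, node: ObjectNode) -> None:
        self.objects[node.node_id] = node

    def relate(self, a: int, relation: str, b: int) -> None:
        self.relations.append((a, relation, b))

# Build a tiny example scene.
graph = SceneGraph()
graph.add_object(ObjectNode(0, "table", "a wooden folding table", (2.0, 0.5, 0.0)))
graph.add_object(ObjectNode(1, "basketball", "an orange basketball", (2.1, 0.5, 0.8)))
graph.relate(1, "on top of", 0)

# The graph can then be searched, for example for every object whose tag matches a word.
matches = [o for o in graph.objects.values() if "basketball" in o.tag]
print(matches[0].caption)  # "an orange basketball"
```

Because a graph like this stores tagged objects and their relations rather than raw sensor data, it stays compact enough to search quickly as the robot moves.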

“Many robots in commercial industry are created to work in factories or distribution centers, which are pristine and predictable environments,” Korpela said. “There are very different needs when robots walk through the woods, for example, where there are numerous and unpredictable obstacles in the way, from rocks on the ground to trees in their path.”

ConceptGraphs is open-vocabulary — meaning it is not limited to the language in its training set — which enables humans to give robots instructions in plain language, either in text or voice, rather than through fixed commands. Robots can even support multimodal queries, which combine an image and a question or instruction. For example, when given an image of Michael Jordan and asked to find “something he would play with,” a robot was able to identify and find a basketball in the environment because of its training on image-caption pairs that provide context to images and objects.

“Now, not only can the robot build up a semantic description of the world, but you can query it in natural language,” said Dave Handelman, a senior roboticist at APL and a collaborator on the project. “You don’t have to ask it if it sees a car — you can say, ‘Show me everything with four wheels,’ or ‘Show me everything that can carry me places.’”
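
In systems of this kind, open-vocabulary matching typically works by comparing an embedding of the query against embeddings of each object’s tag or caption, rather than by exact keyword lookup. The sketch below only mimics that ranking step, using simple word overlap in place of a real vision-language embedding (such as a CLIP-style model); the object descriptions and function names are hypothetical and do not come from APL’s system.

```python
def similarity(query: str, description: str) -> float:
    """Toy stand-in for embedding similarity: the fraction of query words that
    appear in an object's description. A real system would compare vectors
    produced by a vision-language model instead of counting shared words."""
    q_words = set(query.lower().split())
    d_words = set(description.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

# Descriptions produced when the scene graph was built (illustrative only).
scene_objects = {
    "truck": "a green truck with four wheels parked outside",
    "stretcher": "a folding stretcher that can carry a person",
    "rock": "a large gray rock on the trail",
}

def answer(query: str, top_k: int = 1) -> list[str]:
    """Rank scene objects by similarity to a natural-language query."""
    ranked = sorted(scene_objects,
                    key=lambda name: similarity(query, scene_objects[name]),
                    reverse=True)
    return ranked[:top_k]

print(answer("everything with four wheels"))   # ['truck']
print(answer("something that can carry me"))   # ['stretcher']
```

Swapping the toy similarity function for genuine embedding comparisons is what would let a query like “something he would play with” match a basketball even when no keywords overlap.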

In a real-world scenario, this might translate to a medic asking a robot to locate casualties on a battlefield and transport them to safety until the medic can attend to them. The robot would be able to not only identify casualties but also determine what “safety” means and how to achieve it.

While ConceptGraphs resolved several challenges in human-robot teaming, significant obstacles remained. Scanning and developing an understanding of the environment took a robot several minutes, and as the robot moved through the environment, additional scanning required even more time.
