
The Future of Pathogen Genomic Surveillance at Johns Hopkins APL

Thirty-six hours. That’s all it takes for a pathogen to travel from a remote village to major cities on all continents. And if there is one thing everyone on planet Earth can likely agree on, it’s that we never want to go through another pandemic again.

Unfortunately, growing global travel and trade, the development of antimicrobial resistance and more frequent animal-human interactions all raise the risk that disease outbreaks will occur and spread rapidly.

To address that risk, researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are leveraging more than three decades of public health and data analysis experience to outline strategies and build tools that will best enable effective decision-making for outbreak response.

Lessons Learned From COVID-19

“The COVID-19 pandemic showcased a clear need for quick and accurate data to make better decisions,” said Sheri Lewis, APL’s National Health Mission Area deputy executive. “So while we’re getting better at collecting this data, we still need to strengthen our ability to leverage it in ways that can best inform policymakers and health care providers, and ultimately better protect the health of individuals all over the planet.”

Varying medical infrastructures, patient confidentiality and other factors have created an extremely disjointed system that makes it difficult for researchers to connect and analyze health data points.

“Data at the individual level has impacts at the local level, and the local level has impacts at the regional level and so forth — leading all the way up to global implications,” said Alan Ravitz, National Health Mission Area chief engineer. “If we want to make fast and informed decisions, we need to utilize an interconnected and interdependent framework that brings these data silos together and delivers models, analytics and visualizations that are easy to interpret.”

The framework Ravitz describes is the health-data ecosystem, which he developed at APL with Lewis, Aaron Katz and the late John Piorkowski. The team hopes to establish an ecosystem that incorporates security and privacy requirements from the start and that can, according to the published framework, “continuously deliver timely, accurate and comprehensive data among interdependent entities spanning all levels of society.”

The Johns Hopkins Coronavirus Research Center’s COVID-19 dashboard is an example of a successful component of a health-data ecosystem. The dashboard, developed at APL, became the world’s most trusted, accurate source of information available on the pandemic. “We’ve taken what we’ve learned from the dashboard and are now applying it to make enduring capabilities that will benefit the country and the world,” said Lewis.

Increasing Data Collection Capabilities

In addition to emphasizing the need for data-driven decisions, the COVID-19 pandemic encouraged officials and organizations to expand the genomic surveillance technology critical to collecting that data. One goal of the World Health Organization’s (WHO) 10-year strategy is to scale up genomic surveillance capabilities around the world. Data collected by the WHO shows that in March 2021, 54% of countries had this capacity. By January 2022, that number had increased to 68%, a rise of 14 percentage points. Though that is a significant increase, there is still room to expand and stabilize the technology around the world.

“Genomic surveillance is most useful when we’re casting as wide of a net as possible,” explained Peter Thielen, a molecular biologist in APL’s Research and Exploratory Development Department and member of the APL team that has sequenced influenza and SARS-CoV-2. “What we’re doing in the United States captures local circulation, but studying what is happening in Africa or Southeast Asia can help us be proactive to anticipate what might be coming next.”

Take the flu, for example. The exact timing and duration of flu season vary, but the virus typically circulates in fall and winter. “What happens in the Southern Hemisphere can predict what’s going to happen in the Northern Hemisphere when seasons change, so we’ll look to Australia, Southeast Asia and Africa to anticipate how severe a flu season might be,” said Thielen. “It really is a global community effort to be able to pull the data together and sequence things at scale.”

Scientists use sequencing information to design the newest seasonal vaccines. Historically, they have done so for the flu; now they are doing the same for COVID-19 vaccines such as the bivalent boosters.

“The flu vaccine is a traditional vaccine, and it takes roughly eight months to produce tens of millions of doses from the time we first begin to identify what genomic variants are spreading throughout the population,” Thielen explained. “So decisions on what makes up the vaccine are made so early in the flu season that predominant genetic variants may have shifted by the time the vaccine is ready to be distributed.”

“Our ability to generate that data as quickly as possible drives our ability to come up with new medical countermeasures. The faster we can generate, integrate and make sense of that data, the more it can be leveraged by the global community for decision-making,” he continued.

The trouble is that sequencing demands considerable expertise, money and technology.

“We’re trying to establish this very difficult capability in places that have extremely limited resources,” said Natalie Lee, a virologist in the Lab’s National Health Mission Area. “I’ve been to hospitals all around the world that don’t even have the ability to test for any virus infections, and in order to sequence something, you have to first diagnose that the sample is infected with something. There are a lot of steps to take to deliver that final piece of data, and many communities don’t have the foundation to do that.”

The Laboratory is improving part of that process by continuing the global expansion of Basestack, a modular, open-source software suite for complex informatics work. Sequencing applications that were previously difficult to set up, unwieldy to use and dependent on powerful hardware and high-speed internet can now be run locally on off-the-shelf laptops through a clean and intuitive interface. Basestack makes advanced genomics tools accessible and user-friendly for scientists and public health workers around the world and across the health-data ecosystem.

“We’re trying to enable data production as early in the process as possible,” said Thielen.