The Center for Applied Internet Data Analysis (CAIDA), based at the San Diego Supercomputer Center (SDSC) at UC San Diego; the University of Oregon’s Network Startup Resource Center (NSRC); and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have been awarded more than $11 million by the National Science Foundation (NSF) for two projects aimed at improving internet infrastructure security. CAIDA Director Kimberly “KC” Claffy will lead the projects with a team of U.S. and international collaborators.
The first project’s $7.8 million will support the design and prototyping of a distributed but integrated infrastructure to measure internet topology and traffic dynamics, with the aim of improving internet infrastructure security. Working with Claffy toward this goal are three other principal investigators: David Clark (MIT/CSAIL), Bradley Huffaker (UC San Diego/CAIDA) and Hervey Allen (NSRC/University of Oregon), along with a team of more than 30 collaborators, including academic and other researchers, industry experts and legal experts in data-sharing challenges.
“Modern scientific research requires collaborative digital environments that are both accessible and secure,” said UC San Diego Chancellor Pradeep K. Khosla. “Consistent with NSF’s Blueprint for a National Cyberinfrastructure Ecosystem, UC San Diego views infrastructure holistically and aims to integrate a range of resources to enable greater use of internet infrastructure for transformative discoveries. These two projects will significantly forward our integration efforts.”
Claffy said that, among other areas, CAIDA and its partners will investigate sustainable production-data acquisition, curation tools, metadata generation, and efficient storage and dissemination to identify security gaps and potential efficiencies.
Baked-in security challenges to address
A leader and pioneer in internet science for nearly three decades, Claffy noted that there have always been security problems on the internet – some baked into the internet architecture. Many technical solutions to these problems have failed to gain traction in the marketplace for fundamentally economic reasons: their complexity and cost, the requirement of pervasive deployment to be effective, and the lack of competitive advantage for early adopters.
“In light of the global coordination challenge of securing internet infrastructure, we believe that a central component of any solution will be compliance with security hygiene practices, and measurements for independent validation of such compliance,” said Claffy.
Claffy also explained that while users and researchers try to “keep an eye” on the internet, there are questions around who maintains the watchful eye, who pays for it, who governs it, and who gets access to internet data. Further, how is the data maintained, stored, curated and used?
“All of this is part of stewardship of what is now critical infrastructure, but it is the only critical infrastructure that has no government oversight, and no agency dedicated to measurement in the public interest,” said Claffy. “This project, titled Designing a Global Measurement Infrastructure to Improve Internet Security (GMI3S), serves as a portal to those conversations. It also is an opportunity to transition CAIDA’s work over the last 25 years to serve a growing global research community that can help define, prototype and evaluate instrumentation designed to not only collect data to address grand security challenges, but also to support management, curation and privacy-respecting disclosure of such data.”
Measurement a means to understanding
According to Clark, the security of the internet is a high priority for the security research community, but that community is greatly hindered by a lack of relevant data.
“Researchers, governments and advocates for society need a more rigorous understanding of the internet ecosystem, a need made more urgent by the rising influence of adversarial actors,” he said. “We cannot secure what we do not understand, and we cannot understand what we do not measure.”
The GMI3S project, however, is not aimed at tackling all internet security problems. “The internet has a layered structure, which, in its simplest form, is a data transport layer on top of which runs a wide range of applications. Our focus is on the internet as a data transport service, and vulnerabilities specific to that layer, such as attacks on internet routing that deflect traffic to bogus destinations; abuses of the domain name system; attacks on the key management systems that underpin identity and authentication on the internet; and spoofing of internet addresses to disrupt regions of the internet with untraceable traffic,” explained Huffaker.
Allen added that security challenges at these layers seem to get less publicity than attacks on endpoints (malware, ransomware, etc.), or design features in applications that lead to risky user experiences. “But the challenges at the data transport layer are foundational – they affect the reliable operation of every application that operates over the internet,” he said.
The researchers identified the immediate target for this infrastructure as the research community that measures the internet and tries to improve its security. The intermediate beneficiary will be the operator community and the service providers of the internet, while the ultimate beneficiary will be all of society.
“We recognize that better data alone will not improve the security of the internet – this proposal is part of a larger community agenda of research and outreach to industry and policymakers,” said Claffy. “The proposed infrastructure and its community of users will enable wide engagement of academic groups as well as private-sector security researchers in developing innovative, efficient and robust capabilities to tackle the challenges of known and emerging internet vulnerabilities.”
Claffy also noted that the internet is a “designed artifact” with an operational character that might seem to be understandable from analysis of its specifications.
“This is not so,” she said. “The internet is composed of tens of thousands of independent networks, and the overall behavior of the internet is determined by the independent decisions of the operators of those networks. Moreover, in most of the world, the internet infrastructure is the product of the private sector. Economic considerations that drive the private sector shape the character of the internet – key aspects of its resilience, security, privacy and its overall future trajectory. The only way to understand the behavior of the internet is to measure it.”
Dealing with a lack of good data
The goal of the second project, Integrated Library for Advancing Network Data Science (ILANDS), which received $3.5 million in funding over five years, is to understand the internet’s changing character through realistic datasets, longitudinal measurements and new experiments, with data made accessible to researchers.
“There is a dearth of good data to support research for several good reasons: complexity, scale and cost of measurement instrumentation; information-hiding properties of the routing system, security and commercial sensitivities; costs of storing and processing the data; and lack of incentives to gather data in the first place,” said Claffy. “This lack of data hinders our ability to understand and reason about real-world properties of the internet such as robustness, resilience, security and stability.”
The project’s approach is to integrate the research community into the process from the beginning, aligning its research goals and optimizing NSF’s investment toward achieving them. The approach has five objectives: (1) shape what data are collected and stored; (2) find new users of the infrastructure, especially from underrepresented groups; (3) bring focused research collaborators together; (4) publish research results and analysis methods; and (5) establish a sustainability plan.
“This project will provide the internet science community more insight into aspects of internet resilience that we have not been able to study and certainly not over longitudinal data sets, which are hard to collect,” said Claffy.