A parser, the element in a computer system that converts data inputs into an understandable format, is the first line of defense for cybersecurity. A multi-institute group of researchers that includes Gang Tan, assistant professor of computer science and engineering in the School of Electrical Engineering and Computer Science at Penn State, has received an $8 million grant that allots $1 million for Penn State’s part of the research to increase computer security by developing more secure parsers.
The research project, “SPARTA: The Secure Parser Toolkit for Assurance,” is funded by the Defense Advanced Research Projects Agency (DARPA) and is a collaboration among researchers at Penn State, Galois Inc., Cornell University and Purdue University.
The role of a parser in a computer system is to take outside data inputs and convert them into internal representations. Parsers are considered a critical security component in many systems because they should be able to identify adversarial elements and warn a system user that the program in question may be taking malicious input. However, a cyber attacker could feed a parser malformed data that triggers bugs in it and then exploit those bugs to take over the system. Tan and his research team aim to create parsers that come with provable safety guarantees and are not susceptible to the many bugs common in today’s parsers.
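To make the idea concrete, here is a minimal sketch of a hand-written parser that converts raw bytes into an internal representation and rejects malformed input instead of trusting it. The record format (a one-byte kind, a two-byte big-endian length, then the payload), the names `Record`, `ParseError` and `parse_record`, and the `MAX_PAYLOAD` limit are all assumptions made for illustration; none of them come from the SPARTA project. It is also exactly the kind of manually written code Tan describes, with no formal guarantee behind it.

```python
import struct
from dataclasses import dataclass

MAX_PAYLOAD = 1024  # hypothetical upper bound chosen for this sketch


class ParseError(ValueError):
    """Raised when the input does not match the expected format."""


@dataclass(frozen=True)
class Record:
    """Internal representation produced by the parser."""
    kind: int
    payload: bytes


def parse_record(data: bytes) -> Record:
    # The assumed header is 3 bytes: 1-byte kind, 2-byte big-endian length.
    if len(data) < 3:
        raise ParseError("input shorter than the 3-byte header")
    kind, length = struct.unpack(">BH", data[:3])
    # Never trust the declared length: bound it and compare it with
    # the number of bytes that were actually supplied.
    if length > MAX_PAYLOAD:
        raise ParseError("declared length exceeds MAX_PAYLOAD")
    if len(data) != 3 + length:
        raise ParseError("declared length does not match the data")
    return Record(kind=kind, payload=data[3:])
```

Calling `parse_record(b"\x01\x00\x02hi")` yields a `Record` with kind 1 and payload `b"hi"`, while an input whose declared length disagrees with its actual size raises `ParseError`. Forgetting either length check is the sort of programmer mistake that can turn malformed input into a vulnerability.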
“There are tools you can use to manually write those parsers, but, in the end, you don’t get many guarantees,” Tan said. “You just rely on the competence of the programmers, and often, these parsers are very complex. Programmers make mistakes, and as a result those mistakes cause vulnerability in computer systems.”
For example, at the time Tan submitted his research proposal, more than 1,000 parser bugs had been reported for the popular suite of Mozilla products, affecting the security of many common file formats, including PDF, ZIP, PNG and JPG.
Tan said he hopes the SPARTA system will enable him to develop the most secure parsers to date by using a novel parser language and rigorous formal methods.
The researchers are focusing specifically on a program called SafeDocs that is geared toward safely opening PDFs.
“PDF is a format with a lot of features, and some features are harder to handle than other features,” Tan said. “This parser would warn you if this PDF document is not obeying some safe subset of the format. If this parser agrees to open it, you’re guaranteed to be safe. There’s a provable way of saying it’s safe.”
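The behavior Tan describes, opening a document only if it stays within a safe subset of the format, can be sketched in a few lines. The sketch assumes the features a document uses have already been extracted as plain strings; the feature names, `SAFE_FEATURES`, `check_safe_subset` and `open_if_safe` are hypothetical and are not taken from SPARTA or from the PDF specification. A simple allow-list like this carries no proof of safety, which is the gap the research aims to close.

```python
# Hypothetical safe subset chosen for this sketch.
SAFE_FEATURES = frozenset({"text", "images", "fonts"})


def check_safe_subset(features_used):
    """Return the sorted list of features that fall outside the safe subset."""
    return sorted(set(features_used) - SAFE_FEATURES)


def open_if_safe(features_used):
    """Open the document only if every feature it uses is in the safe subset."""
    violations = check_safe_subset(features_used)
    if violations:
        # Warn instead of opening, naming each offending feature.
        return "rejected: " + ", ".join(violations)
    return "opened"
```

With these assumptions, a document that uses only text and fonts is opened, while one that also uses a feature outside the list is rejected with the offending features named.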
While the project’s focus is on parsers for PDF security, the researchers hope their new system can be applied to other formats, including video and image formats.
“The topic of parsing has been there since the early days of programming, but people have been mostly focusing on functionality, saying, ‘I can build a parser that can parse this kind of data,’ but haven’t paid too much attention to correctness, that is, how do you convince the world that this parser is doing the right thing?” Tan said. “And that turns out to be quite important given the cybersecurity threat. I think what is most exciting about our research is it could give some provable guarantees.”