10/30/2017 Colin Robertson, CS @ ILLINOIS
As part of the CS @ ILLINOIS Distinguished Lecture Series, Dr. Andreas Zeller will present his research on security testing. The lecture will take place at 4 pm on October 30, in 2405 Siebel Center.
Mining Input Grammars for Massive Security Testing
Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL "http://www.example.com/path/", for instance, the protocol "http", the host "www.example.com", and the path "path" would be handled by different functions and stored in different variables. Given a set of sample inputs, we use dynamic tainting to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a context-free grammar that accurately reflects valid input structure; as it draws on function and variable names, it can be as readable as textbook examples:
URL ::= PROTOCOL "://" HOST "/" PATH
PROTOCOL ::= "http" | "https" | …
HOST ::= /[a-zA-Z0-9.]+/
...
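To make the underlying observation concrete, here is a minimal Python sketch of how the fragments of the example URL end up in different variables. It is not dynamic tainting and not the tool described in the talk: it uses the standard library's `urlsplit` as a stand-in for the program under test, and the helper name `fragments_by_variable` is hypothetical. Mapping each value back to the input with `str.find` is a simplifying assumption that only works when a fragment occurs once in the input.

```python
from urllib.parse import urlsplit

def fragments_by_variable(url):
    """Map each parsed field to the input span (start, end, text) it came from."""
    parts = urlsplit(url)  # stand-in for the program that processes the input
    fields = {
        "PROTOCOL": parts.scheme,
        "HOST": parts.netloc,
        "PATH": parts.path.strip("/"),
    }
    spans = {}
    for name, value in fields.items():
        # Assumption: each fragment occurs exactly once in the input,
        # so a plain substring search recovers its origin.
        start = url.find(value)
        spans[name] = (start, start + len(value), value)
    return spans

spans = fragments_by_variable("http://www.example.com/path/")
for name, (start, end, text) in spans.items():
    print(f"{name}: input[{start}:{end}] = {text!r}")
```

A real taint tracker would follow every character through the program's data flow rather than searching for substrings afterwards; the sketch only shows the kind of fragment-to-variable mapping from which grammar rules such as those above could be named.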
We expect inferred grammars to considerably ease the understanding of file and input formats. Their most important use, however, will be in automatic fuzz testing, where grammars can easily be turned into producers that help to quickly cover program features. Our grammar-based LANGFUZZ fuzzer is in daily use at Mozilla and has uncovered more than 4,000 defects so far; mining grammars automatically will bring such techniques to a wide range of programs. For details on our work on grammar mining, see https://www.st.cs.uni-saarland.de/models/autogram/
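As a rough illustration of how a grammar can be turned into a producer, the following Python sketch expands a small URL grammar at random. It is not LANGFUZZ or the grammar miner from the talk. The `HOST`, `LABEL`, and `PATH` rules are simplified assumptions made up for this example, since the abstract elides the remaining rules, and the names `GRAMMAR` and `produce` are hypothetical.

```python
import random

# Toy grammar: each nonterminal maps to a list of alternatives,
# each alternative is a sequence of symbols.
# HOST, LABEL, and PATH are assumed rules, not taken from the abstract.
GRAMMAR = {
    "URL": [["PROTOCOL", "://", "HOST", "/", "PATH"]],
    "PROTOCOL": [["http"], ["https"]],
    "HOST": [["LABEL"], ["LABEL", ".", "HOST"]],
    "LABEL": [["www"], ["example"], ["com"]],
    "PATH": [["path"], ["index.html"], [""]],
}

def produce(symbol, rng, depth=0, max_depth=8):
    """Randomly expand `symbol` into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # anything without a rule is a terminal
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest alternative
        # so that recursive rules such as HOST terminate.
        alternatives = [min(alternatives, key=len)]
    expansion = rng.choice(alternatives)
    return "".join(produce(s, rng, depth + 1, max_depth) for s in expansion)

rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(produce("URL", rng))
```

Every string produced this way is valid with respect to the grammar, which is what lets a grammar-based fuzzer get past input parsing and exercise deeper program features. A practical fuzzer would add much more, such as coverage guidance or mutation of existing samples.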