Title: Programming with Millions of Examples
Type: 40min talk
The amount of code available on the web grows daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends in leveraging such "big code" for program analysis, program synthesis, and reverse engineering. Along the way, we will consider a range of semantic representations based on symbolic automata, tracelets, and numerical abstractions, as well as different notions of code similarity built on these representations. Finally, I will show applications of these techniques, including semantic code search in both source code and stripped binaries, code completion, and reverse engineering.