Static Analysis of JavaScript


Static analysis, JavaScript, Secure information flow

JavaScript has been called the lingua franca of the Web; it underlies much of the functionality of the Web and of modern web browsers. Statically analyzing JavaScript has many useful applications including security, performance, and program refactoring. However, JavaScript has language features that make it fundamentally hard to statically analyze, including dynamic typing, computable field accesses, dynamic field addition and deletion, implicit casting, and prototype-based inheritance. It also has many surprising behaviors and controversial features, such as variable hoisting, lack of block scoping, the with construct, the arguments array, and odd binding rules for the this reference. Our goal is a sound and precise static analysis framework for JavaScript.

Our approach is to use abstract interpretation. We first define a core JavaScript calculus with a formal concrete semantics, called notJS (inspired by Guha et al's λJS), along with a translation from JavaScript to notJS. We then abstract the concrete notJS semantics into a directly-executable abstract interpreter which can carry out a sound and precise analysis of JavaScript programs. As a first test of our approach we have created a static secure information flow analysis for JavaScript-based browser addons and applied it to a suite of real addons from the Mozilla Firefox addon repository, with excellent results (this work under submission for publication). We are currently working on further applications of our static analysis framework, including type inference, program slicing, and refactoring tools.

Tunable Precision for Static Analysis


Static analysis

Designing a program analysis requires balancing precision and performance—an analysis must have sufficient precision to be useful, and at the same time have sufficient performance to be practical. An important dimension of this tradeoff is control-flow sensitivity: how precisely the analysis adheres to realizable program execution paths. The large body of existing work on dataflow analysis contains a number of approaches along this dimension, such as flow-insensitivity, flow-sensitivity, context-sensitivities such as k-CFA and object-sensitivity, and many more. However, these existing dataflow analysis approaches are generally specified at best semi-formally and leave open some important questions: What are the semantic foundations of these techniques? Can we characterize the entire space of possible control-flow sensitivities? Among these choices, how do we decide which choice is best for a particular analysis and domain of programs? Trial-and-error (or just plain guessing) is the common approach, and currently the only way to explore different choices is to implement each one from scratch. Is there a better way?

This project aims to answer these questions using the techniques of abstract interpretation. Our goal is not just a theoretical framework for control-flow sensitivity, but a practical technique for implementing analyses with tunable control-flow sensitivity (i.e., sensitivity that can easily be varied without modifying either the abstract analysis semantics or the analyzer implementation). While some existing sensitivities are readily tunable in a narrow range (e.g., varying the k in k-CFA), we wish to be able to tune an analysis among arbitrary sensitivities, including all of the ones mentioned above and even ones that haven’t been invented yet. Thus analysis designers can ignore issues of control-flow sensitivity when designing their analyses, and afterwards specify whatever level of sensitivity they desire (possibly after experimenting with a range of sensitivities to determine the optimal choice). We base our project on a novel insight: that control-flow sensitivities are widening operators on the powerset lattice of abstract states (a description of this work is under preparation for a conference submission). We have used this insight to realize our goal of tunable control-flow sensitivity, and demonstrate the utility of our framework by using it to create an abstract interpreter for JavaScript that has tunable precision (see the project Abstract Interpretation of JavaScript). We are currently exploring the new analysis possibilities that are opened up by having such a framework.

Enforcing Secure Information Flow


Secure information flow, Static analysis

Secure information flow guarantees the secrecy and integrity of data, prohibiting an attacker from learning secret information (secrecy) or injecting untrusted information (integrity). For example, a financial application might have access to a user's private financial data as well as the ability to send messages over the network to access public financial data, such as stock quotes. This level of access is necessary for the application to function---it cannot be denied either resource. However, the user would like some guarantee that a malicious application cannot use its network access to make the user's private information public.

A common guarantee that secure information flow can provide is called noninterference. In terms of secrecy, this guarantee states that the behavior of publically observable events (e.g., messages sent over a network) cannot be influenced by private data (e.g., a user's private information). A similar guarantee can be made about integrity, which is the dual of privacy. The task of secure information flow is to identify information channels---mechanisms that are able to transmit information---and prohibit leaks (information flows along those channels that violate noninterference). We are working on several projects that target secure information flow at various levels of abstraction, including web applications (see the project Abstract Interpretation of JavaScript) and hardware design (see the project Secure Hardware Design).

Engineering JavaScript Interpreters


JavaScript, Interpreters, Virtual machines

JavaScript is being used to build increasingly large and complex web applications; hence language performance has become an important issue. In collaboration with Qualcomm Research we are working on new techniques for engineering JavaScript interpreters. This project leverages our experience with JavaScript analysis in order to couple static techniques (such as type inference) with dynamic techniques (such as online profiling and JIT compilation) to significantly increase the performance of the JavaScript engine.

Secure Hardware Design


Hardware design, Secure information flow

Secure information flow is an important security property that must be incorporated from the ground up, including at hardware design time, in order to provide a formal basis for a system's root of trust. The goal of this project is to incorporate insights and techniques from designing secure programming languages to provide a new perspective on designing secure hardware. In cooperation with the UCSB Architecture Lab we have designed two new hardware description languages called Caisson and Sapper. These languages combine domain-specific abstractions common in hardware design with insights from type-based techniques used in secure programming languages. The proper combination of these elements allows, in comparison with previous work on secure hardware design, for an expressive, provably-secure HDL that operates at a familiar level of abstraction to the target audience of the language, hardware architects.

Caisson and Sapper are based on finite state machines, an abstraction that is very familiar to hardware designers. We take advantage of the features of FSMs, coupled with type-based techniques from programming languages, to enforce secure information flow at the hardware level. At the same time, hardware designers are able to work with familiar concepts and to receive immediate feedback on information leaks early in design time. As a proof-of-concept, we have used Caisson and Sapper to design several varieties of secure hardware processors, something that, while simple in Caisson, is difficult both to design and to verify using conventional techniques.

Pointer Analysis for C-Style Languages


Static analysis, Pointer analysis (no longer active)

This project focuses on fast, precise pointer analysis for C-style languages (i.e., imperative languages with first-order functions). Pointer analysis is a fundamental enabling technology for program analysis. By improving the scalability of precise pointer analysis we can make a positive impact across a wide range of program analyses used for many different purposes, including program verification and model checking, optimization and parallelization, program understanding, hardware synthesis, and more. This project presents a set of algorithms for inclusion-based pointer analysis that are over 4× faster while using 7× less memory than the previous state of the art. These algorithms have been incorporated in the GCC and LLVM compiler infrastructures as well as adopted by companies in industry, such as Semantic Designs. In addition, this project presents a set of algorithms for flow-sensitive pointer analysis that increase scalability by two orders of magnitude over the previous state of the art, enabling flow-sensitive pointer analysis for over one million lines of code.