## Abstract

Verification of string manipulation operations is a crucial problem in computer security. We present a new symbolic string verification technique that can be used to prove that vulnerabilities that result from improper string manipulation do not exist in a given program. We formally characterize the string verification problem as the reachability analysis of {\em string systems}, programs that contain only string variables and allow a limited set of operations on them. We show that string analysis problem is undecidable with even three variables if branch conditions that compare different variables are allowed. We develop a sound symbolic analysis technique for string verification that over-approximates the reachable states of the string system. We represent the set of string values that string variables can take using {em multi-track deterministic finite automata} and implement a forward fixpoint computation using an automata based widening operation. In order to handle branch conditions in string systems, we describe the precise construction of multi-track DFAs for linear word equations, such as $c_1X_1c_2= c_1'X_2c_2'$, as well as Boolean combinations of these equations. We show that non-linear word equations (even the simple one $X_1 =X_2X_3$) cannot be characterized precisely as a multi-track DFA. We propose a regular approximation for non-linear equations, such as $X_1ldots X_i = X_{1'}ldots X_{i'}$, as well as Boolean combinations of these equations. We present a summarization technique for inter-procedural analysis that generates a transducer characterizing the relationship between the input parameters and the return values of each procedure. We implemented these algorithms using the MONA automata package and analyzed several PHP programs. Unlike prior string analysis techniques, our approach is able to keep track of the relationships among the string variables, improving the precision of the string analysis and enabling verification of assertions such as $X_1=X_2$ where $X_1$ and $X_2$ are string variables.