Constraint Based Analysis

Constraint based analysis is a dominant approach to program analysis and is declarative in nature. Constraint based analysis is concerned with expressing what the analysis computes, not how it computes it. Constraint based analysis is concerned with the specification of the analysis rather than the implementation of the analysis. This type of analysis has the following advantages:

  • Simplification of design and understanding of analysis
  • Allows for rapid prototyping
  • Continuous performance and improvement

Motivation

Designing an efficient program analysis is a challenging task that requires the analyst to define the specification and implementation of the program analysis, e.g.:

  • Specification - "what"; no null pointer is dereferenced along any path in the program
  • Implementation - "how"; many design choices exist here:
    • forward vs backward traversal of the program
    • symbolic vs explicit representation of the program state
    • etc.

What is constraint based analysis?

In constraint based analysis, an analyst will define the constrains using a constraint language - the implementation of the program analysis will be automated by the constraint solver.

Benefits of constraint based analysis

The following are some benefits of constraint based analysis:

  • Separates analysis specification from implementation
    • Analysis writer can focus on what rather than how
  • Yields natural program specifications
    • Constraints are usually local, whose conjunctions capture global properties
  • Enables sophisticated analysis implementations
    • Leverage powerful, off-the-shelf solvers for implementation

What is Datalog?

Datalog is:

  • A declarative logic programming language
  • Not Turing-complete: subset of Prolog, or SQL with recursion
    • Efficient algorithms exist to evaluate Datalog programs

Datalog:

  • Originated as a query language for deductive databases
  • Later applied in many other domains:
    • software analysis
    • data mining
    • networking
    • security
    • knowledge representation
    • cloud-computing
    • etc
  • Many implementations are based on Datalog, e.g.:
    • Logicblox
    • bddbddb
    • IRIS
    • Paddle

Cloning based inter-procedural analysis

To enable context-sensitivity for program analysis using the methods described in this lecture, one can clone variables on a points-to graph analysis generated by pointer analysis. This prevents us from losing context, conducting inter-procedural analysis.

Through this technique, we achieve context sensitivity because we are inlining procedure calls. This increases our precision, but decreases our scalability.