Control Flow Analysis
Subsequent instructions with no branches make up segments of linear code. The first instruction of the program is a leader, as is every target of a branch, and every instruction immediately following a branch (including conditional branches and procedure returns) is a leader. A basic block is the maximum segment of linear code associated with each leader -- it ends with either the program's last instruction or the first branch following a leader. A basic block a flows to b if and only if:
- either b immediately follows a, and a does not end in an unconditional branch,
- or a ends in a branch, of which b is a potential target.
Note that an indirect branch, without context information, trivializes all blocks (every instruction becomes a leader) and flows to them all from at least that point (an arborescence is formed)!
The directed multigraph defined by interpreting basic blocks as vertices, and flow relationships as edges, yields its control flow graph (CFG). A start node exists for each CFG, corresponding to the basic block whose header is the first instruction of the program.
The antisymmetric, transitive, reflexive domination relation is defined on vertices of a CFG (and thus basic blocks of the underlying program). A vertex a dominates b (a <= b) if every path from the start node s to b passes through a. A vertex a properly dominates b (a < b) if a dominates and is not equal to b. A vertex a directly/immediately dominates b (a <d b) if a properly dominates b, and a dominates no vertex c that dominates b. This relation induces the dominator tree, where nodes dominate all descendents in the tree. The start node s dominates all nodes, properly dominates all nodes but itself, and roots the dominator tree.
- It should be obvious that a's preceding of b in the CFG is not necessary for even immediate dominance of b by a.
Loops can be discovered via domination analysis (it is important to note that this refers to loops in the generated code, not loop constructs of the source language, and furthermore that all possible loops will be found (ie, unstructured loops constructed from C gotos)). Discover all strongly-connected subgraphs (SCCs) of the CFG (subgraphs where, for each vertex, a path (not necessarily an edge) exists from that vertex to all other nodes of the subgraph); if a subgraph contains a node dominating all that subgraph's nodes, the subgraph is a loop. The trivial case is, of course, a statement which jumps to itself, ala the BASIC program 10 GOTO 10. Implementation via Kosaraju's Algorithm is simple, with O(|V|+|E|) time complexity using graph encoding and O(N2) time complexity using adjacency matrices:
- Perform a recursive depth-first traversal of the graph starting from s. Each time you return, add that node onto an auxiliary vector. Upon the traversal's completion, this vector sorts the nodes topologically.
- Until the the vector is empty, use the last node of the vector to begin traversing the transpose graph. Remove the path from the vector; these paths partition the graph into SCCs.