Check out my first novel, midnight's simulacra!
Compiler Design: Difference between revisions
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
==Control Flow Analysis== | ==Control Flow Analysis== | ||
Subsequent instructions | Subsequent branch-free instructions make up segments of ''linear code''. The first instruction of the program is a ''leader'', as is every target of a branch, and every instruction immediately following a branch (including conditional branches and procedure returns) is a leader. A ''basic block'' is the maximum segment of linear code associated with each leader -- it ends with either the program's last instruction or the first branch following a leader. A basic block a ''flows'' to b if and only if: | ||
* either b immediately follows a, and a does not end in an unconditional branch, | * either b immediately follows a, and a does not end in an unconditional branch, | ||
* or a ends in a branch, of which b is a potential target. | * or a ends in a branch, of which b is a potential target. | ||
Line 10: | Line 10: | ||
* It should be obvious that a's preceding of b in the CFG is not necessary for even immediate dominance of b by a. | * It should be obvious that a's preceding of b in the CFG is not necessary for even immediate dominance of b by a. | ||
Dataflow analysis is most usefully performed into and out of ''regions'', subsets of the nodes such that a ''header'' exists which dominates all nodes in the region, and all edges between nodes in the region are themselves in the region. A loop is a region which is strongly connected, where all back-edges to the header are themselves within the region '''FIXME -- unclear'''. | |||
Loops can be discovered via domination analysis (it is important to note that this refers to loops in the generated code, not loop constructs of the source language, and furthermore that all possible loops will be found (ie, unstructured loops constructed from C <tt>goto</tt>s)). Discover all strongly-connected subgraphs (SCCs) of the CFG (subgraphs where, for each vertex, a path ('''not''' necessarily an edge) exists from that vertex to all other nodes of the subgraph); if a subgraph contains a node dominating all that subgraph's nodes, the subgraph is a loop. The trivial case is, of course, a statement which jumps to itself, ala the BASIC program <tt>10 GOTO 10</tt>. Implementation via [[Kosaraju's Algorithm]] is simple, with O(|V|+|E|) time complexity using graph encoding and O(N<sup>2</sup>) time complexity using adjacency matrices: | Loops can be discovered via domination analysis (it is important to note that this refers to loops in the generated code, not loop constructs of the source language, and furthermore that all possible loops will be found (ie, unstructured loops constructed from C <tt>goto</tt>s)). Discover all strongly-connected subgraphs (SCCs) of the CFG (subgraphs where, for each vertex, a path ('''not''' necessarily an edge) exists from that vertex to all other nodes of the subgraph); if a subgraph contains a node dominating all that subgraph's nodes, the subgraph is a loop. The trivial case is, of course, a statement which jumps to itself, ala the BASIC program <tt>10 GOTO 10</tt>. Implementation via [[Kosaraju's Algorithm]] is simple, with O(|V|+|E|) time complexity using graph encoding and O(N<sup>2</sup>) time complexity using adjacency matrices: |
Revision as of 13:45, 9 March 2009
Control Flow Analysis
Subsequent branch-free instructions make up segments of linear code. The first instruction of the program is a leader, as is every target of a branch, and every instruction immediately following a branch (including conditional branches and procedure returns) is a leader. A basic block is the maximum segment of linear code associated with each leader -- it ends with either the program's last instruction or the first branch following a leader. A basic block a flows to b if and only if:
- either b immediately follows a, and a does not end in an unconditional branch,
- or a ends in a branch, of which b is a potential target.
Note that an indirect branch, without context information, trivializes all blocks (every instruction becomes a leader) and flows to them all from at least that point (an arborescence is formed)!
The directed multigraph defined by interpreting basic blocks as vertices, and flow relationships as edges, yields its control flow graph (CFG). A start node exists for each CFG, corresponding to the basic block whose header is the first instruction of the program.
The antisymmetric, transitive, reflexive domination relation is defined on vertices of a CFG (and thus basic blocks of the underlying program). A vertex a dominates b (a <= b) if every path from the start node s to b passes through a. A vertex a properly dominates b (a < b) if a dominates and is not equal to b. A vertex a directly/immediately dominates b (a <d b) if a properly dominates b, and a dominates no vertex c that dominates b. This relation induces the dominator tree, where nodes dominate all descendents in the tree. The start node s dominates all nodes, properly dominates all nodes but itself, and roots the dominator tree.
- It should be obvious that a's preceding of b in the CFG is not necessary for even immediate dominance of b by a.
Dataflow analysis is most usefully performed into and out of regions, subsets of the nodes such that a header exists which dominates all nodes in the region, and all edges between nodes in the region are themselves in the region. A loop is a region which is strongly connected, where all back-edges to the header are themselves within the region FIXME -- unclear.
Loops can be discovered via domination analysis (it is important to note that this refers to loops in the generated code, not loop constructs of the source language, and furthermore that all possible loops will be found (ie, unstructured loops constructed from C gotos)). Discover all strongly-connected subgraphs (SCCs) of the CFG (subgraphs where, for each vertex, a path (not necessarily an edge) exists from that vertex to all other nodes of the subgraph); if a subgraph contains a node dominating all that subgraph's nodes, the subgraph is a loop. The trivial case is, of course, a statement which jumps to itself, ala the BASIC program 10 GOTO 10. Implementation via Kosaraju's Algorithm is simple, with O(|V|+|E|) time complexity using graph encoding and O(N2) time complexity using adjacency matrices:
- Perform a recursive depth-first traversal of the graph starting from s. Each time you return, add that node onto an auxiliary vector. Upon the traversal's completion, this vector sorts the nodes topologically.
- Until the the vector is empty, use the last node of the vector to begin traversing the transpose graph. Remove the path from the vector; these paths partition the graph into SCCs.
Kosaraju's algorithm is improved upon by Tarjan's Algorithm and Gabow's Algorithm. Natural loop identification proceeds via identification of back edges (edges from a node b to a node a, where a dominates b). A loop is associated with every such back edge; if a backedge exists from b to a, the associated loop is entered at a, and consists additionally of all nodes which can reach b without going through a. Similarly, a loop is associated with the target of every back edge, this being the union of all such backedges' associated natural loops.