**Instruction selection**

www.ntnu.edu TDT4205 – Lecture 30

# Where we are

- We have a fairly low-level view of the program, but
  - It assumes a memory model with unlimited temporary variables
  - It isn't specific in terms of operations provided by the architecture
- These will be our last two topics
  - Selecting machine-specific operations
  - Mapping variables to memory locations

# Low-IR vs. machinery

The instructions of low-level IR are not the same as those of the target machine

# Straightforward solution

Map every low-level IR instruction to a fixed sequence of assembly instructions, e.g. `x = y + z` becomes

```
move y, r1
move z, r2
add r1, r2
move r2, x
```

- Disadvantages:
  - Lots of redundant operations
  - More memory traffic than necessary

# There may be several alternatives

Translate `a[i+1] = b[j]` using these operations

```
add   r2, r1       ← r1 = r1 + r2
mulc  c, r1        ← r1 = r1 * c
load  r2, r1       ← r1 = *r2
store r2, r1       ← *r1 = r2
movem r2, r1       ← *r1 = *r2
movex r3, r2, r1   ← *r1 = *(r2+r3)
```

# The general steps

Let's say that everything is 8-byte elements, and

- Register r<sub>a</sub> holds &a
- Register r<sub>b</sub> holds &b
- Register r<sub>i</sub> holds i
- Register r<sub>j</sub> holds j

`a[i+1] = b[j]` needs to

- Find address of b[j]
- Load b[j]
- Find address of a[i+1]
- Store into a[i+1]

Address of b[j]

```
mulc 8, r_j
add r_j, r_b
```

Load b[j]

```
load r_b, r1
```

Address of a[i+1]

```
add 1, r_i
mulc 8, r_i
add r_i, r_a
```

Store into a[i+1]

```
store r1, r_a
```

# Another translation

Address of b[j]

```
mulc 8, r_j
add r_j, r_b
```

Address of a[i+1]

```
add 1, r_i
mulc 8, r_i
add r_i, r_a
```

Store into a[i+1]

```
movem r_b, r_a
```

# One more translation

Address of b[j]

```
mulc 8, r_j
```

Address of a[i+1]

```
add 1, r_i
mulc 8, r_i
add r_i, r_a
```

Store into a[i+1]

```
movex r_j, r_b, r_a
```
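To check that the three translations really compute the same thing, the six operations can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function `run`, the register names as strings, the base addresses 1000 and 2000, and the values i = 2, j = 3, b[j] = 42 are all made up for illustration. The second variant is assumed to finish with `movem`.

```python
# Hypothetical mini-interpreter for the six operations on the slide.
# Operand order follows the slide: the last operand is the destination.

def run(program, regs, mem):
    """Execute (opcode, operands...) tuples on copies of regs and mem."""
    regs, mem = dict(regs), dict(mem)
    for op, *args in program:
        if op == "add":        # add r2, r1 : r1 = r1 + r2 (r2 may be a constant)
            src, dst = args
            regs[dst] += src if isinstance(src, int) else regs[src]
        elif op == "mulc":     # mulc c, r1 : r1 = r1 * c
            c, dst = args
            regs[dst] *= c
        elif op == "load":     # load r2, r1 : r1 = *r2
            src, dst = args
            regs[dst] = mem[regs[src]]
        elif op == "store":    # store r2, r1 : *r1 = r2
            src, dst = args
            mem[regs[dst]] = regs[src]
        elif op == "movem":    # movem r2, r1 : *r1 = *r2
            src, dst = args
            mem[regs[dst]] = mem[regs[src]]
        elif op == "movex":    # movex r3, r2, r1 : *r1 = *(r2 + r3)
            idx, base, dst = args
            mem[regs[dst]] = mem[regs[base] + regs[idx]]
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs, mem

# Shared address computations (8-byte elements).
ADDR_B = [("mulc", 8, "rj"), ("add", "rj", "rb")]
ADDR_A = [("add", 1, "ri"), ("mulc", 8, "ri"), ("add", "ri", "ra")]

load_store = ADDR_B + [("load", "rb", "r1")] + ADDR_A + [("store", "r1", "ra")]
with_movem = ADDR_B + ADDR_A + [("movem", "rb", "ra")]
with_movex = [("mulc", 8, "rj")] + ADDR_A + [("movex", "rj", "rb", "ra")]

# Assumed initial state: a at 1000, b at 2000, i = 2, j = 3, b[3] = 42.
regs0 = {"ra": 1000, "rb": 2000, "ri": 2, "rj": 3, "r1": 0}
mem0 = {2000 + 8 * 3: 42, 1000 + 8 * 3: 0}

results = {}
for name, prog in [("load/store", load_store),
                   ("movem", with_movem),
                   ("movex", with_movex)]:
    _, mem = run(prog, regs0, mem0)
    results[name] = mem[1000 + 8 * 3]   # value now stored in a[i+1]
    print(name, len(prog), "instructions, a[i+1] =", results[name])
```

With these assumed values, every variant leaves 42 in a[i+1], using 7, 6 and 5 instructions respectively. Instruction count is not the same as cost, since each operation may take a different number of cycles.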
# Why care?

- Not all instructions are created equal
  - Some complete in a clock cycle
  - Others decompose into a sequence of steps, and take many cycles
- If we have a choice of translations, we'd like the one with the smallest sum of costs

# Partial instructions aren't necessarily adjacent

Address of b[j]

```
mulc 8, r_j
```

Address of a[i+1]

```
add 1, r_i
mulc 8, r_i
add r_i, r_a
```

Store into a[i+1]

```
movex r_j, r_b, r_a
```

# Tree representation

The 4 overall steps can be written as a tree

# Instructions can be tiles

(Subtrees of a particular pattern)

# Tiling

An instruction selection covers the tree with disjoint tiles

# Tilings for comparison

Alternate tilings give different costs

# Better than trees

- If we let common sub-expressions be represented by the same node, the trees become directed acyclic graphs (DAGs)
- Separate labels and annotations
  - Label nodes with variables, constants or operators
  - Annotate nodes with variables that hold their value
- Construct DAG from low-level IR

# Basic procedure

For each instruction in a basic block

```
if it's "x = y op z"
    find or create a node annotated y
    find or create a node annotated z
    find or create a node labeled op with operands y and z
    remove annotation x from everywhere
    add annotation x to the op node
if it's "x = y"
    find or create a node annotated y
    add annotation x to it
```

For example, apply the procedure to this basic block:

```
t = y + 1
w = y + 1
y = z * t
t = t + 1
z = t * y
w = z
```
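The basic procedure can be sketched in Python and run on the six-instruction block. This is a minimal illustration, not a reference implementation: the `Node` class, the `build_dag` function and the string-based instruction format are assumptions made for the example. For the copy case `x = y`, the sketch also removes `x` from any other node before annotating, which the pseudocode leaves implicit.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                  # variable name, constant, or operator
    operands: tuple = ()        # indices of operand nodes (empty for leaves)
    annotations: set = field(default_factory=set)

def build_dag(block):
    """Build a DAG from instructions of the form 'x = y op z' or 'x = y'."""
    nodes = []

    def find_or_create_leaf(name):
        # Find the node currently annotated with this variable or constant.
        for idx, node in enumerate(nodes):
            if name in node.annotations:
                return idx
        nodes.append(Node(label=name, annotations={name}))
        return len(nodes) - 1

    def find_or_create_op(op, left, right):
        # Reuse an existing node with the same operator and operands.
        for idx, node in enumerate(nodes):
            if node.label == op and node.operands == (left, right):
                return idx
        nodes.append(Node(label=op, operands=(left, right)))
        return len(nodes) - 1

    def annotate(idx, name):
        # Remove the annotation from everywhere, then add it to one node.
        for node in nodes:
            node.annotations.discard(name)
        nodes[idx].annotations.add(name)

    for line in block:
        target, expr = (part.strip() for part in line.split("="))
        tokens = expr.split()
        if len(tokens) == 3:            # x = y op z
            y, op, z = tokens
            left = find_or_create_leaf(y)
            right = find_or_create_leaf(z)
            annotate(find_or_create_op(op, left, right), target)
        else:                           # x = y
            annotate(find_or_create_leaf(tokens[0]), target)
    return nodes

block = [
    "t = y + 1",
    "w = y + 1",
    "y = z * t",
    "t = t + 1",
    "z = t * y",
    "w = z",
]

dag = build_dag(block)
for idx, node in enumerate(dag):
    print(idx, node.label, node.operands, sorted(node.annotations))
```

The second instruction creates no new node, because `y + 1` already exists. The final DAG has seven nodes, and the last multiplication node carries both `z` and `w` as annotations.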