OpenRCE Blog Entry

RolfRolles <rolf

rolles

gmail

com>

Tuesday, January 22 2008 16:20.30 CST

Here are some optimizations that I have seen MSVC apply to structure references.  I wish I could give you the real names for these optimizations, but I can't find them in any of my compilers textbooks.  I have a feeling that they're buried away somewhere inside of Randy Allen and Ken Kennedy's incredibly dense tome, "Optimizing Compilers for Modern Architectures".  If anybody knows the real names for these transformations, please speak up.

#1:  Let's say we are accessing multiple entries in a structure that's larger than 80h.  Now as stated in the previous entry, each access to the members situated at >= 0x80 is going to require a dword in the instruction encoding if we generate the "naive" code.  If we instead do:

lea esi, [esi+middle_of_structure_somewhere]

mov eax, [esi-(middle_of_structure_somewhere - member_offset1)]
mov ebx, [esi+(member_offset2 - middle_of_structure_somewhere)]

We can access more of the structure with the one-byte instruction encoding, if those subtracted quantities are bytes.  The compiler chooses middle_of_structure_somewhere specifically to maximize the number of one-byte references.  This is the same idea behind the "frame pointer delta" stack-frame optimization.

#2:  Let's say we have a loop that accesses two arrays of structures inside of another structure, one array beginning at +1234h, the other beginning at +2234h.  If we emit the "naive" code:

; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)

mov eax, [esi+1234h+ebx+offset_of_member1]
mov edi, [esi+2234h+edx+offset_of_member2]

Then obviously both of these structure displacements are going to require a separate dword in the instruction encoding for 1234h+offset_of_member1 and 2234h+offset_of_member2.  If we instead do:

lea esi, [esi+1234h]

; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)

mov eax, [esi+ebx+offset_of_member1]
mov edi, [esi+1000h+edx+offset_of_member2]

Then if offset_of_member1 is a byte, it's only going to require a byte in the instruction encoding, thus saving three bytes per reference to the first structure (we can combine the previous optimization to place esi such that the number of one-byte references is maximized).  Alternatively, if more members in the second structure are accessed than those in the first, we'll see:

lea esi, [esi+2234h]

; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)

mov eax, [esi+ebx+offset_of_member1-1000h]
mov edi, [esi+edx+offset_of_member2]

Once again, the first optimization can also be applied here to choose the optimal placement for esi that maximizes the number of single-byte references.  The multiplications given in the second optimization can also be optimized away into additions.

c1de0x

Posted: Wednesday, January 23 2008 15:11.31 CST

Rolf: Now we're talking!

Just one thing... you say "one byte instruction" when I think you mean "one byte operand". Not a big deal but it can get confusing.

Another place you see negative 'structure' offsets, although not strictly speaking an optimization, is in classes with multiple inheritance. In this case, you'll sometimes have virtual functions receiving pointers to 'this' and using a negative offset to cast up the inheritance hierarchy.

Again, this kind of thing can probably be detected and 'noted' automatically.