Reverse engineering a compiler-produced artifact
In our training, Pedram and I deal with some very simple compiler optimizations and artifacts. Although they preserve the semantics the programmer defined in the original source code, these optimizations are sometimes cumbersome to convert back to a meaningful high-level representation. The other day I was studying a piece of code and bumped into a relatively common pattern. The code I was looking at was supposed to divide one of a function's arguments by a constant, but in the disassembly I could only see a multiplication. This can be slightly confusing unless one has seen a bit more assembly than is healthy and remembers some of the compiler-produced fun that goes on...
The code looked like this: (irrelevant interleaved code left out)
What do we have so far?

[ (Value * 0x66666667) / 2^32 ] / 2^3

But what's that 0x66666667? Why multiply by something so large and then divide? The reason is that this computation lets the processor keep most of the precision of the division it is trying to perform, still obtaining an integer in the end, but without having to resort to floating-point arithmetic (which is far slower).

Let's do an example in base 10. Imagine that we can only multiply and divide by powers of 10 (shifting numbers left and right) and we want to divide a number by 30. By shifting alone we can only divide by 10, 100, 1000, and so on. But we have that:

Value/30 = Value * 1/3 * 1/10

Given that, represented as an integer, 1/3 would simply be 0, we "scale" it by multiplying by a large constant that we divide back out once we are done, recovering the value we're after. Since the easiest thing for us is to multiply and divide by powers of 10, we can scale 1/3 to 100000/3, which is approximately 33333 -- a nice integer value. We want this value to be as large as fits in our registers: the bigger it is, the more precision it retains through the subsequent operations.

Value/30 = ( Value * 33333 ) / 1000000

Hence we now have a clue where that 0x66666667 value might be coming from. Since the processor works in base 2, we can assume it will prefer powers of 2. And since it will try to use the largest constant that fits in a 32-bit register, that gives us an idea of the range of the power of two in use. We can get there with a bit of trial and error (we want a power of two that yields an integer when divided by 0x66666667):

2^33 / 0x66666667 = 4.9999999982537702 ~= 5

Therefore: 0x66666667 ~= 2^33 / 5

So, in the end we get to:

[ ( (Value * 2^33) / 5 ) / 2^32 ] / 2^3

And with some algebra it simplifies to:

Value / (5 * 2^2) = Value / 20

effectively dividing the value by 20 without actually using the division instruction.
That's the extent to which compilers will go to avoid using the division instruction... Reverse engineering is fun, isn't it?

Update: Given that this is a relatively old and well-known optimization strategy, it's only natural that it has been discussed before. It was just brought to my attention that Ilfak has blogged about a similar optimization, and this chapter (PDF) from Hacker's Delight provides more details.