Flag: Tornado! Hurricane!

Blogs >> trufae's Blog

Created: Thursday, April 24 2008 10:25.05 CDT Modified: Thursday, April 24 2008 10:51.14 CDT
Printer Friendly ...
Opcode execution cost
Author: trufae # Views: 3290

I have managed to write a small ugly and obfuscated hacky program to test the execution cost of different opcodes to create a small table containing the execution time cost for each opcode.

I run 20K times the opcode defined in the input file (one per line) and calculates the mid time of 100 runs.

Not all opcodes can be tested (think that branches, traps, interrupts and pipeline/caches cannot be tested in this way), but it's quite useful for writing optimizations.

I think that GCC has some of these cost execution tables for their own optimizations but I prefer to recreate them on the real hardware.

This program is obviously targeted to run on UNIX boxes with gnu the compiler. But should be effort-less to port it to windows.

Here's a sample session and the source code:


$ ./opcost opcodes.txt
     srl $t0, $t0, 8:    0 45
      lb $t0, 8($sp):    0 25
      lw $t1, 0($sp):    0 25
     lw $t1, 16($sp):    0 31
    sll $t0, $t1, 16:    0 18
       move $v0, $v1:    0 18
          li $v0, 33:    0 18
   add $v0, $v0, $v0:    0 45
       mult $v0, $v0:    0 45
                 nop:    0 18



$ cat opcost.c
/**
* Opcode cost payload calculation
*
* pancake <pancake/A/youterm/D/com>
*/

#include <stdio.h>
#include <string.h>

#define TIMES "20560"
#define LAUNCHER ""

char *prg = "echo '#include <sys/time.h>\n#include <time.h>\n#include <stdio.h>\n"
"int main() { int i, a=0,b=0; struct timeval tv,tv2;"
"  for(i=0;i<100;i++) {  gettimeofday(&tv, NULL);"
"  asm(\"\\n.rept "TIMES"\\n%s\\n.endr\\n\"); gettimeofday(&tv2, NULL);"
"  if (i==0) continue; /* skip first iteration */"
"  /* printf(\"%%d %%d\\n\", tv2.tv_sec-tv.tv_sec, tv2.tv_usec - tv.tv_usec); */"
"  if (a == 0) a = tv2.tv_sec-tv.tv_sec; if (b == 0) b = tv2.tv_usec-tv.tv_usec;"
"  else { a=(a+tv2.tv_sec-tv.tv_sec)>>1; b=(b+tv2.tv_usec-tv.tv_usec)>>1; }"
"  } printf(\"%20s:\t %%d %%d\\n\", a, b); return 0;}' > .tst.c && gcc %s .tst.c -o .a.out"
"&& "LAUNCHER" ./.a.out && rm -f .a.out .tst.c";

int main(int argc, char **argv)
{
        FILE *fd;
        char buf[1024];
        char line[1024];
        const char *cflags;
        if (argc<2) {
                printf("Usage: opcost [file]\n");
                return 1;
        }
        fd = fopen(argv[1], "r");
        if (fd == NULL) {
                printf("Cannot open '%s'\n", argv[1]);
                return 1;
        }
        cflags = (const char *)getenv("CFLAGS");
        if (!cflags) cflags="";
        while(1) {
                line[0] = '\0';
                fgets(line, 1024, fd);
                if (!*line||line[1]=='\0') break;
                line[strlen(line)-1] = '\0';
                sprintf(buf, prg, line, line, cflags);
                if (system(buf))
                        break;
        }
        fclose(fd);
}


Have fun!


Blog Comments
tagetora Posted: Thursday, April 24 2008 10:49.27 CDT
Why are you doing it in C and not in assembler? I mean, using RDTSC <opcode> RDTSC or an equivalent timestamping function if you are not on x86.



trufae Posted: Thursday, April 24 2008 10:56.23 CDT
because i want something portable and i don't want to spend my -precious- time rewriting the same 'shit' of test for every one.

This is mips and there is no rdtsc like opcode for it.

Writing it for a specific architecture/os it will get better performance. I know that embedding C and shellscript inside a C program is not that clean.

BTW i'm not interested on the "real" time of execution because this is impossible to achieve. I just want a proportion time of execution between all the analyzed opcodes (a ratio) which can be calculated parsing the output.

btw nice to read you ! :)

tagetora Posted: Thursday, April 24 2008 11:11.56 CDT
Okay, no rdtsc, but what about using clock() instead of gettimeofday()? It seems more reliable to me and, oh! it's portable too ;D

trufae Posted: Thursday, April 24 2008 11:39.32 CDT
Uhm, gettimeofday() is a syscall, no idea how is implemented clock() but I prefer to directly call the kernel :)

RolfRolles Posted: Thursday, April 24 2008 16:30.27 CDT
The problem with calculating an instruction's "execution cost" is that, due to pipelining (and other modern processor features), the speed of execution of an instruction is not only dependent upon the instruction itself, but also on the instructions executed beforehand.  Hence, the term "instruction execution cost" is meaningless.  This is why the Intel manuals no longer list cycle-level timing counts, and this is also why compilers beat hand-written assembly code in many cases (optimization is no longer as simple as "use a faster instruction").  Though you are talking about MIPS and not x86, the concept of pipelining applies to both RISC and CISC processors.

This is a complicated subject, but this article does a good job in explaining it.

MazeGen Posted: Friday, April 25 2008 01:55.45 CDT
This is why the Intel manuals no longer list cycle-level timing counts

Intel lists the number of clock cycles in Optimization Reference Manual, Appendix C

trufae Posted: Friday, April 25 2008 04:00.15 CDT
[i]Though you are talking about MIPS and not x86, the concept of pipelining applies to both RISC and CISC processors.[i]

The pipeline usage in mips and x86 is completely different. And the way for optimizing code for it is cpu specific, (not all mips have the same pipeline rules).

I know that cost of execution per opcode is not a always-valid rule for choosing one opcode or another, but using a bottom-top point of view, you can think in per instruction cost - per context cost and per block cost.

The instruction cost will be different by shuffling instructions, because of the payload in dependencies on the next opcode. On mips for example your jumps must be aligned to %32+24 to avoid refilling the pipeline in an sub-optimal way (for example).

So, these costs have nothing to do with the reality (unless your program is composed by a single instruction :P).

Will be interesting (and not very hard) to code a program to check per instruction, per loop and per block execution costs of an algorithm. This way it will be possible to see the pipeline and instruction payloads in a context-sense environment. (just for analysis and fun)

RolfRolles Posted: Tuesday, April 29 2008 03:26.45 CDT
Sorry for the slow response -- GTA IV, enough said.  I stand corrected on the point about cycle counts being included in the manual, and thank you for pointing this out.  Nonetheless my overall point is borne out by the footnotes in appendix C, which indicate that the cycle counts are for the worst-case scenario, due to pipelining and OOO, and are "not meant to be used ... for instruction-level performance benchmarks".  By the way, MazeGen, thanks for geek32 -- I have found it handy for disassembling disassemblers.

MazeGen Posted: Wednesday, May 7 2008 08:26.03 CDT
thanks for geek32 -- I have found it handy for disassembling disassemblers.

Good to hear it is useful for you. Just one question, can you actually read geek's cryptic operand codes, or you need merely the mnemonics? I mean, coder32 is more useful in some cases.



Add New Comment
Comment:









Active in Last 5 Minutes
Lizzard

There are 28,229 total registered users.


Recently Created Topics
Reverse Engineering ...
Jan/23
Career: DoD Agency I...
Jan/22
"Disappearing&q...
Jan/17
Career: Software Sec...
Jan/11
Where is the call st...
Jan/07
IDA Pro 6.1 Breakpoi...
Jan/01
How to create data s...
Dec/30
can i search all mod...
Dec/23
IDA symbol table exp...
Dec/20
An anti-attach trick
Dec/17


Recent Forum Posts
Reverse Engineering ...
NirIzr
"Disappearing&q...
NirIzr
Reverse Engineering ...
charlie
"Disappearing&q...
charlie
An anti-attach trick
Bass
An anti-attach trick
waleeda...
An anti-attach trick
Bass
An anti-attach trick
waleeda...
An anti-attach trick
Bass
Looking for value in...
NirIzr


Recent Blog Entries
cmathieu
Feb/07
Hacker Carnival

waleedassar
Feb/06
OllyDbg v1.10 And Hardware ...

waleedassar
Jan/31
Yet Another Anti-Debug Trick

RolfRolles
Jan/22
Finding Bugs in VMs with a ...

waleedassar
Jan/13
An OllyDbg Bug Disables Sof...

More ...


Recent Blog Comments
waleedassar on:
Feb/07
OllyDbg v1.10 And Hardware ...

NirIzr on:
Feb/07
OllyDbg v1.10 And Hardware ...

NirIzr on:
Feb/05
Yet Another Anti-Debug Trick

trolotou on:
Feb/05
Doudoune Moncler -Pennies F...

waleedassar on:
Feb/01
Yet Another Anti-Debug Trick

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit