Flag: Tornado! Hurricane!

Blogs >> trufae's Blog

Created: Thursday, April 24 2008 10:25.05 CDT Modified: Thursday, April 24 2008 10:51.14 CDT
Printer Friendly ...
Opcode execution cost
Author: trufae # Views: 4618

I have managed to write a small ugly and obfuscated hacky program to test the execution cost of different opcodes to create a small table containing the execution time cost for each opcode.

I run 20K times the opcode defined in the input file (one per line) and calculates the mid time of 100 runs.

Not all opcodes can be tested (think that branches, traps, interrupts and pipeline/caches cannot be tested in this way), but it's quite useful for writing optimizations.

I think that GCC has some of these cost execution tables for their own optimizations but I prefer to recreate them on the real hardware.

This program is obviously targeted to run on UNIX boxes with gnu the compiler. But should be effort-less to port it to windows.

Here's a sample session and the source code:


$ ./opcost opcodes.txt
     srl $t0, $t0, 8:    0 45
      lb $t0, 8($sp):    0 25
      lw $t1, 0($sp):    0 25
     lw $t1, 16($sp):    0 31
    sll $t0, $t1, 16:    0 18
       move $v0, $v1:    0 18
          li $v0, 33:    0 18
   add $v0, $v0, $v0:    0 45
       mult $v0, $v0:    0 45
                 nop:    0 18



$ cat opcost.c
/**
* Opcode cost payload calculation
*
* pancake <pancake/A/youterm/D/com>
*/

#include <stdio.h>
#include <string.h>

#define TIMES "20560"
#define LAUNCHER ""

char *prg = "echo '#include <sys/time.h>\n#include <time.h>\n#include <stdio.h>\n"
"int main() { int i, a=0,b=0; struct timeval tv,tv2;"
"  for(i=0;i<100;i++) {  gettimeofday(&tv, NULL);"
"  asm(\"\\n.rept "TIMES"\\n%s\\n.endr\\n\"); gettimeofday(&tv2, NULL);"
"  if (i==0) continue; /* skip first iteration */"
"  /* printf(\"%%d %%d\\n\", tv2.tv_sec-tv.tv_sec, tv2.tv_usec - tv.tv_usec); */"
"  if (a == 0) a = tv2.tv_sec-tv.tv_sec; if (b == 0) b = tv2.tv_usec-tv.tv_usec;"
"  else { a=(a+tv2.tv_sec-tv.tv_sec)>>1; b=(b+tv2.tv_usec-tv.tv_usec)>>1; }"
"  } printf(\"%20s:\t %%d %%d\\n\", a, b); return 0;}' > .tst.c && gcc %s .tst.c -o .a.out"
"&& "LAUNCHER" ./.a.out && rm -f .a.out .tst.c";

int main(int argc, char **argv)
{
        FILE *fd;
        char buf[1024];
        char line[1024];
        const char *cflags;
        if (argc<2) {
                printf("Usage: opcost [file]\n");
                return 1;
        }
        fd = fopen(argv[1], "r");
        if (fd == NULL) {
                printf("Cannot open '%s'\n", argv[1]);
                return 1;
        }
        cflags = (const char *)getenv("CFLAGS");
        if (!cflags) cflags="";
        while(1) {
                line[0] = '\0';
                fgets(line, 1024, fd);
                if (!*line||line[1]=='\0') break;
                line[strlen(line)-1] = '\0';
                sprintf(buf, prg, line, line, cflags);
                if (system(buf))
                        break;
        }
        fclose(fd);
}


Have fun!


Blog Comments
tagetora Posted: Thursday, April 24 2008 10:49.27 CDT
Why are you doing it in C and not in assembler? I mean, using RDTSC <opcode> RDTSC or an equivalent timestamping function if you are not on x86.



trufae Posted: Thursday, April 24 2008 10:56.23 CDT
because i want something portable and i don't want to spend my -precious- time rewriting the same 'shit' of test for every one.

This is mips and there is no rdtsc like opcode for it.

Writing it for a specific architecture/os it will get better performance. I know that embedding C and shellscript inside a C program is not that clean.

BTW i'm not interested on the "real" time of execution because this is impossible to achieve. I just want a proportion time of execution between all the analyzed opcodes (a ratio) which can be calculated parsing the output.

btw nice to read you ! :)

tagetora Posted: Thursday, April 24 2008 11:11.56 CDT
Okay, no rdtsc, but what about using clock() instead of gettimeofday()? It seems more reliable to me and, oh! it's portable too ;D

trufae Posted: Thursday, April 24 2008 11:39.32 CDT
Uhm, gettimeofday() is a syscall, no idea how is implemented clock() but I prefer to directly call the kernel :)

RolfRolles Posted: Thursday, April 24 2008 16:30.27 CDT
The problem with calculating an instruction's "execution cost" is that, due to pipelining (and other modern processor features), the speed of execution of an instruction is not only dependent upon the instruction itself, but also on the instructions executed beforehand.  Hence, the term "instruction execution cost" is meaningless.  This is why the Intel manuals no longer list cycle-level timing counts, and this is also why compilers beat hand-written assembly code in many cases (optimization is no longer as simple as "use a faster instruction").  Though you are talking about MIPS and not x86, the concept of pipelining applies to both RISC and CISC processors.

This is a complicated subject, but this article does a good job in explaining it.

MazeGen Posted: Friday, April 25 2008 01:55.45 CDT
This is why the Intel manuals no longer list cycle-level timing counts

Intel lists the number of clock cycles in Optimization Reference Manual, Appendix C

trufae Posted: Friday, April 25 2008 04:00.15 CDT
[i]Though you are talking about MIPS and not x86, the concept of pipelining applies to both RISC and CISC processors.[i]

The pipeline usage in mips and x86 is completely different. And the way for optimizing code for it is cpu specific, (not all mips have the same pipeline rules).

I know that cost of execution per opcode is not a always-valid rule for choosing one opcode or another, but using a bottom-top point of view, you can think in per instruction cost - per context cost and per block cost.

The instruction cost will be different by shuffling instructions, because of the payload in dependencies on the next opcode. On mips for example your jumps must be aligned to %32+24 to avoid refilling the pipeline in an sub-optimal way (for example).

So, these costs have nothing to do with the reality (unless your program is composed by a single instruction :P).

Will be interesting (and not very hard) to code a program to check per instruction, per loop and per block execution costs of an algorithm. This way it will be possible to see the pipeline and instruction payloads in a context-sense environment. (just for analysis and fun)

RolfRolles Posted: Tuesday, April 29 2008 03:26.45 CDT
Sorry for the slow response -- GTA IV, enough said.  I stand corrected on the point about cycle counts being included in the manual, and thank you for pointing this out.  Nonetheless my overall point is borne out by the footnotes in appendix C, which indicate that the cycle counts are for the worst-case scenario, due to pipelining and OOO, and are "not meant to be used ... for instruction-level performance benchmarks".  By the way, MazeGen, thanks for geek32 -- I have found it handy for disassembling disassemblers.

MazeGen Posted: Wednesday, May 7 2008 08:26.03 CDT
thanks for geek32 -- I have found it handy for disassembling disassemblers.

Good to hear it is useful for you. Just one question, can you actually read geek's cryptic operand codes, or you need merely the mnemonics? I mean, coder32 is more useful in some cases.



Add New Comment
Comment:









There are 31,313 total registered users.


Recently Created Topics
[help] Unpacking VMP...
Mar/12
Reverse Engineering ...
Jul/06
hi!
Jul/01
let 'IDAPython' impo...
Sep/24
set 'IDAPython' as t...
Sep/24
GuessType return une...
Sep/20
About retrieving the...
Sep/07
How to find specific...
Aug/15
How to get data depe...
Jul/07
Identify RVA data in...
May/06


Recent Forum Posts
Finding the procedur...
rolEYder
Question about debbu...
rolEYder
Identify RVA data in...
sohlow
let 'IDAPython' impo...
sohlow
How to find specific...
hackgreti
Problem with ollydbg
sh3dow
How can I write olly...
sh3dow
New LoadMAP plugin v...
mefisto...
Intel pin in loaded ...
djnemo
OOP_RE tool available?
Bl4ckm4n


Recent Blog Entries
halsten
Mar/14
Breaking IonCUBE VM

oleavr
Oct/24
Anatomy of a code tracer

hasherezade
Sep/24
IAT Patcher - new tool for ...

oleavr
Aug/27
CryptoShark: code tracer ba...

oleavr
Jun/25
Build a debugger in 5 minutes

More ...


Recent Blog Comments
nieo on:
Mar/22
IAT Patcher - new tool for ...

djnemo on:
Nov/17
Kernel debugger vs user mod...

acel on:
Nov/14
Kernel debugger vs user mod...

pedram on:
Dec/21
frida.github.io: scriptable...

capadleman on:
Jun/19
Using NtCreateThreadEx for ...

More ...


Imagery
SoySauce Blueprint
Jun 6, 2008

[+] expand

View Gallery (11) / Submit