I'm trying to enhance an existing processor module by hooking the necessary callbacks (custom_ana, custom_emu, custom_out, etc.) in a plugin module. I could use some clarification and/or tips on doing this.
The final question is:
Is there an easy way to track if the instruction being decoded in ana() is decoded for the first time and if it's in the correct program flow?
I'm sure this question is a bit vague, so below is some background information (warning: long story).
Goal
I want to track one of the CPU's registers based on whether an instruction is interesting or not (does it modify the register I want to track?). This tracking has to be done in the ana() function because I also want to change "cmd" if the instruction is interesting, and from reading the comments in the SDK, the emu() function isn't allowed to change "cmd". It is important that the register is tracked in the correct flow of the program. What is "correct"? Read on.
Scenario
Given the following instructions
0x0000 instruction A with 1 operand (Op1 has xref to 0x000B)
0x0001 instruction B
0x0002 instruction C (jump) 0x000B
...
0x000B instruction D
Let's call the interesting register I want to track reg X, with an initial value of 2.
Instruction A would add 2 to reg X (this should be done in the ana() function)
Instruction B would decrease reg X by 1
Instruction D multiplies reg X by 4
The original (and "correct") program flow would be 0x0000, 0x0001, 0x0002, 0x000B, where regX should change like this:
Start regX = 2
Execute instruction A -> regX = 4
Execute instruction B -> regX = 3
Execute instruction C -> regX = 3
Execute instruction D -> regX = 12
Live problem
What I actually experience is:
Start regX = 2
Execute instruction A -> regX = 4
Execute instruction D -> regX = 16
Execute instruction B -> regX = 15
Execute instruction C
Etc.
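To make the difference between the two traces concrete, here is a small standalone C++ sketch. It does not use the IDA SDK at all; Insn, apply_effect and trace_reg_x are names made up for this illustration, and the effects are simply the ones from the scenario above. It only shows that the same four effects give different results when they are applied in decode order instead of program-flow order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the four instructions in the scenario.
enum class Insn { A, B, C, D };

// Effect of a single instruction on the tracked register, as described above.
int apply_effect(Insn insn, int reg_x) {
    switch (insn) {
        case Insn::A: return reg_x + 2;  // A adds 2
        case Insn::B: return reg_x - 1;  // B decreases by 1
        case Insn::C: return reg_x;      // C is a jump and leaves reg X alone
        case Insn::D: return reg_x * 4;  // D multiplies by 4
    }
    assert(false && "unknown instruction");
    return reg_x;
}

// Apply the effects in the given order and record reg X after each step.
std::vector<int> trace_reg_x(const std::vector<Insn>& order, int initial) {
    std::vector<int> values;
    int reg_x = initial;
    for (Insn insn : order) {
        reg_x = apply_effect(insn, reg_x);
        values.push_back(reg_x);
    }
    return values;
}
```

With an initial value of 2, the order A, B, C, D yields 4, 3, 3, 12, while the order A, D, B, C yields 4, 16, 15, 15. Because the add and the multiply do not commute, any state that is updated in decode order will drift from the value the program would really compute.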
So I started tracking the order in which IDA calls these functions to figure out what was causing this. As far as I understand, IDA analyzes code in this order:
call ana(): decode the instruction and fill cmd (determine itype, type, dtyp, size, etc.)
call emu(): create xrefs
call out(): print mnemonic & operand(s)
If I'm correct the complete process of analyzing an instruction at address X is not fully done until out() is called, right?
What I see happening in my live problem is:
Ana (0x0000) regX = 4
Emu (0x0000) // Now here it detects that the first operand should have a XREF to 0x000B
// and calls something like ua_add_dref(x,x,x)
Ana (0x000B) regX = 16 // I believe that the ua_add_dref(x,x,x) call triggers ana() again, causing regX to change when it shouldn't (the original flow of the program wouldn't have executed instruction D yet).
Emu (0x000B)
Ana (0x0000) // I'm not 100% sure this one was called again before out(), but I can't access the code and binary from home, so I'm doing this from memory.
Out (0x0000)
Within ana() I can track whether the instruction at address X is decoded for the first time by logging its address in e.g. a vector and doing a lookup. Based on this I could decide whether I should change regX or not. What I can't seem to do is identify whether the instruction being decoded is actually decoded for the first time AND in the correct flow of the program.
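The first-time check described above could look roughly like this. It is a standalone sketch, not SDK code: DecodeLog, first_decode and kInvalidAddress are hypothetical names, a plain 64-bit integer stands in for the SDK's address type, and a std::unordered_set replaces the vector I mentioned so the lookup doesn't have to scan every logged address.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical sentinel for "no address"; a stand-in, not an SDK constant.
constexpr std::uint64_t kInvalidAddress = ~std::uint64_t{0};

// Hypothetical helper that remembers which addresses ana() has already seen.
class DecodeLog {
public:
    // Returns true only the first time a given address is reported.
    bool first_decode(std::uint64_t ea) {
        assert(ea != kInvalidAddress);
        // insert() reports through .second whether the element was newly added.
        return seen_.insert(ea).second;
    }

private:
    std::unordered_set<std::uint64_t> seen_;
};
```

Fed the call sequence from my trace, Ana (0x0000), Ana (0x000B), Ana (0x0000), this returns true, true, false. So the repeated decode of 0x0000 is filtered out, but 0x000B is still accepted as a first decode even though the program flow hasn't reached it yet, which is exactly the part I can't solve this way.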
As you saw above, 0x000B gets decoded for the first time earlier than it would have been had the analysis followed the original program flow. So here's my question again:
Is there an easy way to track if the instruction being decoded in ana() is decoded for the first time and if it's in the correct program flow?
Thanks for reading this long post :)






