Topic created on: May 23, 2008 02:44 CDT by petroleum.
This may be a trivial matter, but since I haven't yet delved into the wonders of IDC scripting, I figured asking here may be the way to go.
What I need is to output a file with the first 0x400 bytes from the entry point of a given PE file.
This needs to be done 'en masse' for multiple files.
For example:
for %i in (directory\*) do IDA -A -SEP_bytes.idc %i >> output.txt
The idea is to traverse the whole directory, load each PE file, extract the first 0x400 bytes and write them to the output file.
(Each input PE would need its own output file, but that's a trivial matter.)
Any suggestions?
Ty
Using IDA for that sounds like a bit of overkill. I'd use something like pefile, which is probably going to be faster and more customizable than launching IDA in batch mode just to extract a few bytes at the EP.
The last usage example on this page can give you an idea of how to go about doing what you want.
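For the 'en masse' part of the question, all that is needed around whatever extraction routine you pick (pefile or otherwise) is a small directory-walking driver. A minimal stdlib-only Python sketch follows; the names dump_directory and extract are hypothetical, and writing each result to '<name>.bin' next to its input is an assumed convention.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def dump_directory(src_dir: str,
                   extract: Callable[[bytes], bytes],
                   patterns: Iterable[str] = ("*.exe", "*.dll"),
                   suffix: str = ".bin") -> Dict[str, int]:
    """Apply `extract` to each matching file and write the result to <name><suffix>.

    Hypothetical helper: `extract` takes the raw file bytes and returns the
    bytes to dump, raising ValueError for files it cannot handle.
    """
    written = {}
    for pattern in patterns:
        # sorted() consumes the glob before any output file is created;
        # note that glob matching is case-sensitive on POSIX systems
        for path in sorted(Path(src_dir).glob(pattern)):
            try:
                chunk = extract(path.read_bytes())
            except ValueError as exc:  # malformed input: report it and keep going
                print(f"{path.name}: skipped ({exc})")
                continue
            path.with_name(path.name + suffix).write_bytes(chunk)
            written[path.name] = len(chunk)
    return written
```

Keeping the extraction routine as a parameter means the same driver works with a pefile-based extractor or a hand-rolled parser.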
|
Hi,
Based on your needs, I don't think IDA is needed here; you don't have to write an IDC script.
Any programming language can help you: parse the PE -> get the entry point -> read 0x400 bytes from there -> save to file -> done.
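Those steps can be sketched without IDA or any third-party library, using only Python's struct module. This is a minimal sketch assuming a well-formed PE read as raw file bytes: it translates the entry-point RVA to a file offset through the section table and stops at the end of that section's raw data instead of emulating the loader. The name ep_bytes is hypothetical.

```python
import struct


def ep_bytes(data: bytes, count: int = 0x400) -> bytes:
    """Return up to `count` raw file bytes starting at the PE entry point."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)           # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER starts right after the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)  # SizeOfOptionalHeader
    opt_off = pe_off + 24
    # AddressOfEntryPoint is at offset 16 in both PE32 and PE32+ optional headers
    (ep_rva,) = struct.unpack_from("<I", data, opt_off + 16)
    sect_table = opt_off + opt_size
    for i in range(num_sections):
        # IMAGE_SECTION_HEADER is 40 bytes: Name[8], VirtualSize, VirtualAddress,
        # SizeOfRawData, PointerToRawData, ...
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<4I", data, sect_table + 40 * i + 8)
        if vaddr <= ep_rva < vaddr + max(vsize, raw_size):
            start = raw_ptr + (ep_rva - vaddr)
            end = min(start + count, raw_ptr + raw_size)  # stop at the section's raw data
            return data[start:end]
    raise ValueError(f"entry point RVA 0x{ep_rva:x} is not inside any section")
```

For example, ep_bytes(open("target.exe", "rb").read()) (file name hypothetical) returns the bytes to save. It can come back shorter than 0x400 when the entry point sits near the end of its section's raw data.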
Regards,
lydia
|
This is a quick-n-dirty program; it works with EXE/DLL files and supports batch processing:
for %%A IN (*.exe, *.dll) DO dump4ep.exe %%A
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

int main(int c, char **v)
{
    #define PE_off  0x3C     // offset of e_lfanew in the DOS header
    #define EP_off  0x28     // offset of AddressOfEntryPoint from the "PE\0\0" signature
    #define SW_BP   0xCC
    #define F_pre   "-dump"
    #define def_sz  0x400
    DWORD pe_off; DWORD ep_off; BYTE* ep_adr;
    FILE *f; BYTE* base_x; char buf[_MAX_PATH];
    #define arg_src (v[1])
    #define arg_len ((c>2)? (atol(v[2])? atol(v[2]): def_sz): def_sz)

    if (c < 2) return
        printf("USAGE: dump4ep.exe filename [n_bytes]\n");
    if (!((strlen(arg_src) + sizeof(F_pre)) < _MAX_PATH)) return
        printf("-ERR:file name %s%s is too long\x7\n", arg_src, F_pre);
    sprintf(buf, "%s%s", arg_src, F_pre);

    // The DONT_RESOLVE_DLL_REFERENCES flag prevents DllMain execution.
    // Btw, the VS 6 MSDN has a mistake in the LoadLibraryEx description:
    // DONT_RESOLVE_DLL_REFERENCE instead of DONT_RESOLVE_DLL_REFERENCES.
    // if (!(base_x = (BYTE*) LoadLibraryEx(arg_src,0,DONT_RESOLVE_DLL_REFERENCES)))
    // LoadLibraryEx(,,DONT_RESOLVE_DLL_REFERENCES) has a nasty side effect:
    // the operating system shows a nag screen every time you try to load a broken PE file,
    // while LoadLibraryEx(,,LOAD_LIBRARY_AS_DATAFILE) has no such effect;
    // it just returns NULL (an error occurred).
    // --------------------------------------------------------
    // LOAD_LIBRARY_AS_DATAFILE works faster,
    // but sets the lowest bit of the HINSTANCE, so we have to clear it
    if (!(base_x = (BYTE*) LoadLibraryEx(arg_src, 0, LOAD_LIBRARY_AS_DATAFILE)))
        return printf("-ERR: LoadLibrary(%s)\x7\n", arg_src);
    // clear the lowest bit of the base address if necessary
    // (ULONG_PTR instead of DWORD so the cast is also safe for 64-bit pointers)
    if ((ULONG_PTR)base_x & 1) base_x--;

    // we're supposed to check if pe_off is correct,
    // but we're too lazy, so we just call IsBadReadPtr()
    // ugly hack, just to prevent an exception
    #define PE_OFF ((DWORD*)(base_x + PE_off))
    if (IsBadReadPtr(PE_OFF, sizeof(DWORD)))
        return printf("-ERR:bad PE offset\x7\n");
    pe_off = *PE_OFF;

    // the same ugly hack
    #define EP_OFF ((DWORD*)(base_x + pe_off + EP_off))
    if (IsBadReadPtr(EP_OFF, sizeof(DWORD)))
        return printf("-ERR:bad EP offset\x7\n");
    if (!(ep_off = *EP_OFF)) return
        printf("-ERR:%s no EP!\x7\n", arg_src);

    ep_adr = base_x + ep_off;
    if (IsBadReadPtr(ep_adr, arg_len)) return
        printf("-ERR:can't dump %ld bytes\x7\n", arg_len);

    // dump arg_len bytes to <filename>-dump
    if (!(f = fopen(buf, "wb"))) return
        printf("-ERR:can't create %s\x7\n", buf);
    fwrite(ep_adr, 1, arg_len, f);
    fclose(f);
    return 0;
}
|
Mmm, the quick-n-dirty program (hence the name, I guess ;) takes the offset of the PE header for granted... it might blow up on non-standard PE images. I'd say go with a PE parsing library; there's a reason why they exist ;)
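The location of e_lfanew (offset 0x3C) is fixed by the format; the risky part is trusting the value stored there. One way to address that concern is to validate the header chain before following any offset. A sketch assuming a raw file buffer: it bounds-checks e_lfanew and verifies both signatures and the optional-header Magic (0x10B for PE32, 0x20B for PE32+). find_nt_headers is a hypothetical helper name.

```python
import struct
from typing import Optional

IMAGE_DOS_SIGNATURE = b"MZ"
IMAGE_NT_SIGNATURE = b"PE\0\0"
E_LFANEW_OFFSET = 0x3C
PE32_MAGIC, PE32PLUS_MAGIC = 0x10B, 0x20B


def find_nt_headers(data: bytes) -> Optional[int]:
    """Return the file offset of the NT headers, or None if the chain is broken."""
    if len(data) < 0x40 or data[:2] != IMAGE_DOS_SIGNATURE:
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    # need the signature (4) + IMAGE_FILE_HEADER (20) + optional-header Magic (2)
    if e_lfanew + 26 > len(data):
        return None
    if data[e_lfanew:e_lfanew + 4] != IMAGE_NT_SIGNATURE:
        return None
    (magic,) = struct.unpack_from("<H", data, e_lfanew + 24)
    if magic not in (PE32_MAGIC, PE32PLUS_MAGIC):
        return None
    return e_lfanew
```

Returning None instead of raising keeps a batch run going over a directory that contains truncated or non-PE files.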
|
Maybe he wants to use IDA because these files are packed with something... run until the OEP is found with some kind of plugin and grab the 0x400 bytes... no?
Sorry if I misunderstood.
|
ero
You're right, man! Well, I updated the dumper. It still doesn't check whether the EP is valid, but at least it uses IsBadReadPtr() to prevent an exception. The new version is a bit faster, btw.
rakish
OK, this is it. Checked on IDA 4.7.
#include <idc.idc>

static main()
{
    auto a, f, start_ea;
    start_ea = LocByName("start");
    // LocByName() returns BADADDR (not 0) when the name does not exist
    if (start_ea == BADADDR)
        return Message("-ERR:start label is not found\n");
    if (!(f = fopen("dumpz", "wb")))
        return Message("-ERR:open file\n");
    for (a = 0; a < 0x400; a++)
        fputc(Byte(start_ea + a), f);
    fclose(f);
    Warning("done");
}
|
IDA is overkill.
Ero's pefile is neat here.
import sys
import pefile

def main():
    if len(sys.argv) < 2:
        print("Please supply file!")
        sys.exit(1)
    file = sys.argv[1]
    try:
        pe = pefile.PE(file, fast_load=True)
    except (OSError, pefile.PEFormatError) as e:
        print("problem loading file:", file, e)
        sys.exit(1)
    ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint
    # binary mode: the dump is raw bytes
    with open("dump.bin", "wb") as fw:
        fw.write(pe.get_memory_mapped_image()[ep:ep + 0x400])

if __name__ == '__main__':
    main()
|
neofx's example shows the simplicity and power of Python. I'm not entirely sure whether his ep+0x400 slice is checked to prevent exceptions, but either way that wouldn't be hard to fix if it isn't.
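On the bounds question: slicing a Python bytes object never raises, so a slice such as ep:ep+0x400 that runs past the end of the image silently yields a shorter dump instead of an exception. A sketch of an explicit check, assuming the mapped image is already available as bytes; ep_window is a hypothetical name.

```python
def ep_window(image: bytes, ep: int, length: int = 0x400) -> bytes:
    """Return exactly `length` bytes at offset `ep`, or raise instead of truncating."""
    chunk = image[ep:ep + length]  # never raises; past the end it just returns fewer bytes
    if len(chunk) != length:
        raise ValueError(f"only {len(chunk)} of {length} bytes available at 0x{ep:x}")
    return chunk
```

Whether a short dump should be an error or just a warning is a policy choice; raising makes the truncation visible in a batch run.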
Here's how I did it in C:
/* author: b0ne <[email protected]>, OpenRCE.org example */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

HANDLE pe_file;
HANDLE pe_file_map;
void *file_data;

typedef PIMAGE_NT_HEADERS WINAPI ImageNtHeader_t(PVOID ImageBase);

/* map the file laid out as an image (SEC_IMAGE), without running any of its code */
void *map_file(char *file_name)
{
    if ((pe_file = CreateFile(file_name, GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, NULL)) == INVALID_HANDLE_VALUE)
    {
        fprintf(stderr, "ERROR accessing %s\n", file_name);
        return NULL;
    }
    if ((pe_file_map = CreateFileMapping(pe_file, NULL, PAGE_READONLY | SEC_IMAGE, 0, 0, NULL)) == NULL)
    {
        fprintf(stderr, "ERROR accessing %s\n", file_name);
        CloseHandle(pe_file);
        return NULL;
    }
    if ((file_data = MapViewOfFile(pe_file_map, FILE_MAP_READ, 0, 0, 0)) == NULL)
    {
        fprintf(stderr, "ERROR accessing %s\n", file_name);
        CloseHandle(pe_file_map);
        CloseHandle(pe_file);
        return NULL;
    }
    return file_data;
}

void free_file(void *map_data)
{
    UnmapViewOfFile(map_data);
    CloseHandle(pe_file_map);
    CloseHandle(pe_file);
}

/* 1 if base+rva is committed memory belonging to the view that starts at base */
int is_rva_avail(char *base, DWORD rva)
{
    MEMORY_BASIC_INFORMATION mbi;
    char *addr = base + rva;

    if (VirtualQuery(addr, &mbi, sizeof(mbi)) != sizeof(mbi))
        return 0;
    return (mbi.State & MEM_COMMIT) && mbi.AllocationBase == base;
}

int main(int argc, char *argv[])
{
    char *file_name;
    char *out_name = NULL;
    ImageNtHeader_t *ImageNtHeader = NULL;
    HMODULE dbghelp_mod;
    char *image_base;
    IMAGE_NT_HEADERS *nt;
    DWORD ep_rva;
    DWORD ep_len;
    FILE *out_file;

    if (argc != 3)
    {
        fprintf(stderr, "usage: EPDUMP.exe <length> <target_PE_file_name>\n\n");
        return EXIT_FAILURE;
    }
    file_name = argv[2];    /* only safe to read argv[2] after checking argc */

    if ((dbghelp_mod = LoadLibrary("DBGHELP.DLL")))
        ImageNtHeader = (ImageNtHeader_t *) GetProcAddress(dbghelp_mod, "ImageNtHeader");
    if (!ImageNtHeader)
    {
        fprintf(stderr, "ERROR loading DBGHELP.DLL, please install DBGHELP.DLL\n");
        return EXIT_FAILURE;
    }
    /* test the signed value: a negative length would wrap around in a DWORD */
    if (atoi(argv[1]) < 1)
    {
        fprintf(stderr, "ERROR entrypoint length is too small\n");
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    ep_len = (DWORD) atoi(argv[1]);

    if ((image_base = map_file(file_name)) == NULL)
    {
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    if ((nt = ImageNtHeader(image_base)) == NULL)
    {
        fprintf(stderr, "ERROR finding NT headers in %s\n", file_name);
        free_file(image_base);
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    ep_rva = nt->OptionalHeader.AddressOfEntryPoint;

    /* check the first and the LAST byte of the range (ep_rva + ep_len - 1) */
    if (!is_rva_avail(image_base, ep_rva) || !is_rva_avail(image_base, ep_rva + ep_len - 1))
    {
        fprintf(stderr, "ERROR entrypoint 0x%0lx + %lu bytes is outside of IMAGE %s\n", ep_rva, ep_len, file_name);
        free_file(image_base);
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }

    if ((out_name = malloc(strlen(file_name) + 5)) == NULL)    /* ".bin" + NUL */
    {
        fprintf(stderr, "ERROR allocating memory, free some memory and try file %s again\n", file_name);
        free_file(image_base);
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    sprintf(out_name, "%s.bin", file_name);
    if ((out_file = fopen(out_name, "wb")) == NULL)
    {
        fprintf(stderr, "ERROR opening output file %s\n", out_name);
        free(out_name);
        free_file(image_base);
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    if (fwrite(image_base + ep_rva, ep_len, 1, out_file) != 1)
    {
        fprintf(stderr, "ERROR writing %lu bytes to output file %s\n", ep_len, out_name);
        fclose(out_file);
        free(out_name);
        free_file(image_base);
        FreeLibrary(dbghelp_mod);
        return EXIT_FAILURE;
    }
    fclose(out_file);

    printf("Output File: %s\nEntrypoint RVA: 0x%0lx\nBytes: %lu\n", out_name, ep_rva, ep_len);
    free(out_name);
    free_file(image_base);
    FreeLibrary(dbghelp_mod);
    return EXIT_SUCCESS;
}
|
neofx
With all due respect, I feel obliged to say that your way is not the Zen way :-) It hides everything under the hood, kind of the Windows way: run it and don't think about how it works. You can't control the loading process; you just use library calls, a real puzzle. Personally, I don't know how pefile.PE(file, fast_load=True) works. Does it load the PE via system APIs? Does it execute DllMain? Btw, your example works wrong if EP = 0.
b0ne
I see no sense in using file mapping. Do you think it is faster than LoadLibraryEx(LOAD_LIBRARY_AS_DATAFILE)? And the debug engine, wow!!! The PE header is well documented and I don't believe the PE/EP offsets will change in the future.
64-bit OSes have almost the same PE header; at least the PE/EP offsets still point to DWORD fields. So why do we need libraries to parse the PE structure?! Parsing the PE header by hand, we can always open the Microsoft Portable Executable and Common Object File Format Specification to be sure that everything is correct. We can't rely on 3rd-party libraries.
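The 0x28 used as EP_off in the dumper is simply the size of everything in front of AddressOfEntryPoint, counted from the PE signature. The optional-header fields before AddressOfEntryPoint are identical in PE32 and PE32+ (the 64-bit differences, such as the missing BaseOfData and the wider ImageBase, all come after it), which is why the same offset works for 64-bit images. A small Python check of that arithmetic, with the field layouts taken from the PE/COFF specification:

```python
import struct

SIGNATURE_SIZE = 4                              # "PE\0\0"
# IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
FILE_HEADER_SIZE = struct.calcsize("<HHIIIHH")
# Magic, MajorLinkerVersion, MinorLinkerVersion, SizeOfCode,
# SizeOfInitializedData, SizeOfUninitializedData precede AddressOfEntryPoint
EP_IN_OPTIONAL = struct.calcsize("<HBBIII")

EP_OFF = SIGNATURE_SIZE + FILE_HEADER_SIZE + EP_IN_OPTIONAL
print(hex(FILE_HEADER_SIZE), hex(EP_IN_OPTIONAL), hex(EP_OFF))  # 0x14 0x10 0x28
```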
|
Isn't it a general rule of thumb when programming to always use symbols instead of magic constants, for that very reason?
It would be just as easy to cast the base address to (IMAGE_DOS_HEADER *), read its e_lfanew member, add that to the image base and assign the result to an IMAGE_NT_HEADERS pointer.
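The same "structures, not offsets" idea carries over to Python through ctypes. A sketch assuming a raw file buffer: IMAGE_DOS_HEADER and IMAGE_FILE_HEADER mirror the winnt.h layouts, while OptionalHeaderPrefix is a hypothetical partial structure covering only the fields shared by PE32 and PE32+ up to AddressOfEntryPoint. entry_point_rva is likewise a hypothetical name.

```python
import ctypes
from ctypes import c_int32 as LONG, c_uint8 as BYTE, c_uint16 as WORD, c_uint32 as DWORD


class IMAGE_DOS_HEADER(ctypes.LittleEndianStructure):
    _fields_ = [
        ("e_magic", WORD), ("e_cblp", WORD), ("e_cp", WORD), ("e_crlc", WORD),
        ("e_cparhdr", WORD), ("e_minalloc", WORD), ("e_maxalloc", WORD),
        ("e_ss", WORD), ("e_sp", WORD), ("e_csum", WORD), ("e_ip", WORD),
        ("e_cs", WORD), ("e_lfarlc", WORD), ("e_ovno", WORD),
        ("e_res", WORD * 4), ("e_oemid", WORD), ("e_oeminfo", WORD),
        ("e_res2", WORD * 10), ("e_lfanew", LONG),
    ]


class IMAGE_FILE_HEADER(ctypes.LittleEndianStructure):
    _fields_ = [
        ("Machine", WORD), ("NumberOfSections", WORD), ("TimeDateStamp", DWORD),
        ("PointerToSymbolTable", DWORD), ("NumberOfSymbols", DWORD),
        ("SizeOfOptionalHeader", WORD), ("Characteristics", WORD),
    ]


class OptionalHeaderPrefix(ctypes.LittleEndianStructure):
    # hypothetical: only the leading fields common to PE32 and PE32+
    _fields_ = [
        ("Magic", WORD), ("MajorLinkerVersion", BYTE), ("MinorLinkerVersion", BYTE),
        ("SizeOfCode", DWORD), ("SizeOfInitializedData", DWORD),
        ("SizeOfUninitializedData", DWORD), ("AddressOfEntryPoint", DWORD),
    ]


def entry_point_rva(data: bytes) -> int:
    dos = IMAGE_DOS_HEADER.from_buffer_copy(data)  # raises ValueError if data is too short
    if dos.e_magic != 0x5A4D:                      # "MZ" read as a little-endian WORD
        raise ValueError("missing MZ signature")
    nt = dos.e_lfanew
    if nt < 0 or data[nt:nt + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_NT_HEADERS = Signature (DWORD) + IMAGE_FILE_HEADER + optional header
    opt_off = nt + 4 + ctypes.sizeof(IMAGE_FILE_HEADER)
    return OptionalHeaderPrefix.from_buffer_copy(data, opt_off).AddressOfEntryPoint
```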
File mapping is far more elegant than LoadLibrary + fixup hacks, which may not be reliable in the future. As for performance, the Windows loader uses file mapping, so why not cut out all the "crap" that sits on top if we just want to map the file into memory?
|
With all due respect, I feel obliged to say that your way is not the Zen way :-) It hides everything under the hood, kind of the Windows way: run it and don't think about how it works. You can't control the loading process; you just use library calls, a real puzzle. Personally, I don't know how pefile.PE(file, fast_load=True) works.
Ero's pefile is open source; feel free to read it and find out how it works (it's purely static).
Does it load the PE via system APIs? Does it execute DllMain? Btw, your example works wrong if EP = 0.
The answers are "no" and "no". Please explain how the answer is wrong for EP=0.
I see no sense in using file mapping. Do you think it is faster than LoadLibraryEx(LOAD_LIBRARY_AS_DATAFILE)? And the debug engine, wow!!!
Speed is not important in a one-off task such as this, but I'd bet money it is faster than LoadLibraryEx, because LoadLibraryEx maps the file into memory itself and also does a bunch of other work.
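For comparison, a plain read-only file mapping is also short in Python's standard mmap module. Note that this is not a SEC_IMAGE mapping: the view has the on-disk layout, so RVAs still have to be translated to file offsets. read_e_lfanew is a hypothetical helper that only shows header access through the mapping without reading the whole file.

```python
import mmap
import struct


def read_e_lfanew(path: str) -> int:
    """Map `path` read-only and fetch e_lfanew from the DOS header."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        if view[:2] != b"MZ":
            raise ValueError("not an MZ file")
        (e_lfanew,) = struct.unpack_from("<I", view, 0x3C)  # works on any buffer
        return e_lfanew
```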
The PE header is well documented and I don't believe the PE/EP offsets will change in the future.
64-bit OSes have almost the same PE header; at least the PE/EP offsets still point to DWORD fields. So why do we need libraries to parse the PE structure?! Parsing the PE header by hand, we can always open the Microsoft Portable Executable and Common Object File Format Specification to be sure that everything is correct. We can't rely on 3rd-party libraries.
Actually, that "3rd-party library" is Microsoft's own debugging support library, so you can be pretty sure it's at least as good as whatever you come up with by hand... That said, I probably would have just used casts to Windows' internally-defined data structures in b0ne's C example.
|
BegPardon
> Ero's pefile is open source; feel free to read it and find out how it works (it's purely static).
Nolo contendere! Ero's pefile is great stuff; that's not up for debate!
I just wanted to point out that parsing a PE file is a very simple task and a very tricky one at the same time. Suppose the file has relocations designed not to rebase it but to patch some bytes to make reversing harder. Using LoadLibraryEx, we can force the system to load the file _with_ relocations or _without_ them.
And besides, there are so many wrappers, libraries, layers of abstraction... I prefer to use the "bare" Win32 API instead of a bunch of stuff I'd have to learn just to... speed up my job? I doubt it. Anyway, I showed how to load a PE file and dump bytes from the EP using the Win32 API.
Personally, I don't like Python very much.
> Please explain how the answer is wrong for EP=0.
If EP = 0, the following code doesn't report an error and dumps 0x400 bytes from the beginning of the file. Many DLL files have EP == 0.
ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint
fw = open("dump.bin", "w")
fw.write(pe.get_memory_mapped_image()[ep:ep+0x400])
> Speed is not important in a one-off task such as this,
Using LoadLibraryEx makes your code shorter while keeping the same speed. So what's the reason to use memory mapping?!
> it is faster than LoadLibraryEx,
> due to the fact that LoadLibraryEx maps the file into memory
> itself plus does a bunch of other work.
Not with the LOAD_LIBRARY_AS_DATAFILE flag.
> Actually that "3rd-party library" is Microsoft's own debugging support library,
I meant Ero's pefile. As for the MS debug engine, it changes and you have to download it (not everybody uses it), so it's definitely a bad idea to ask the MS debug engine where the EP is. It's too expensive; better to write a couple of lines of code that work everywhere.
b0ne
> Isn't it a general rule of thumb when programming to always use
> symbols instead of magic constants for that very reason?
Well, if you don't like "magic", define a structure. But in our case we only need two offsets, so I see no reason to use the PE structures. Besides, there is no "magic": I defined the offsets of the fields I use.
> File mapping is far more elegant than LoadLibrary + fixup hacks
Where do you see "fixup hacks"?! This is not a "hack"; it's a well-documented way to load a PE file as a data file.
> which may not be reliable in the future.
According to whom?! I can't imagine LoadLibraryEx ever stopping working; on the other hand... I'm not so sure about direct file mapping.
> As for the performance, the windows loader uses file mapping,
> so why not cut out all the "crap" that sits on top
> if we just want to map the file into memory?
Actually, that "crap" is quite thin, and using LoadLibraryEx gives you more flexible control and simplifies your code a lot. I don't understand why you insist that manual mapping is the better way?!
|
@nezumi
> If EP = 0, the following code doesn't report an error and dumps 0x400 bytes from the beginning of the file. Many DLL files have EP == 0.
When ep = 0, it dumps from the start of the image base (which happens to be the start of the file).
It's easy to just add a check like this (feel free to add debug output :-)):
"if ep == 0: print('no EP')" and return.
However, your code will break when the virtual address range from the entry point to EP + dump length is not contiguous.
Try your code on any packed file whose EP section has a small EP stub and sect_virtual_size > sect_physical_size [i.e. ep+0x400 goes beyond the current section, and the virtual size of the current section is bigger than its physical size].
Take UPX-packed files, for example; this is a common case.
You get "-ERR:can't dump 1024 bytes".
Sometimes it is simpler to operate on file offsets (without involving the loader; pefile tries to mimic everything statically) :-), but I agree that both methods have pros and cons.
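The UPX-style case can be handled statically by mimicking only the part of the loader that matters here: bytes past a section's SizeOfRawData but still within its VirtualSize read as zero, and a range may continue into the next section. A sketch under simplifying assumptions: SectionAlignment rounding is ignored, the section table is already parsed into the hypothetical Section tuple, and read_virtual is a hypothetical name.

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):  # hypothetical, pre-parsed IMAGE_SECTION_HEADER fields
    virtual_address: int
    virtual_size: int
    raw_pointer: int        # PointerToRawData
    raw_size: int           # SizeOfRawData


def read_virtual(data: bytes, sections: Sequence[Section], rva: int, length: int) -> bytes:
    """Return `length` bytes starting at `rva` as they would appear once mapped."""
    out = bytearray()
    pos, end = rva, rva + length
    while pos < end:
        for s in sections:
            span = max(s.virtual_size, s.raw_size)  # simplification: no alignment rounding
            if s.virtual_address <= pos < s.virtual_address + span:
                break
        else:
            raise ValueError(f"RVA 0x{pos:x} is not covered by any section")
        stop = min(end, s.virtual_address + span)   # the rest may come from the next section
        delta = pos - s.virtual_address
        raw = b""
        if delta < s.raw_size:
            raw_end = s.raw_pointer + min(delta + (stop - pos), s.raw_size)
            raw = data[s.raw_pointer + delta:raw_end]
        out += raw
        out += bytes((stop - pos) - len(raw))       # zero-fill what the file does not back
        pos = stop
    return bytes(out)
```

Unlike IsBadReadPtr on a data-file mapping, this never refuses a range that lies inside the image; it only fails when the RVA falls outside every section.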
|