peparser - New construct/Python
This weekend, I looked into upgrade the Python script I had made previously for reading data from a Windows executable. The executable format for a Windows executable is PE/COFF (Portable Executable & Common Object File Format). The specification can be found at Microsoft Learn as of 2022, prior to that i was available as a download as either docx or pdf.
The original purpose of the script I Wrote was to extract the image assets (bitmaps) from a Windows game released in 1991 called SkiFree. After I got that to work I started looking at the import table which gives what DLLs are loaded when the executable is loaded and what functions in the DLL are referred to.
The two changes were, the original script was written for Python 2 which was quite minor and the largest change was the new version of the third party library I used called construct had a massive API change. The construct library allows a developer to declare the file format in Python and it takes care of parsing it from that declaration.
When I had first written the script I declared everything myself using library. It was not until after getting everything working that I discovered that the library source repository contained an example definition for PE/COFF.
For my Python 3 version I decided to start from scratch and utilise the example definition this time around which handles the basics (the initial headers and generic parts) of the format. This meant I could focus on my unique stuff.
At the end of the day, I managed to recreate the script that was able to:
- Extracts the bitmaps from SkiFree
- List the DLLs and their functions used by SkiFree
- Extend it to list DLLs and functions used by 64-bit Notepad. More on this in a bit
- Began work on getting the exported functions - for if you give it a dynamic link library rather than program.
64-bit Support
It was not looking promising initially because while it worked on SkiFree when I pointed it to Notepad it failed. This turned out to come down to the following note about relative virtual address (RVA).
The RVA of an item almost always differs from its position within the file on disk (file pointer).
My original test case happened to fall into the case not covered by the almost as such I was able to treat the RVA as the position within the file. The fix is to find the section that the RVA refers to and apply the following to:
file_offset = rva - section.virtual_address + section.rawdata_pointer
Improvements to Make
There are two gotchas that I really hope I can address and find a better solution for assuming the library supports it.
The first is RepeatUntil() includes the terminating/sentinel entry, at the moment after parsing it I pop the last element before using it.
The second is using Seek() returns the number of bytes skipped in the parsed data structure rather than ignoring them.
A third improvement I would like to make which is not a gotcha with the library
is my use of the RVA to file offset work. The idea is to see if its possible to
use section.rawdata instead of having to pass around the entire file’s data
and compute it from that. section.rawdata already points to data after
section.rawdata_pointer.
The next follow up would then allow me to replace patching the name field after parsing with ideally a pointer to CString within at least the Import Directory Table.
So where there is currently name RVA (or name_address) in my code, I can also have the name which points straight to the corresponding name.
Past
The last time I was working on this I made a browser-based version using JavaScript and the third party library binary-parser. The parsing of the executable is handled client side.
The web version has only been tested with SkiFree and expects bitmaps to be present.
New Feature
The new version has a verbose output for printing information about the imports.
It includes the ordinal and the same metadata as seen in Visual Studio tool
dumpbin when the /export flag is given.
This is subset of the output for skifree.exe showing a single imported library.
GDI32.dll
A000 Import Address Table
A924 Import Name Table
0 time date stamp
0 Index of first forwarder reference
53 DeleteObject
1C7 SelectObject
16E GetTextExtentPoint32A
11 BitBlt
194 PatBlt
29 CreateCompatibleBitmap
205 TextOutA
15F GetStockObject
125 GetDeviceCaps
14F GetObjectA
50 DeleteDC
24 CreateBitmap
175 GetTextMetricsA
2A CreateCompatibleDC