This weekend, I looked into upgrading the Python script I had previously written for reading data from a Windows executable. The format of a Windows executable is PE/COFF (Portable Executable and Common Object File Format). As of 2022 the specification can be found on Microsoft Learn; prior to that it was available as a download in either docx or pdf form.

The original purpose of the script I wrote was to extract the image assets (bitmaps) from SkiFree, a Windows game released in 1991. After I got that to work, I started looking at the import table, which lists the DLLs that are loaded when the executable is loaded and the functions in each DLL that it refers to.

Two changes were needed. The minor one was that the original script was written for Python 2. The larger one was that construct, the third-party library I used, had undergone a massive API change in its newer version. The construct library allows a developer to declare the file format in Python, and it takes care of parsing the data from that declaration.

When I first wrote the script, I declared everything myself using the library. It was not until after getting everything working that I discovered that the library's source repository contained an example definition for PE/COFF.

For my Python 3 version, I decided to start from scratch and this time use the example definition, which handles the basics of the format (the initial headers and generic parts). This meant I could focus on the parts unique to my script.

At the end of the day, I managed to recreate a script that is able to:

  • Extract the bitmaps from SkiFree
  • List the DLLs and their functions used by SkiFree
  • List the DLLs and functions used by 64-bit Notepad (more on this in a bit)
  • List the exported functions when given a dynamic link library rather than a program (work on this has only just begun)

64-bit Support

It was not looking promising initially: while the script worked on SkiFree, it failed when I pointed it at Notepad. This turned out to come down to the following note in the specification about the relative virtual address (RVA).

The RVA of an item almost always differs from its position within the file on disk (file pointer).

My original test case happened to fall into the case not covered by the "almost", so I had been able to treat the RVA as the position within the file. The fix is to find the section that contains the RVA and apply the following:

   file_offset = rva - section.virtual_address + section.rawdata_pointer
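As a minimal sketch of that fix, the lookup could be written in plain Python as shown here. The Section class, its virtual_size and rawdata_size fields, and the example section values are assumptions made up for illustration; only virtual_address and rawdata_pointer come from the formula above. Using the larger of the two sizes for the range check is also just one possible choice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Section:
    # Hypothetical stand-in for a parsed section header.
    name: str
    virtual_address: int   # RVA at which the section is loaded
    virtual_size: int      # size of the section once in memory
    rawdata_pointer: int   # file offset of the section's data
    rawdata_size: int      # size of the section's data on disk


def rva_to_file_offset(rva: int, sections: Iterable[Section]) -> int:
    """Find the section containing the RVA and convert it to a file offset."""
    for section in sections:
        # Assumption: accept an RVA that falls in either the in-memory
        # or the on-disk extent of the section.
        size = max(section.virtual_size, section.rawdata_size)
        if section.virtual_address <= rva < section.virtual_address + size:
            return rva - section.virtual_address + section.rawdata_pointer
    raise ValueError(f"RVA {rva:#x} is not inside any section")


# Made-up section layout, purely for demonstration.
sections = [
    Section(".text", 0x1000, 0x8000, 0x400, 0x8000),
    Section(".idata", 0xA000, 0x1000, 0x8400, 0x1000),
]
print(hex(rva_to_file_offset(0xA924, sections)))  # 0x8d24
```

With this layout the RVA and the file offset differ, which is the situation the original script did not handle.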

Improvements to Make

There are two gotchas that I hope to address with a better solution, assuming the library supports it.

The first is that RepeatUntil() includes the terminating (sentinel) entry in its result. At the moment, I pop the last element after parsing and before using the list.
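To illustrate the behaviour I am after, here is a rough sketch that does not use construct at all, only the standard struct module. It reads Import Directory Table entries (five little-endian 32-bit fields each) and stops at the all-zero entry without adding it to the result. The function name and the sample values are made up for the example.

```python
import struct

# One Import Directory Table entry: lookup table RVA, time/date stamp,
# forwarder chain, name RVA, import address table RVA.
ENTRY = struct.Struct("<5I")


def read_import_directory(data: bytes, offset: int) -> list:
    """Collect entries up to, but not including, the all-zero sentinel."""
    entries = []
    while True:
        fields = ENTRY.unpack_from(data, offset)
        if not any(fields):
            break  # sentinel reached; deliberately not appended
        entries.append(fields)
        offset += ENTRY.size
    return entries


# Two made-up entries followed by the terminating null entry.
data = (
    ENTRY.pack(0xA924, 0, 0, 0xAB00, 0xA000)
    + ENTRY.pack(0xA960, 0, 0, 0xAB10, 0xA040)
    + bytes(ENTRY.size)
)
entries = read_import_directory(data, 0)
print(len(entries))  # 2
```

If the data has no terminating entry, unpack_from raises struct.error once it runs past the end of the buffer.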

The second is that using Seek() puts the number of bytes skipped into the parsed data structure rather than discarding it.

A third improvement I would like to make, which is not a gotcha with the library, concerns my RVA to file offset conversion. The idea is to see if it's possible to use section.rawdata instead of passing around the entire file’s data and computing the offset into it. section.rawdata already holds the data starting at section.rawdata_pointer.

That in turn would allow me to replace patching the name field after parsing with, ideally, a pointer to a CString, at least within the Import Directory Table.

So where there is currently a name RVA (name_address) in my code, I could also have a name field that points straight to the corresponding string.
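A small sketch of the section-relative idea, in plain Python rather than construct: since the section's raw data starts at the section's virtual address, a name RVA can be turned into an index into that data by subtracting the virtual address. The LoadedSection class, the read_name function and the sample bytes are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class LoadedSection:
    # Hypothetical stand-in for a parsed section and its raw bytes.
    virtual_address: int
    rawdata: bytes


def read_name(section: LoadedSection, name_rva: int) -> str:
    """Read a null-terminated ASCII name located at an RVA in the section."""
    start = name_rva - section.virtual_address
    if not 0 <= start < len(section.rawdata):
        raise ValueError(f"RVA {name_rva:#x} is outside this section")
    # bytes.index raises ValueError if there is no terminating null byte.
    end = section.rawdata.index(b"\x00", start)
    return section.rawdata[start:end].decode("ascii")


# Made-up section content: 16 bytes of padding, then a DLL name.
idata = LoadedSection(0xA000, b"\x00" * 0x10 + b"GDI32.dll\x00")
print(read_name(idata, 0xA010))  # GDI32.dll
```

This avoids needing the whole file's data, at the cost of having to know which section the RVA belongs to.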

Past

The last time I was working on this I made a browser-based version using JavaScript and the third party library binary-parser. The parsing of the executable is handled client side.

The web version has only been tested with SkiFree and expects bitmaps to be present.

New Feature

The new version has a verbose output mode for printing information about the imports. It includes the ordinal and the same metadata as shown by the Visual Studio tool dumpbin when the /imports flag is given.

This is a subset of the output for skifree.exe, showing a single imported library.

GDI32.dll
            A000 Import Address Table
            A924 Import Name Table
               0 time date stamp
               0 Index of first forwarder reference

              53 DeleteObject
             1C7 SelectObject
             16E GetTextExtentPoint32A
              11 BitBlt
             194 PatBlt
              29 CreateCompatibleBitmap
             205 TextOutA
             15F GetStockObject
             125 GetDeviceCaps
             14F GetObjectA
              50 DeleteDC
              24 CreateBitmap
             175 GetTextMetricsA
              2A CreateCompatibleDC