Exporting data symbols in C# for NVIDIA Optimus

This is kind of a neat hack I ran across. At work we were talking about whether it was possible to export a native Win32 data symbol from a .NET application. The reason you would want to do this is that it is one way of controlling NVIDIA Optimus. Optimus tries to guess whether your application needs the high-performance GPU or not, but sometimes it guesses wrong. If you are a game developer you can force the choice by exporting a symbol like this from your binary (more info here):

extern "C" {
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

That way, when the driver loads, it can check for this symbol and its value. I think the reason it's done this way is that calling an API would have to be done after the driver is loaded, which is too late to choose which driver you are going to use (Intel vs. NVIDIA)... although this is not an "official professional opinion". I didn't actually ask anyone at work, so don't quote me. 🙂

The problem is you can't really do this with C# because there is no way of exporting a static integer. In fact there's no way of exporting anything. There is a NuGet package called UnmanagedExports, but it only lets you export symbols from a DLL, and only methods. Unfortunately, for Optimus to detect the symbol it needs to be exported by the EXE itself. It's a bit unusual for EXEs to have exports in Win32, but not invalid.

So how does UnmanagedExports work exactly? Well, it turns out that in .NET you can round-trip between a compiled binary and MSIL assembly pretty easily. Using the ildasm tool you can disassemble your compiled assembly into a .il file and a .res file, edit them, and then re-assemble them with ilasm. It also turns out that MSIL has a .export directive you can place on methods to export them, but C# has no way of generating that directive for some reason.

However, if we want to get creative, we can have a post-build step that disassembles our EXE, finds a specially named function (NvOptimusEnablementExporter_DontCallThis), inserts the .export directive, and then re-assembles the binary. UnmanagedExports adds some other stuff to make sure the calling conventions and marshalling code are correct for calling the function from native code, but we don't care about that since nobody is going to call this.
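To make the text edit concrete, here is a rough sketch of just the "insert the directive" part, working on the disassembled IL as a string. It is not the code from my post-build step (that one uses a regex and also has to run ildasm and ilasm); the function name `add_export_directive`, the plain string search, and the ordinal `[1]` are all choices made up for this illustration, and the `.export [1] as Name` spelling is my understanding of ilasm's syntax, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: inserts an ilasm ".export" directive at the top of the
// body of the named method in disassembled IL text.
// Returns false (and leaves the text untouched) if the method is not found.
bool add_export_directive(std::string& il,
                          const std::string& method_name,
                          const std::string& export_name) {
    assert(!method_name.empty() && !export_name.empty());

    // Look for the declaration, e.g.
    //   .method private hidebysig static void Name() cil managed
    const std::size_t name_pos = il.find(method_name + "(");
    if (name_pos == std::string::npos) return false;

    // The method body starts at the first '{' after the declaration.
    const std::size_t brace_pos = il.find('{', name_pos);
    if (brace_pos == std::string::npos) return false;

    // Assumed directive syntax: ".export [ordinal] as ExportedName".
    il.insert(brace_pos + 1, "\n    .export [1] as " + export_name);
    return true;
}
```

Calling `add_export_directive(il, "NvOptimusEnablementExporter_DontCallThis", "NvOptimusEnablement")` on the text of the .il file and then feeding the result back to ilasm is the whole idea; everything else in the post-build step is plumbing.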

MSIL Before and After

Ok, so that works. But now the problem is: we are exporting code, not data. Well, in Win32 there isn't actually a difference. The export just signifies an address in the loaded binary, which could hold either, depending on what the consumer expects. Our address definitely doesn't contain the right DWORD to control the Optimus state; it actually has a jmp into some code to initialize the .NET runtime. But that's OK, we can fix that.
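Here is a tiny portable sketch of what "code or data, it's just an address" means for whoever reads the export. The stub bytes are made up for illustration (an x86-style indirect jmp encoding followed by an arbitrary address), not dumped from a real binary, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reads four bytes as a little-endian DWORD, which is how a native reader
// would interpret whatever sits at the exported address.
std::uint32_t read_dword_le(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0]) |
           static_cast<std::uint32_t>(p[1]) << 8 |
           static_cast<std::uint32_t>(p[2]) << 16 |
           static_cast<std::uint32_t>(p[3]) << 24;
}

// Overwrites the first four bytes at p with a little-endian DWORD.
void write_dword_le(unsigned char* p, std::uint32_t value) {
    assert(p != nullptr);
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<unsigned char>(value >> (8 * i));
    }
}
```

With an illustrative stub of `FF 25 00 20 40 00`, `read_dword_le` sees the first four bytes as `0x200025FF`, which is garbage as far as the Optimus flag is concerned. After `write_dword_le(stub, 1)` the same address reads back as `1`, and the two trailing bytes of the old instruction are just left dangling, which is fine as long as nobody jumps there.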

In C# a class can have a static constructor, basically a function that runs once, before the class is first used (so something at startup has to touch the class). We can create one that finds the address the symbol is pointing to and then overwrites the code the compiler has written there. This would cause a crash if anyone called the function, but nobody does. The memory is also marked as read-only since it contains executable code, so first we need to make it writable. This all happens using native non-.NET APIs, so we also need to get the native module handle of our assembly. Luckily we can assume the file name of our native module is the same as the name of our assembly's manifest module, since we know we aren't running out of some run-time generated module.

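using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Note: the unsafe block below requires compiling with /unsafe.
class NvOptimusEnablementExporter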
{
  static uint NvOptimusEnablement = 1;

  const uint PAGE_READWRITE = 0x04;
  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool VirtualProtect(IntPtr lpAddress, uint dwSize, uint flNewProtect, 
   out uint lpflOldProtect);

  [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
  static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

  const uint GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT = 0x2;
  [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
  static extern bool GetModuleHandleEx(uint dwFlags, string lpModuleName, out IntPtr phModule);

  static NvOptimusEnablementExporter()
  {
    Assembly thisAssembly = Assembly.GetExecutingAssembly();
    IntPtr myNativeModuleHandle = IntPtr.Zero;
    if (GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, 
        thisAssembly.ManifestModule.Name, out myNativeModuleHandle))
    {
      IntPtr nvExportAddress = GetProcAddress(myNativeModuleHandle, "NvOptimusEnablement");
      if (nvExportAddress != IntPtr.Zero)
      {
        uint oldProtect = 0;
        //make it writable 
        if (VirtualProtect(nvExportAddress, 4, PAGE_READWRITE, out oldProtect))
        {
          unsafe
          {
            uint* dwordValuePtr = (uint*)nvExportAddress.ToPointer();
            //overwrite code that will never be called with the dword the driver is looking for
            *dwordValuePtr = NvOptimusEnablement;
          }
          VirtualProtect(nvExportAddress, 4, oldProtect, out oldProtect);
        }
      }
      else
      {
        Console.Error.WriteLine("You didn't hack the MSIL output!");
      }
    }

  }

  //This magic name is found by regex in the post-build step
  private static void NvOptimusEnablementExporter_DontCallThis()
  {

  }
}

This needs to happen before the display driver DLL is loaded and queries the symbol, but that should usually be the case, since the DLL is loaded when you create a D3D device or OpenGL context and not at startup.

I haven't actually tested beyond verifying the export is correct since I don't have an Optimus system on hand, but if anyone is interested the code is here: https://github.com/lmagder/OptimusEnablementNET

-Lucas

Well, that’s a terrible idea!

A few weeks ago a coworker and I were complaining about how annoying it is to make self-contained tools and utilities. It seems like everything depends on 1000 DLLs these days. Our whole engine is statically linked, for two reasons: a) Is anybody really going to build against a binary-only version and then want to upgrade to a new version without recompiling? People tout this advantage a lot, but honestly I'd like to see somebody do that with non-trivial C++ code. And b) with DLLs you can't statically link the CRT anymore (since each DLL would get its own heap and you need to remember who allocated what pointer, etc.), which means you need a CRT redistributable to be installed on PCs that run your software. Microsoft has this local manifest thing for jamming all that crazy WinSxS goodness/badness into your folder, but you're still stuck with a folder of crap.

Even then, sometimes you have the source for your dependencies and sometimes you don't... and sometimes even when you do, it's so annoying to build them that you would go to extreme and terrible lengths to use the binaries you are given. But I'm not naming any names here :). Also there can be sub-dependencies that are binaries and require a certain CRT type (debug/release/DLL/static). The Microsoft compiler won't let you combine two CRTs into one binary unfortunately, but if it did this would work. You couldn't have malloc-ed in one DLL and freed in another before anyway; it just doesn't know that you know you can't do that. Also there would be symbol clashes since there would be two "malloc" functions, so you would need some kind of pre-link symbol-binding-then-wiping pass:

[Diagram: the proposed pre-link symbol-binding-then-wiping pass, with the hypothetical steps shown in green]

Unfortunately the green stuff doesn't exist so that leaves us with a few other options:

Pack the DLLs in a resource and extract them to a temp directory

This is the most straightforward and self-explanatory approach, but it means you can only use DLLs that are dynamically loaded, since your unpacking code hasn't run yet when Windows tries to find your load-time dependencies. (Yes, you can also delay-load, I'm getting to that!) Also this is lame because... well, writing stuff to disk when you shouldn't have to is lame, and also because it's basically impossible not to pollute the system with more copies of the DLLs each time you are run. How does your app delete the DLLs after it exits? There is the FILE_FLAG_DELETE_ON_CLOSE flag, but that requires you to use FILE_SHARE_DELETE and that's not allowed for a loaded module on Windows.

Use delay-loaded DLLs

There is a special linker option that tells the linker to auto-generate stubs around all the call sites into a DLL. The stubs do a LoadLibrary/GetProcAddress sequence to find the address of the import just in time. Ok great! This gives us a chance to pre-load the DLLs from the temp folder right at startup, before anything could possibly call a function from the DLL. This works because Windows only uses the name of the DLL and not the path to check if the DLL is already loaded.

Still, it would be really nice to avoid having to leak those temporary files (and also writing to disk is for losers). Enter __pfnDliNotifyHook2: this is a user callback hook in Microsoft's delay-load helper that allows you to customize the code used to locate the delay-loaded functions. So if we could just call LoadLibraryFromMemory we would be all set! Unfortunately that function doesn't exist. There is no way of loading a module in Windows except from a file on disk. I think this is actually because of a weird 16-bit Windows legacy choice where Windows doesn't actually write pages from executables to the swap file and instead loads them from the file again directly when needed (or at least it used to do that back in the day). But for whatever reason, you can't mess with the files of loaded executables on Windows, unlike other OSes.

If we want to keep going down this path we need to parse the PE file format ourselves and write our own implementation of the Windows DLL loading code. This sounds pretty sketchy, but people out there have done it (see https://github.com/fancycode/MemoryModule). It has the weird effect that the DLL isn't actually "loaded" as far as Windows is concerned, since the code can't update any of the OS-internal structures. Also I would be worried about shipping this code. What if something inside the Windows PE loader changes and our code goes out of sync? But since we are parsing all the headers already, this leads to option #3...

Somehow turn the compiled DLL into a static library and link against that

This sounds totally crazy at first glance but when I checked out the spec for the PE files (the format of DLL and EXE files) and COFF files (the format of .obj files and by extension the contents of .lib files) I noticed that the formats are actually pretty similar (they are even defined by a single specification document).
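To show how close the two formats are, here is a small sketch (not code from DLLMasher) that reads the same 20-byte COFF file header from either kind of file. In a .obj the header sits at offset 0; in a DLL or EXE it sits right after the "PE\0\0" signature, whose offset is stored at 0x3C in the DOS header. The struct and function names are made up for this example, and it only pulls out three fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The three fields of the 20-byte COFF file header this sketch cares about.
struct CoffHeader {
    std::uint16_t machine;                  // e.g. 0x014C = x86, 0x8664 = x64
    std::uint16_t number_of_sections;
    std::uint16_t size_of_optional_header;  // normally 0 in .obj files
};

static std::uint16_t read_u16(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

static std::uint32_t read_u32(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<std::uint32_t>(b[off]) |
           static_cast<std::uint32_t>(b[off + 1]) << 8 |
           static_cast<std::uint32_t>(b[off + 2]) << 16 |
           static_cast<std::uint32_t>(b[off + 3]) << 24;
}

// Fills 'out' from a PE image (DLL/EXE) or a COFF object file.
// Returns false if the buffer is too small or the PE signature is missing.
bool read_coff_header(const std::vector<unsigned char>& file, CoffHeader& out) {
    std::size_t coff = 0;  // .obj files start directly with the COFF header
    if (file.size() >= 0x40 && file[0] == 'M' && file[1] == 'Z') {
        // PE image: the DOS header stores the offset of "PE\0\0" at 0x3C.
        const std::size_t pe = read_u32(file, 0x3C);
        if (pe > file.size() - 4) return false;
        if (file[pe] != 'P' || file[pe + 1] != 'E' ||
            file[pe + 2] != 0 || file[pe + 3] != 0) return false;
        coff = pe + 4;
    }
    if (file.size() < 20 || coff > file.size() - 20) return false;
    out.machine = read_u16(file, coff);
    out.number_of_sections = read_u16(file, coff + 2);
    out.size_of_optional_header = read_u16(file, coff + 16);
    return true;
}
```

The only PE-specific part is finding where the header starts; after that the same three reads work for both file types.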

Theoretically all the information you need to run the program is in the DLL so it should be possible to extract the binary code and data from the DLL and repack it into a static library. There would be no information about symbols only used internally in the module but that's OK, in fact it's even good since those duplicate symbols are the reason you can't normally link together mismatched object files.
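As a rough idea of where "the binary code and data" live, here is a sketch that walks a section table and reports where each section's raw bytes sit in the file. Section headers are 40 bytes each in both PE and COFF files, and in a PE image the table starts right after the optional header. The caller passes the table offset and section count; the names `Section` and `read_sections` are hypothetical, and this is an illustration rather than what DLLMasher actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Where one section's contents live, taken from its 40-byte header.
struct Section {
    std::string name;               // up to 8 characters, e.g. ".text"
    std::uint32_t virtual_address;  // RVA of the section once loaded
    std::uint32_t raw_size;         // SizeOfRawData
    std::uint32_t raw_offset;       // PointerToRawData (file offset)
};

static std::uint32_t read_u32(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<std::uint32_t>(b[off]) |
           static_cast<std::uint32_t>(b[off + 1]) << 8 |
           static_cast<std::uint32_t>(b[off + 2]) << 16 |
           static_cast<std::uint32_t>(b[off + 3]) << 24;
}

// Reads up to 'count' section headers starting at 'table_offset'.
// Stops early if the buffer is truncated.
std::vector<Section> read_sections(const std::vector<unsigned char>& file,
                                   std::size_t table_offset,
                                   std::size_t count) {
    const std::size_t kHeaderSize = 40;
    std::vector<Section> result;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t off = table_offset + i * kHeaderSize;
        if (off + kHeaderSize > file.size()) break;
        Section s;
        // The name is NUL-padded, and not NUL-terminated when 8 chars long.
        for (std::size_t c = 0; c < 8 && file[off + c] != 0; ++c) {
            s.name.push_back(static_cast<char>(file[off + c]));
        }
        s.virtual_address = read_u32(file, off + 12);
        s.raw_size = read_u32(file, off + 16);
        s.raw_offset = read_u32(file, off + 20);
        result.push_back(s);
    }
    return result;
}
```

Copying `raw_size` bytes from `raw_offset` for each section gets you the raw code and data; relocations, imports, and exports still have to be translated separately.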

With this information I decided to try and write a tool that does this, honestly mostly just for the hell of it. It's called "DLLMasher" and yes it actually does work (with a few caveats) and I learned a lot of interesting stuff about how Windows binaries work along the way.

So next time: how I went about writing DLLMasher and how the hell it even works at all. For the impatient, you can find the source here: https://github.com/lmagder/DLLMasher