Lucas Magder: I've been told not to call this a "blog" anymore

Trust people when they say “write games not engines”

By Lucas on July 27, 2017

...unless you really like writing engines, which I do.

I never got around to posting about this project since I was kind of waiting for it to be something cooler, but a couple of years into working on it once in a while, the amount of work left kind of hit me. In the depths of my broken animation blending code I was like, "wait, what am I even doing here? This is not even close to being a dungeon crawler." So behold: "Frosting", another one thrown on the pile of unfinished engines!

The genesis of this project was back before I had spent time with Unreal, when I had only really worked on NVIDIA's in-house engine and Cryptic's engine. The NV engine, or "NVDemo" as we called it, had pretty advanced rendering features since our projects were maybe 85% rendering/shader code, 10% menus, 10% demo/attract mode logic, but it didn't have much in the way of anything else, and it was geared towards small scenes, so no real streaming, networking, game logic, etc.

I happened to come across an article about the high-level design of the Unreal 1 networking architecture (since eaten by the internet) and thought "wow, this is a whole other can of worms than what I've worked on so far!" I also found this article on draw call management: "Order your graphics draw calls around!" Doing it exactly like this doesn't make much sense in the era of Vulkan and friends, but at the time it seemed fancy. Plus, I didn't like this thing or that thing about our system at work and wanted to take a crack at a clean slate.
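The core idea behind that kind of draw call ordering is to pack whatever render state matters into one integer key per draw and then sort every draw by that key. Below is a minimal C++ sketch of the idea under an assumed bit layout; the field widths, makeSortKey, DrawCall and sortDrawCalls are hypothetical and not taken from the article or from Frosting:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit sort key layout (high bits win):
//   bits 62-63: layer (0 = world, 1 = effects, 2 = HUD, ...)
//   bit  61   : translucent flag, so opaque draws come before translucent ones
//   bits 37-60: 24-bit quantized view depth
//   bits  0-36: material/state id, so equal-depth draws with the same state stay adjacent
struct DrawCall {
    std::uint64_t key;
    int meshId;  // stand-in for the real draw payload
};

std::uint64_t makeSortKey(std::uint32_t layer, bool translucent, std::uint32_t depth24,
                          std::uint64_t materialId) {
    std::uint32_t depth = depth24 & 0xFFFFFFu;
    // Opaque draws go front-to-back (small depth first); translucent draws must go
    // back-to-front, so their depth bits are inverted.
    if (translucent) depth = ~depth & 0xFFFFFFu;
    return (std::uint64_t(layer & 0x3u) << 62) |
           (std::uint64_t(translucent ? 1u : 0u) << 61) |
           (std::uint64_t(depth) << 37) |
           (materialId & ((std::uint64_t(1) << 37) - 1));
}

// A single integer sort replaces per-pass and per-material bucketing logic.
void sortDrawCalls(std::vector<DrawCall>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.key < b.key; });
}
```

With this layout, a near opaque draw sorts before a far opaque one, both sort before any translucent draw (far translucent first), and HUD draws come last regardless of depth.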

My goal was to build a realtime hack-and-slash roguelike, but mostly I just got a bunch of test levels and crates you can knock over 🙂 Since I knew the least about the non-graphics stuff, I started with that, which is why the rendering is so basic: no shadows, just a directional light, etc. Otherwise I might accidentally have made a rendering engine again and ended up with no gameplay. I didn't really get that far, but it had some neat features:

Serialization and replication/diffing of C++ objects without running a custom preprocessor, mostly as a challenge after using NVDemo and UE
Command buffer-ish rendering architecture
Fancy GUI editor "Cobbler" (everything is dessert-themed for some reason I can't remember) with full undo/redo using said serialization system
Reliable unicast/multicast RPC method calls, and reliable/unreliable property replication
Network games that work with both a headless dedicated server and a player-hosted server
Bullet for physics, which also replicates (mostly because it's not NVIDIA PhysX... learning stuff!)
Play-in-editor (PIE) mode: the editor is a server you can connect to. When not in PIE you're a spectator, so you can watch designers place things! (OK, this feature basically came for free)
Streaming and hot reloading of everything: textures, models, animations, prefab definitions, fonts, shaders
Generic sockets and skeletons: parent components to sockets on skeletal meshes or physics bodies. Attach random stuff together!
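One common way to get serialization and replication diffing without a custom preprocessor is to have each class list its fields once in a templated visit() method and write every tool (serializer, differ, undo/redo) as a visitor. This is only a sketch of that general technique, not Frosting's actual code: CrateState, SnapshotVisitor, snapshot() and diffFields() are hypothetical names, and the textual encoding stands in for a real binary one.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Snapshot = std::vector<std::pair<std::string, std::string>>;

// A replicated object lists its fields exactly once; every tool is just a visitor.
struct CrateState {
    float x = 0, y = 0, z = 0;
    int health = 100;
    std::string prefab;

    template <typename Visitor>
    void visit(Visitor& v) {
        v("x", x);
        v("y", y);
        v("z", z);
        v("health", health);
        v("prefab", prefab);
    }
};

// Records each field as (name, text). std::to_string keeps the sketch short, but it
// only keeps 6 decimals for floats; a real system would use a binary encoding.
struct SnapshotVisitor {
    Snapshot fields;
    void operator()(const char* name, const std::string& value) {
        fields.emplace_back(name, value);
    }
    template <typename T>
    void operator()(const char* name, const T& value) {
        fields.emplace_back(name, std::to_string(value));
    }
};

template <typename T>
Snapshot snapshot(T& obj) {
    SnapshotVisitor v;
    obj.visit(v);
    return v.fields;
}

// Names of fields that differ from a baseline (e.g. the last state a client acked);
// only these would need to go into the next replication packet.
std::vector<std::string> diffFields(const Snapshot& baseline, const Snapshot& current) {
    std::vector<std::string> changed;
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (i >= baseline.size() || baseline[i].second != current[i].second)
            changed.push_back(current[i].first);
    }
    return changed;
}
```

Because the field list lives in one place, adding a property to a class automatically makes it serializable, diffable and undoable, which is the main thing a preprocessor-based reflection system would otherwise buy you.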

Editing while spectating through a game client

Playing in editor and selecting the player actor

Browsing for a prefab to attach to a spawner object in the level

But who knows? Maybe one day it might get a real lighting system and a D3D11 -> Vulkan port.

Posted in Programming | Tagged Frosting

In case you were wondering, the moon landings were not actually faked

By Lucas on September 19, 2014

My most recent project just got revealed! We set out to recreate the Apollo 11 landing site, and specifically the iconic photo of Buzz Aldrin stepping off the LEM, inside Unreal Engine 4 using NVIDIA's upcoming global illumination middleware, VXGI, and our new Maxwell GPU architecture.

It was actually a really interesting project on multiple fronts, since I got to help out with the UE4 renderer integration, work on the lighting tech itself, and also help recreate the moon surface.

We also shot a video about the whole process and the different stuff we looked into, which was fun, although I don't think I have a great future as a TV presenter 🙂

In addition to the moon stuff, we also worked on another scene in our own engine as a kind of proof of concept, based on an enhanced version of the San Miguel test data, which turned out pretty nice too.

Posted in Programming, Work | Tagged Apollo11

FaceWorks is on the Play Store

By Lucas on June 5, 2014

The FaceWorks Tegra K1 demo I worked on finally made it to the Play Store! If you have a K1 device, give it a shot.

Installs are still pretty low since you need an Android (sorry, no Jetsons) Tegra K1 device that supports non-ES OpenGL 4.4, and I think the only one out there so far is the Xiaomi MiPad. I hear China is not so big on the Play Store.

Posted in Programming, Work | Tagged Faceworks

Exporting data symbols in C# for NVIDIA Optimus

By Lucas on May 18, 2014

This is kind of a neat hack I ran across: at work we were talking about whether it was possible to export a native Win32 data symbol from a .NET application. The reason you would want to do this is that it's one way of controlling NVIDIA Optimus. Optimus tries to guess whether your application needs the high-performance GPU, but sometimes it guesses wrong. If you are a game developer you can force this by exporting a symbol like this from your binary (more info here):

#include <windows.h>  // for DWORD

extern "C" {
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

That way, when the driver loads, it can check for this symbol and its value. I think the reason it's done this way is that calling an API would have to be done after the driver is loaded, which is too late to choose which driver you are going to use (Intel vs. NVIDIA)... although this is not an "official professional opinion". I didn't actually ask anyone at work, so don't quote me. 🙂

The problem is you can't really do this with C#, because there is no way of exporting a static integer. In fact, there's no way of exporting anything. There is a NuGet package called UnmanagedExports; however, it only lets you export symbols from a DLL, and only methods. Unfortunately, for Optimus to detect the symbol it needs to be exported by the EXE itself. It's a bit unusual for EXEs to have exports in Win32, but not invalid.

So how does UnmanagedExports work exactly? Well, it turns out that in .NET you can round-trip between a compiled binary and MSIL assembly pretty easily. Using the ildasm tool you can disassemble your compiled assembly into a .il file and a .res file, edit them, and then re-assemble them with ilasm. It also turns out that MSIL has a .export directive you can place on methods to export them, but the C# compiler has no way of generating it for some reason.

However, if we want to get creative, we can have a post-build step that disassembles our EXE, finds some specially named function (NvOptimusEnablementExporter_DontCallThis), inserts the .export directive, and then re-assembles the binary. UnmanagedExports adds some other stuff to make sure the calling conventions and marshalling code are correct for calling the function from native code, but we don't care about that since nobody is going to call this.

MSIL Before and After
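A rough sketch of the text-rewriting part of such a post-build step, in C++ with std::regex. It assumes ildasm prints the method name followed by the opening brace of its body, and it only inserts a bare ".export [1] as NvOptimusEnablement" line; addOptimusExport is a hypothetical helper, and real tools like UnmanagedExports also deal with vtable fixups and .corflags, which this sketch leaves out.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical post-build helper: given the text of an ildasm'd .il file, find the
// method with the magic name and add an .export directive as the first line of its
// body. Returns the input unchanged if the method isn't found.
// Simplified: no .vtentry/.vtfixup or .corflags handling.
std::string addOptimusExport(const std::string& il) {
    // Method name, its parameter list, then everything up to the first '{'
    // (the brace that opens the method body).
    static const std::regex methodHeader(
        R"(NvOptimusEnablementExporter_DontCallThis\s*\([^)]*\)[^{]*\{)");
    std::smatch m;
    if (!std::regex_search(il, m, methodHeader)) return il;
    std::size_t insertAt = static_cast<std::size_t>(m.position(0) + m.length(0));
    return il.substr(0, insertAt) +
           "\n    .export [1] as NvOptimusEnablement" +
           il.substr(insertAt);
}
```

The surrounding build step would then be: run ildasm on the EXE, pass the .il text through a function like this, and feed the result plus the .res file back to ilasm.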

OK, so that works. But now the problem is that we are exporting code, not data. Well, in Win32 there isn't actually a difference: an export just names an address in the loaded binary, which could point to either, depending on what the consumer expects. Our address definitely doesn't contain the right DWORD to control the Optimus state; it actually holds a jmp into some code that initializes the .NET runtime. But that's OK, we can fix that.

In C# a class can have a static constructor, basically a function that runs once before the class is first used, so touching the class early in Main makes it run at startup. We can create one that finds the address the symbol is pointing to and then overwrites the code the compiler has written there. This would cause a crash if anyone called the function, but nobody does. The memory is also marked as read-only since it contains executable code, so first we need to make it writable. This all happens using native non-.NET APIs, so we also need to get the native module handle of our assembly. Luckily, we can assume the name of our native module is the same as our assembly's, since we know we aren't running out of some run-time generated module.

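using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Requires compiling with unsafe code enabled (/unsafe or <AllowUnsafeBlocks>)
class NvOptimusEnablementExporter
{
  // The value the driver should read at the exported address (1 = use the NVIDIA GPU)
  static uint NvOptimusEnablement = 1;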

  const uint PAGE_READWRITE = 0x04;
  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool VirtualProtect(IntPtr lpAddress, uint dwSize, uint flNewProtect, 
   out uint lpflOldProtect);

  [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
  static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

  const uint GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT = 0x2;
  [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
  static extern bool GetModuleHandleEx(uint dwFlags, string lpModuleName, out IntPtr phModule);

  static NvOptimusEnablementExporter()
  {
    Assembly thisAssembly = Assembly.GetExecutingAssembly();
    IntPtr myNativeModuleHandle = IntPtr.Zero;
    if (GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, 
        thisAssembly.ManifestModule.Name, out myNativeModuleHandle))
    {
      IntPtr nvExportAddress = GetProcAddress(myNativeModuleHandle, "NvOptimusEnablement");
      if (nvExportAddress != IntPtr.Zero)
      {
        uint oldProtect = 0;
        //make it writable 
        if (VirtualProtect(nvExportAddress, 4, PAGE_READWRITE, out oldProtect))
        {
          unsafe
          {
            uint* dwordValuePtr = (uint*)nvExportAddress.ToPointer();
            //overwrite code that will never be called with the dword the driver is looking for
            *dwordValuePtr = NvOptimusEnablement;
          }
          VirtualProtect(nvExportAddress, 4, oldProtect, out oldProtect);
        }
      }
      else
      {
        Console.Error.WriteLine("You didn't hack the MSIL output!");
      }
    }

  }

  //This magic name is found by regex in the post-build step
  private static void NvOptimusEnablementExporter_DontCallThis()
  {

  }
}

This needs to happen before the display driver DLL is loaded and queries the symbol, but that should usually be the case, since the DLL is loaded when you create a D3D device or OpenGL context, not at startup.

I haven't actually tested beyond verifying the export is correct since I don't have an Optimus system on hand, but if anyone is interested the code is here: https://github.com/lmagder/OptimusEnablementNET

-Lucas

Posted in Programming, Work

Digital Ira

By Lucas on November 26, 2013

fxguide just posted a pretty extensive article about the state of the art in face rendering, "The Art of Digital Faces at ICT – Digital Emily to Digital Ira", and it has a section about the NVIDIA Digital Ira/FaceWorks project I worked on.

We've actually made two different versions of Digital Ira so far. The "super-duper" version runs on a GTX Titan (you can even download it here if you happen to own one):

If you scroll down, you might notice that I also did part of the Dawn demo that gets skewered at the beginning. <shrug> New and pretty becomes old and busted pretty fast with this stuff. 🙂

We also made a mobile version using the same data set, but with a simplified shading model. Mobile chips have come a long way since back in the day. Plus, hand-writing NEON assembly is fun.

It's not downloadable yet since there are no devices in the wild that can run it, but it went over pretty well at SIGGRAPH, so hopefully it makes it out there eventually.

Posted in Programming, Work | Tagged Faceworks

Well, that’s a terrible idea!

By Lucas on August 5, 2012

A few weeks ago a coworker and I were complaining about how annoying it is to make self-contained tools and utilities. It seems like everything depends on 1000 DLLs these days. Our whole engine is statically linked because a) is anybody really going to build off a binary-only version and then want to upgrade to a new version without recompiling? People tout this advantage a lot, but honestly I'd like to see somebody do that with non-trivial C++ code. And b) with DLLs you can't statically link the CRT anymore (since each DLL would get its own heap and you'd need to remember who allocated which pointer, etc.), which means you need a CRT redistributable installed on PCs that run your software. Microsoft has this local manifest thing for jamming all that crazy WinSxS goodness/badness into your folder, but you're still stuck with a folder of crap.
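To make the "remember who allocated which pointer" rule concrete, here is a toy C++ model in which each module owns a separate heap and only accepts pointers it handed out itself. ModuleHeap and LibraryModule are hypothetical, and a real CRT heap will typically corrupt itself rather than politely return false; the point is the usual workaround, where the module that allocates also exports the matching free function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>

// Toy model: each module (an EXE, or a DLL with a statically linked CRT) owns its own
// heap and only knows about blocks it allocated itself.
class ModuleHeap {
public:
    ModuleHeap() = default;
    ModuleHeap(const ModuleHeap&) = delete;
    ModuleHeap& operator=(const ModuleHeap&) = delete;
    ~ModuleHeap() {
        for (void* p : live_) std::free(p);  // clean up anything never released
    }

    void* allocate(std::size_t n) {
        void* p = std::malloc(n);
        if (p) live_.insert(p);
        return p;
    }

    // Refuses pointers from another heap instead of corrupting memory like a real CRT might.
    bool release(void* p) {
        if (live_.erase(p) == 0) return false;
        std::free(p);
        return true;
    }

    std::size_t liveCount() const { return live_.size(); }

private:
    std::set<void*> live_;
};

// The usual rule: whoever allocates also provides the matching free, so callers never
// hand the library's pointers to their own CRT.
struct LibraryModule {
    ModuleHeap heap;
    char* makeBuffer(std::size_t n) { return static_cast<char*>(heap.allocate(n)); }
    bool freeBuffer(char* p) { return heap.release(p); }
};
```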

Even then, sometimes you have the source for your dependencies and sometimes you don't... and sometimes even when you do, it's so annoying to build them that you would go to extreme and terrible lengths to use the binaries you are given. But I'm not naming any names here :). Also, there can be sub-dependencies that are binaries and require a certain CRT type (debug/release/DLL/static). Unfortunately, the Microsoft toolchain won't let you combine two CRTs into one binary, but if it did, this would work. When they were separate DLLs you couldn't have malloc-ed in one and freed in the other anyway; the linker just doesn't know that you know you can't do that. There would also be symbol clashes, since there would be two "malloc" functions, so you would need some kind of pre-link symbol-binding-then-wiping pass:

[Diagram: combining two CRTs into one binary with a pre-link symbol-binding-then-wiping pass]

Unfortunately the green stuff doesn't exist so that leaves us with a few other options:

Pack the DLLs in a resource and extract them to a temp directory

This is the most straightforward and self-explanatory approach, but it means you can't use DLLs that aren't dynamically loaded, since your unpacking code hasn't run yet when Windows tries to find your dependencies. (Yes, you can also delay-load, I'm getting to that!) Also, this is lame because... well, writing stuff to disk when you shouldn't have to is lame, and also because it's basically impossible to avoid polluting the system with more copies of the DLLs each time you run. How does your app delete the DLLs after it exits? There is the FILE_FLAG_DELETE_ON_CLOSE flag, but that requires FILE_SHARE_DELETE, which isn't allowed for a loaded module on Windows.

Use delay-loaded DLLs

There is a special linker option (/DELAYLOAD) that tells the linker to auto-generate stubs around all the call sites into a DLL, which do a LoadLibrary/GetProcAddress sequence to find the address of the import just in time. OK, great! This gives us a chance to pre-load the DLLs from the temp folder right before we do anything that could possibly call a function from the DLL. This works because Windows only uses the name of the DLL, not the path, to check whether the DLL is already loaded.
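Conceptually, each delay-load stub is an indirection that resolves itself on first use. The toy C++ model below imitates that with a function pointer that starts out pointing at a resolver; fakeExportTable stands in for LoadLibrary/GetProcAddress, and all the names are hypothetical rather than the real delay-load helper machinery.

```cpp
#include <map>
#include <string>

using AddFn = int (*)(int, int);

// The "real" function that would live inside the DLL.
int realAdd(int a, int b) { return a + b; }

// Stand-in for the DLL's export table (what GetProcAddress would consult).
const std::map<std::string, AddFn> fakeExportTable = {{"Add", &realAdd}};

int resolveCount = 0;  // how many times the slow path ran

int addResolver(int a, int b);

// The patch-on-demand slot, playing the role of an import address table entry.
AddFn addThunk = &addResolver;

// Slow path, taken only on the first call. This is where a custom hook gets its chance,
// e.g. to pre-load the DLL from somewhere unusual before the lookup happens.
int addResolver(int a, int b) {
    ++resolveCount;
    addThunk = fakeExportTable.at("Add");  // "GetProcAddress", then patch the slot
    return addThunk(a, b);                 // forward the original call
}
```

Every call site calls through addThunk; after the first call the slot points straight at realAdd, so later calls skip the lookup entirely.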

Still, it would be really nice to avoid having to leak those temporary files (and also, writing to disk is for losers). Enter __pfnDliNotifyHook2: a user callback in the Microsoft delay-load helper that allows you to customize the code used to locate the delay-loaded functions. So if we could just call LoadLibraryFromMemory, we would be all set! Unfortunately, that function doesn't exist. There is no way of loading a module in Windows except from a file on disk. I think this is actually because of a weird 16-bit Windows legacy choice where Windows doesn't write pages from executables to the swap file and instead reloads them directly from the file when needed (or at least it used to do that back in the day). But for whatever reason, unlike on other OSes, you can't mess with the files of loaded executables on Windows.

If we want to keep going down this path, we need to parse the PE file format ourselves and write our own implementation of the Windows DLL loading code. This sounds pretty sketchy, but people out there have done it (see https://github.com/fancycode/MemoryModule). However, that has the weird effect that the DLL isn't actually "loaded" as far as Windows is concerned, since the code can't update any of the OS-internal structures. I would also be worried about shipping this code: what if something inside the Windows PE loader changes and our code goes out of sync? But since we are parsing all the headers already, this leads to option #3...

Somehow turn the compiled DLL into a static library and link against that

This sounds totally crazy at first glance but when I checked out the spec for the PE files (the format of DLL and EXE files) and COFF files (the format of .obj files and by extension the contents of .lib files) I noticed that the formats are actually pretty similar (they are even defined by a single specification document).
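To illustrate how close the two formats are: a DLL or EXE starts with an MS-DOS stub whose e_lfanew field (at offset 0x3C) points to a "PE\0\0" signature, followed by the same 20-byte COFF file header that an .obj file has right at offset 0. A minimal C++17 sketch that finds and decodes that header in either kind of file; readCoffHeader and its helpers are hypothetical, and error handling is minimal.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// COFF file header as described in the PE/COFF spec: 20 bytes, little-endian.
struct CoffFileHeader {
    std::uint16_t machine;
    std::uint16_t numberOfSections;
    std::uint32_t timeDateStamp;
    std::uint32_t pointerToSymbolTable;
    std::uint32_t numberOfSymbols;
    std::uint16_t sizeOfOptionalHeader;  // zero for object files, non-zero for images
    std::uint16_t characteristics;
};

static std::uint16_t rd16(const std::vector<std::uint8_t>& b, std::size_t o) {
    return static_cast<std::uint16_t>(b[o] | (b[o + 1] << 8));
}
static std::uint32_t rd32(const std::vector<std::uint8_t>& b, std::size_t o) {
    return std::uint32_t(b[o]) | (std::uint32_t(b[o + 1]) << 8) |
           (std::uint32_t(b[o + 2]) << 16) | (std::uint32_t(b[o + 3]) << 24);
}

// Images ("MZ" stub) keep the COFF header after the PE signature; .obj files keep it at 0.
std::optional<CoffFileHeader> readCoffHeader(const std::vector<std::uint8_t>& file) {
    std::size_t off = 0;
    if (file.size() >= 0x40 && file[0] == 'M' && file[1] == 'Z') {
        std::size_t pe = rd32(file, 0x3C);  // e_lfanew
        if (pe + 4 > file.size() || file[pe] != 'P' || file[pe + 1] != 'E' ||
            file[pe + 2] != 0 || file[pe + 3] != 0)
            return std::nullopt;
        off = pe + 4;
    }
    if (off + 20 > file.size()) return std::nullopt;
    CoffFileHeader h;
    h.machine = rd16(file, off);
    h.numberOfSections = rd16(file, off + 2);
    h.timeDateStamp = rd32(file, off + 4);
    h.pointerToSymbolTable = rd32(file, off + 8);
    h.numberOfSymbols = rd32(file, off + 12);
    h.sizeOfOptionalHeader = rd16(file, off + 16);
    h.characteristics = rd16(file, off + 18);
    return h;
}
```

After this header, both kinds of file continue with the same section table format, which is a big part of why repacking a DLL's sections into object files is plausible at all.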

Theoretically, all the information you need to run the program is in the DLL, so it should be possible to extract the binary code and data from the DLL and repack it into a static library. There would be no information about symbols only used internally in the module, but that's OK; in fact, it's even good, since those duplicate symbols are the reason you can't normally link together mismatched object files.

With this information I decided to try and write a tool that does this, honestly mostly just for the hell of it. It's called "DLLMasher" and yes it actually does work (with a few caveats) and I learned a lot of interesting stuff about how Windows binaries work along the way.

So next time: how I went about writing DLLMasher and how the hell it even works at all. For the impatient, you can find the source here: https://github.com/lmagder/DLLMasher

Posted in DLLMasher, Programming
