The focus of the week has been getting the low-level stuff in order. So there have been a lot of small changes and rearrangements at the core level:
- The thread pool determines its size based on the number of cores in your system (minus 1, so as to keep one free for the main thread and the OS).
- Lots of small tweaks to get GCC as happy as Clang (Clang seems to be considerably more forgiving in terms of which STL headers you import). Compiles and runs on Linux/Windows again.
- Fixup for the Cereal import to provide a dedicated FindCereal.cmake (detection is MUCH more reliable now).
- Implemented much better OpenGL error reporting.
- Updated CLion, and worked to remove the majority of the static analyzer complaints about my code.
- Spent quite a bit of time removing compilation warnings.
- Spent some time getting it running so that there are no OpenGL warnings presented by Mesa in debug mode (it’s really picky, but the code ended up MUCH better as a result).
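The thread pool sizing rule from the list above can be sketched roughly as follows. This is a minimal illustration rather than the engine's actual code: the helper names `worker_count_for` and `default_worker_count` are made up for the example, and the fallback to a single worker is an assumption (the standard allows `std::thread::hardware_concurrency()` to return 0 when the core count is unknown).

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: pick a worker count from a reported core count,
// leaving one core free for the main thread and the OS.
inline unsigned worker_count_for(unsigned reported_cores) {
    // Assumption: with 0 (unknown) or 1 reported cores, still run one worker.
    const unsigned workers = reported_cores > 1 ? reported_cores - 1 : 1;
    assert(workers >= 1);  // a pool with zero workers would never make progress
    return workers;
}

// Hypothetical helper: apply the rule to the machine we are running on.
inline unsigned default_worker_count() {
    return worker_count_for(std::thread::hardware_concurrency());
}
```

On an 8-core machine this gives 7 workers, which matches the "7 threads" figures quoted later in the post.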
The really time consuming part has been getting the render code going. It’s not done yet – I’m taking my time to get it right, since so much depends upon it, but there have been some major breakthroughs! The biggest challenge when rendering NF regions is the sheer amount of data to get onto the screen. The whole playable region is dynamic, so there’s little opportunity to pre-bake anything; terrain can change/move, new constructions can fill the sky, and water moves around – so it’s quite the optimization challenge!
A region is a series of cubic tiles, which are generally either open space, floors or full cubes. These are then subdivided by various flags such as “constructed” (vs. natural), “window” (which needs to be transparent), etc. The size of a region is defined by some constants, and can be adjusted – but the majority of the time I go with 256x256x128.
A naive render would potentially have to produce 8,388,608 cubes in the worst case (150,994,944 triangles). That’s a lot of geometry to be throwing around, especially when it can change per-frame. Simply producing 150 million triangles and feeding them into my video card takes longer than is normally permitted for a frame. So:
I first sub-divide the region into “chunks”. The size of a chunk is constexpr definable, so I can play with various sizes. Currently, I’m using 32x32x32. Chunks can be marked as “dirty”, and regenerate their geometry asynchronously (I don’t really mind if an update misses a frame or two; so long as the newly adjusted geometry shows up reasonably quickly, I can hand-wave small delays with effects). Chunks upload their geometry to the video card once (or when updated), so they are ready to render. Whenever the camera changes, a set of visible chunks is generated (using frustum culling), and only those are presented to the renderer. Chunks also store geometry per z-layer (so an array of 32 sets of geometry); you often only need a few z-layers within a chunk (and definitely don’t want to render above the camera, so the player can still see into the layer being edited). This reduces the per-chunk naive draw-load to 32,768 cubes – or 589,824 triangles. Given that you generally only need 1-4 chunks for a display, that’s a big improvement (and was where the previous version stopped on optimizations).
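To make the chunk layout concrete, here is a rough sketch of how the constants, the dirty flag and the per-z-layer storage might fit together. Everything named here (`REGION_WIDTH`, `Chunk`, `LayerGeometry`, `chunk_index`, `layers_at_or_below`, `Region::tile_changed`) is hypothetical and chosen for illustration; only the 256x256x128 and 32x32x32 sizes come from the text. GPU upload and frustum culling are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sizes quoted in the post; the constant names are assumptions.
constexpr int REGION_WIDTH = 256;
constexpr int REGION_HEIGHT = 256;
constexpr int REGION_DEPTH = 128;
constexpr int CHUNK_SIZE = 32;

constexpr int CHUNKS_X = REGION_WIDTH / CHUNK_SIZE;
constexpr int CHUNKS_Y = REGION_HEIGHT / CHUNK_SIZE;
constexpr int CHUNKS_Z = REGION_DEPTH / CHUNK_SIZE;
constexpr int CHUNK_COUNT = CHUNKS_X * CHUNKS_Y * CHUNKS_Z;

static_assert(REGION_WIDTH * REGION_HEIGHT * REGION_DEPTH == 8388608,
              "worst-case cube count for the whole region");
static_assert(CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE == 32768,
              "worst-case cube count for one chunk");

// Placeholder for whatever vertex data one z-layer of a chunk holds.
struct LayerGeometry {
    std::vector<float> vertices;
};

struct Chunk {
    bool dirty = true;                             // needs its geometry rebuilt
    std::array<LayerGeometry, CHUNK_SIZE> layers;  // one geometry set per z-layer
};

// Map a tile coordinate to the index of the chunk that contains it.
constexpr int chunk_index(int x, int y, int z) {
    return (z / CHUNK_SIZE) * CHUNKS_X * CHUNKS_Y
         + (y / CHUNK_SIZE) * CHUNKS_X
         + (x / CHUNK_SIZE);
}

// How many layers of a chunk (starting at chunk_base_z) sit at or below the
// camera's z-level; layers above the camera are not drawn.
constexpr int layers_at_or_below(int chunk_base_z, int camera_z) {
    return camera_z < chunk_base_z ? 0
         : (camera_z - chunk_base_z + 1 > CHUNK_SIZE ? CHUNK_SIZE
                                                     : camera_z - chunk_base_z + 1);
}

constexpr bool in_region(int x, int y, int z) {
    return x >= 0 && x < REGION_WIDTH && y >= 0 && y < REGION_HEIGHT
        && z >= 0 && z < REGION_DEPTH;
}

struct Region {
    std::vector<Chunk> chunks = std::vector<Chunk>(CHUNK_COUNT);

    // When a tile changes, only the chunk that owns it is marked for rebuild.
    void tile_changed(int x, int y, int z) {
        assert(in_region(x, y, z));
        chunks[static_cast<std::size_t>(chunk_index(x, y, z))].dirty = true;
    }
};
```

With these sizes the region splits into 8 x 8 x 4 = 256 chunks, so a single tile edit invalidates 1/256th of the world's geometry rather than all of it.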
A second stage applies a “greedy voxels” algorithm to each chunk-layer. This page has a great illustration of the process. Basically, it starts by making a set of floors and a set of cubes for each chunk-layer. It then iterates each one (starting at the top-left), and compares neighboring tiles for similarity. If a horizontal neighbor is the same, it “eats” it – expanding that piece of geometry to the right, and deleting the other. If it can’t go right, it tries to go down – expanding if ALL of the next y-level’s tiles are identical. This is a bit on the sluggish side (it takes about 5 seconds to greedily mesh the whole level using 7 threads, but individual chunk refreshes are fast enough). One convenient trick is that un-revealed tiles are always identical (why do I have to render them at all? Mining! You may want to select tiles that you haven’t seen yet as mining targets – so I need to be able to render layers of un-revealed geometry). The best case for this algorithm is the worst case for a naive approach: an entirely solid region, made up entirely of unknown geometry. This can reduce the number of cubes from 32,768 to 1 (589,824 triangles down to 36 triangles). In most cases, even with a high variety of cube types, it’s a massive improvement in overall geometry production.
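The right-then-down merging described here can be sketched for a single 2D layer as below. This is a simplified illustration, not the game's mesher: the `Quad` struct, the `greedy_mesh_layer` function and the convention that tile type 0 means "nothing to draw" are all assumptions, and it only merges within one layer (it does not merge across z-levels).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One merged rectangle of identical tiles (hypothetical output format).
struct Quad {
    int x, y, w, h;
    std::uint8_t type;
};

// tiles is a row-major width*height grid; type 0 is treated as empty.
std::vector<Quad> greedy_mesh_layer(const std::vector<std::uint8_t>& tiles,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(tiles.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    auto at = [width](int x, int y) {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
             + static_cast<std::size_t>(x);
    };

    std::vector<bool> consumed(tiles.size(), false);
    std::vector<Quad> quads;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint8_t type = tiles[at(x, y)];
            if (type == 0 || consumed[at(x, y)]) continue;

            // Step 1: "eat" identical neighbours to the right.
            int w = 1;
            while (x + w < width && !consumed[at(x + w, y)] && tiles[at(x + w, y)] == type) {
                ++w;
            }

            // Step 2: grow downwards while the WHOLE next row matches.
            int h = 1;
            while (y + h < height) {
                bool row_matches = true;
                for (int dx = 0; dx < w; ++dx) {
                    if (consumed[at(x + dx, y + h)] || tiles[at(x + dx, y + h)] != type) {
                        row_matches = false;
                        break;
                    }
                }
                if (!row_matches) break;
                ++h;
            }

            // Mark every tile covered by the rectangle so it is not emitted again.
            for (int dy = 0; dy < h; ++dy) {
                for (int dx = 0; dx < w; ++dx) {
                    consumed[at(x + dx, y + dy)] = true;
                }
            }
            quads.push_back(Quad{x, y, w, h, type});
        }
    }
    return quads;
}
```

A fully solid 32x32 layer of one tile type collapses to a single 32x32 quad, which is the per-layer version of the best case mentioned in the text.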
The third stage renders the visible regions only to the depth buffer, backwards (top-most first). This maximizes the likelihood that the z-buffer can cull geometry, and lets me reduce the later – expensive – render calls massively by culling everything that isn’t equal to an existing z-buffer entry.
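The effect of the depth-only pass can be modelled on the CPU with a toy example. This is not OpenGL code and not how the engine issues draw calls; `Fragment`, `depth_prepass` and `count_shaded` are hypothetical names, and the model assumes smaller depth values are closer to the camera. It only shows why an "equal" depth test in the later pass shades each pixel at most once per matching fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A candidate pixel write: which pixel it lands on and how far away it is.
struct Fragment {
    std::size_t pixel;
    float depth;  // assumption: smaller means closer to the camera
};

// Pass 1: depth only. Keep the nearest depth seen for every pixel.
std::vector<float> depth_prepass(const std::vector<Fragment>& frags,
                                 std::size_t pixel_count) {
    std::vector<float> depth(pixel_count, std::numeric_limits<float>::infinity());
    for (const Fragment& f : frags) {
        assert(f.pixel < pixel_count);
        if (f.depth < depth[f.pixel]) depth[f.pixel] = f.depth;
    }
    return depth;
}

// Pass 2: only fragments whose depth equals the stored value reach the
// expensive shading step; everything hidden behind them is skipped.
std::size_t count_shaded(const std::vector<Fragment>& frags,
                         const std::vector<float>& depth) {
    std::size_t shaded = 0;
    for (const Fragment& f : frags) {
        if (f.depth == depth[f.pixel]) ++shaded;
    }
    return shaded;
}
```

In this model the submission order does not change the result. On real hardware, drawing the nearest geometry first matters because it lets the depth test reject hidden fragments early during the first pass as well.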
The next stage is classic deferred rendering; geometry is splatted to a “g buffer”, providing per-pixel normal information, texture color (using a repeat mode to repeat cube geometry), and world position (for mouse picking; this is interpolated by the GPU, so it handles stretched cubes). Colors are stripped of their gamma correction on the way through, so I can be sure that I’m working entirely in linear color space.
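For reference, removing gamma correction from a texture color usually means applying the sRGB decoding curve per channel. The post does not say whether this is done in a shader, through an sRGB texture format, or with a simpler power-law approximation, so the sketch below is an assumption: it uses the standard piecewise sRGB transfer function, and the function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Decode one sRGB-encoded channel in [0, 1] to linear light.
inline float srgb_to_linear(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Encode one linear channel in [0, 1] back to sRGB for display.
inline float linear_to_srgb(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return c <= 0.0031308f ? c * 12.92f
                           : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

Lighting math is then done on the decoded values, and the encode step is applied once at the very end when the final image is written out.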
The next stage is still in progress (hence no screenshots this week), but it applies lighting information to two more “g buffer” channels (light position and color). Since lights are dynamic (you can build things that emit light), this is quite variable in scope. There’s a lot of work going into making sure that lights that don’t change are little more than a copy operation.
After that, we render a final display by taking the various g-buffer elements and combining them using lighting algorithms. This is also in development; I hope to have some really nice screenshots for next week.
Anyway, the upshot of all of this is that the game is now performing really well, at least on the render phases. If I let 7 threads build the world, it gives about 40 fps while the chunks render (and you get to watch them pop in; this will be hidden in the final release but is REALLY neat for debugging) and about 1,100 fps once the geometry is generated. That’s ludicrously fast, which is where it needs to be to give room for the simulation!