Nox Futura - March 9th 2018 Progress

Still working on moving office (it’ll be a while - we have a lot of stuff!), but still finding time to get some things done. I put off a couple of large pieces of feature work and mostly focused on foundational items, because I haven’t had much uninterrupted time, just lots of small blocks of it. (There’s one exception - a new ECS!) There’s a new pre-alpha build on itch with these changes in it.

The back-end code that handles shadows is a LOT more efficient now. It uses a geometry shader to emit to all the cubemap layers in a single pass, and I optimized the living daylights out of it. It’s quite a lot faster as a result. It also produces somewhat better shadows, even at low shadow-map detail levels.
Implemented a “world flags” SSBO (shader storage buffer object, an OpenGL 4.3 feature). When the terrain changes, it copies the map’s “flags” vector into GPU memory (it’s an int32 bitset). It’s awesome to be able to map the buffer and do a simple memcpy rather than all the stupid fiddling with vertex buffers and attributes. Other shaders read from it.
Reworked the terrain cube generation code slightly, fixing some “winding” issues; cubes are now ALWAYS wound correctly, so enabling GL_CULL_FACE correctly removes only the back-facing (and therefore invisible) cube faces. This gave a good performance boost, particularly on my low-end laptop.
Rewrote the sunlight code (big dark shadows from trees were annoying me; it looked good, but it wasn’t good to play with). The sun/moon shader polls the new “world flags” bitset to determine whether a tile is indoors or outdoors, and lights outdoor tiles with the sun/moon. I removed sun/moon shadows entirely; they were too confusing (and slow). It looks pretty decent right now.
Added an exposure-based tonemapper, and some primitive code that determines exposure based on the average lightness of the screen. It needs work, but it already looks better on dark scenes.
Re-enabled the bloom code (smudging bright lights), with less stupid blur code. It’s very subtle (I hate the overdone bloom in some games - the blurriness gives me a headache), and I’m reasonably happy with it.
The engine now knows how to load compute shaders.
Ran into some horrible issues with game saving/loading. I ended up rewriting my ECS. See below.
Re-enabled the “sploosh” wish command. It fills the top of the sky with water, which stress-tests the fluids system as it all falls and makes a mess.
Savegames now use Cereal’s binary format. I thought XML would be helpful for debugging, but it really wasn’t. A brand-new savegame went from 1.5 MB to 120 KB!
Optimized the living daylights out of the fluid dynamics code. I ended up not using a compute shader for now (I’ll almost certainly try one later), because my “let’s write this for the CPU to see how it should work” pass performed well for anything short of a truly extreme amount of water. Had a lot of fun playing with various scenarios in which people drowned themselves. I also found a few problems with terrain generation producing a river that wasn’t water-tight, and spent some fun time adjusting worldgen to produce rivers that don’t leak (I seriously love worldgen).
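On the winding fix: with OpenGL’s default front face (`glFrontFace(GL_CCW)`), a triangle counts as front-facing when its vertices appear counter-clockwise to the viewer, which is the same as its geometric normal (the cross product of two edges) pointing out of the cube. If every cube face follows that rule, `GL_CULL_FACE` with the default `glCullFace(GL_BACK)` throws away exactly the faces turned away from the camera. The C++ sketch below checks that property for one cube face; `vec3`, `wound_ccw` and `top_face` are hypothetical helpers for illustration, not NF’s terrain code.

```cpp
#include <array>
#include <cassert>

// Minimal 3-component vector, hypothetical (not NF's math type).
struct vec3 { float x, y, z; };

vec3 sub(vec3 a, vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

vec3 cross(vec3 a, vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

float dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True when triangle (a, b, c) is counter-clockwise as seen from outside,
// i.e. its geometric normal points the same way as the face's outward normal.
bool wound_ccw(vec3 a, vec3 b, vec3 c, vec3 outward) {
    return dot(cross(sub(b, a), sub(c, a)), outward) > 0.0f;
}

// Top face (y = 1) of a unit cube as two triangles, ordered counter-clockwise
// when viewed from above, so back-face culling keeps it when seen from above.
std::array<vec3, 6> top_face() {
    return {{ {0, 1, 0}, {0, 1, 1}, {1, 1, 1},
              {0, 1, 0}, {1, 1, 1}, {1, 1, 0} }};
}
```

Swapping any two vertices of a triangle flips its winding, which is the kind of mistake that makes culling remove visible faces instead of hidden ones.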
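For the exposure-based tonemapper, one common approach (not necessarily the one NF uses) is to take the log-average luminance of the frame, pick an exposure that maps it to a mid-grey “key” value, and then apply an exponential curve. The sketch below does this on the CPU in C++ for clarity; the 0.18 key, the Rec. 709 luminance weights and all of the function names are assumptions, and a real renderer would compute the average on the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct rgb { float r, g, b; };

// Rec. 709 luminance weights (an assumption; any perceptual weighting works).
float luminance(rgb c) { return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b; }

// Log-average (geometric mean) luminance of the frame; the small delta
// avoids log(0) on pure black pixels.
float average_luminance(const std::vector<rgb>& pixels) {
    assert(!pixels.empty());
    const float delta = 1e-4f;
    double sum = 0.0;
    for (const rgb& p : pixels) sum += std::log(delta + luminance(p));
    return static_cast<float>(std::exp(sum / pixels.size()));
}

// Exposure that maps the average luminance to a mid-grey key (assumed 0.18):
// dark frames get a large exposure, bright frames a small one.
float auto_exposure(float avg_lum, float key = 0.18f) { return key / avg_lum; }

// Exposure tonemap: 1 - exp(-colour * exposure) maps [0, inf) into [0, 1).
rgb tonemap(rgb c, float exposure) {
    return { 1.0f - std::exp(-c.r * exposure),
             1.0f - std::exp(-c.g * exposure),
             1.0f - std::exp(-c.b * exposure) };
}
```

Using the log-average rather than a plain mean keeps a few very bright pixels (a torch, a bloom source) from dragging the exposure of a mostly dark scene down too far.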

The New Entity-Component System (ECS)

I’ve had a pretty solid ECS going for 2.5 years now (closer to 3; my ECS actually predates the game). It’s worked fine in various other games, and has always been a good performer. It was based on EntityX, which always impressed me. I ran into problems with saving/loading games that have been going for a while (new games work every time), and after some very frustrating debugging realized that the ECS wasn’t consistent in the ID numbers it assigned to component types. Rather than force you to register every type up front, it used a static counter to assign each component type a family_id the first time it saw that type, whether at registration or when looking up a component. In a long-running game that introduces new components that aren’t seen during world-gen, it becomes increasingly unlikely that they will get the same family_id every run. That leads to really confusing bugs, and things generally falling apart after save/load. Oops. I spent several hours trying to fix this before concluding that some of my underlying designs in the ECS were wrong for this type of game (they work fine for more predictable games). I also identified some inefficiencies I could fix while I was at it.
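The failure mode is easy to reproduce. In this C++ sketch (hypothetical names, written in the general EntityX style described above rather than copied from the old NF code), each component type receives its ID the first time `family_id<T>()` is called, so the numbers depend purely on the order in which types are first touched during a run. If a save file stores components by these IDs and the next session touches types in a different order, the IDs no longer line up.

```cpp
#include <cassert>
#include <cstddef>

// Process-wide counter shared by every component type.
inline std::size_t next_family_id() {
    static std::size_t counter = 0;
    return counter++;
}

// Each instantiation initialises its own static exactly once, on first call,
// so a type's ID is "how many component types were seen before it this run".
template <typename Component>
std::size_t family_id() {
    static const std::size_t id = next_family_id();
    return id;
}

// Hypothetical components for the demonstration.
struct position_t {};
struct health_t {};
```

Nothing here is wrong within a single run; the bug only shows up when IDs are expected to survive across runs, which is exactly what a savegame does.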

I’m a big believer in forcing myself to use interfaces, and fortunately every access to the ECS in-game goes through one interface. So I created a test project, implemented the same interface and rebuilt the entity/component store and query mechanisms from the ground up, writing tests as I went. Once it passed every test I could throw at it, I dropped it into NF, compiled it and ran it. Other than some cleanup after deciding to use int rather than size_t in a couple of places, it ran really well: faster than before, and load/save worked every time. :-) (This took about 6 hours in total; I didn’t sleep much that night…)

So first up, I made the ECS require that you declare all the components with which it can interact up-front. NF uses a lot of components (98 last time I counted); this led to a truly horrific statement:

using my_ecs_t = bengine::ecs_t<position_t, designations_t, farming_designations_t, ai_tag_work_farm_plant, ai_tag_work_guarding,
        ai_mode_idle_t, ai_settler_new_arrival_t, ai_tag_leisure_shift_t, ai_tag_my_turn_t, ai_tag_sleep_shift_t, ai_tag_work_architect,
        ai_tag_work_building, ai_tag_work_butcher, ai_tag_work_farm_clear, ai_tag_work_farm_fertilize, ai_tag_work_farm_fixsoil,
        ai_tag_work_farm_water, ai_tag_work_farm_weed, ai_tag_work_harvest, ai_tag_work_hunting, ai_tag_work_lumberjack,
        ai_tag_work_miner, ai_tag_work_order, ai_tag_work_pull_lever, ai_tag_work_shift_t, architecture_designations_t,
        bridge_t, building_t, building_designations_t, construct_container_t, construct_power_t, construct_door_t, construct_provides_sleep_t,
        entry_trigger_t, receives_signal_t, smoke_emitter_t, turret_t, designated_farmer_t, designated_hunter_t,
        calendar_t, camera_options_t, claimed_t, corpse_harvestable, corpse_settler, designated_lumberjack_t, explosion_t,
        falling_t, game_stats_t, grazer_ai, health_t, initiative_t, lever_t, lightsource_t, logger_t, name_t, natural_attacks_t,
        renderable_t, renderable_composite_t, riding_t, sentient_ai, settler_ai_t, sleep_clock_t, slidemove_t, species_t,
        stockpile_t, viewshed_t, water_spawner_t, wildlife_group, world_position_t,
        item_ammo_t, item_bone_t, item_chopping_t, item_digging_t, item_drink_t, item_farming_t, item_fertilizer_t,
        item_food_t, item_hide_t, item_leather_t, item_melee_t, item_ranged_t, item_seed_t, item_skull_t, item_spice_t,
        item_topsoil_t, item_t, item_carried_t, item_creator_t, item_quality_t, item_stored_t, item_wear_t, designated_miner_t,
        mining_designations_t
    >;

I was afraid that this would kill compile times, but it had the opposite effect: all the component headers get parsed once, so the compiler is able to re-use them a lot. I also worried about size limits on parameter packs, but I haven’t hit them.

The biggest change is in how components are stored/identified. Previously, I didn’t enforce registration up-front, so the ECS did a little dance with a template component_family and a static counter to figure out an ID#. Since I didn’t know the component types at compile-time, but didn’t want to force components to adhere to an interface, a lot of things were dynamic. A component_base class contained is_deleted and the entity_id; a templated component_holder class then inherited from that base to decorate it. Finally, components were stored in a vector of unique_ptr to the template component_holder type (which in turn forced me to register a TON of base-types in Cereal). These were themselves stored in another vector, indexed by component_family index. It worked, was pretty fast, but relied upon a number of compiler optimizations to perform well (vtable elimination, and the unique_ptr being optimized away).

The new system determines family_id at compile time via a std::index_sequence (the near-magical std::index_sequence_for generates the indices 0 to N-1 for a parameter pack of N types, giving each registered component a sequential ID #). So the static incrementer is gone, and each component is guaranteed to get the same ID # every run, as long as the registration list itself doesn’t change. I eliminated the inheritance by making component_holder<C> a simple struct of entity_id, is_deleted and C component, so it simply “decorates” your component by adding an int and a bool to the beginning. And since we know all of the types up front, allocating component storage becomes:

std::tuple<std::pair<size_t, std::vector<component_holder<Components>>>...> storage;

That lets me use std::get<std::pair<size_t, std::vector<component_holder<Component>>>> to retrieve the correct storage block at compile time (and the tuple’s layout, including every empty vector object, is fixed at compile time too). The ECS constructor populates the size_t with the family_ids (helpful in some retrieval code). So the new system eliminates the unique_ptr indirection, guarantees that the vector objects (not their contents, which each vector allocates on its own) are adjacent in memory for happy caches, and eliminates the need for a vtable/virtual lookup completely. When querying, a lot of the type lookups are now purely compile-time, which is more of a performance win than I expected.
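A cut-down version of that storage might look like the sketch below. It is a hedged illustration, not NF’s `bengine::ecs_t`: the names `ecs_sketch`, `index_in`, `store_t`, `add_component` and `components` are hypothetical, the family_id is computed with `std::index_sequence_for` plus C++17 fold expressions, and it assumes every registered component type is distinct (which `std::get`-by-type requires anyway).

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Decorates a component with its owner and a deletion flag: no base class, no vtable.
template <typename C>
struct component_holder {
    int entity_id;
    bool is_deleted;
    C component;
};

// index_in<T, tuple<Ts...>, index_sequence<Is...>>::value is T's position in Ts...
template <typename T, typename Tuple, typename Seq> struct index_in;
template <typename T, typename... Ts, std::size_t... Is>
struct index_in<T, std::tuple<Ts...>, std::index_sequence<Is...>> {
    static constexpr bool found = (std::is_same_v<T, Ts> || ...);
    static constexpr std::size_t value = ((std::is_same_v<T, Ts> ? Is : 0) + ... + 0);
};

template <typename... Components>
class ecs_sketch {
public:
    // Resolved entirely by the compiler; unregistered types fail with one clear message.
    template <typename C>
    static constexpr std::size_t family_id() {
        using idx = index_in<C, std::tuple<Components...>,
                             std::index_sequence_for<Components...>>;
        static_assert(idx::found, "Component type was not registered with this ECS");
        return idx::value;
    }

    ecs_sketch() {
        // Store each block's family_id alongside it, as described above.
        ((std::get<store_t<Components>>(storage).first = family_id<Components>()), ...);
    }

    template <typename C>
    void add_component(int entity_id, C component) {
        std::get<store_t<C>>(storage).second.push_back({entity_id, false, std::move(component)});
    }

    template <typename C>
    std::vector<component_holder<C>>& components() {
        return std::get<store_t<C>>(storage).second;
    }

private:
    template <typename C>
    using store_t = std::pair<std::size_t, std::vector<component_holder<C>>>;

    // One (family_id, vector) pair per registered type, laid out side by side.
    std::tuple<store_t<Components>...> storage;
};
```

Because every lookup goes through `std::get` with a type known at compile time, there is no runtime search for the right store, and asking for a type that was never registered stops the build at the `static_assert` rather than producing a long template backtrace.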

This gives a number of advantages:

Determining the ID# (family_id) of a given Component is now a compile-time task, and I can return a handy error message via static_assert (rather than 100+ lines of gobbledegook) if a component isn’t registered. So every function that needs to determine the IDs of component types can do so with zero overhead.
This lets me store components in a big tuple of pair<size_t, vector<component_holder<C>>>. Accessing it via std::get<pair<size_t, vector<component_holder<C>>>> is also resolved at compile time, so there’s zero overhead for finding the right component store. (It previously used a static counter and instantiated an empty component to find the ID, which apparently was slower than I thought!)
Since family_id is determined at compile time, the bitset of which components an entity has is also sized automatically. I can compile-time generate a list of bit numbers to test given a variadic parameter pack of component types, so a complex query is very, very fast.
It eliminated any allocation from query calls.
It really cleaned up my serialization code. No more polymorphic type registrations for Cereal!
Much to my surprise, it compiles really fast.
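The bitset query in the third point can be sketched in the same style. In the hypothetical code below, `index_of` finds a component’s bit number as a constant expression, `mask_for<Cs...>()` sets one bit per requested type, and `each_entity` walks a plain array of per-entity bitsets and calls a callback for every entity whose bits include the mask, so a query allocates nothing. This is an assumption-laden sketch of the idea, not NF’s query code; in particular, keeping entity masks in a `std::vector` indexed by entity ID is a simplification.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Position of T in Ts... (requires at least one registered type);
// returns sizeof...(Ts) when T is not registered.
template <typename T, typename... Ts>
constexpr std::size_t index_of() {
    constexpr bool matches[] = {std::is_same_v<T, Ts>...};
    for (std::size_t i = 0; i < sizeof...(Ts); ++i)
        if (matches[i]) return i;
    return sizeof...(Ts);
}

template <typename... Components>
struct query_sketch {
    // One bit per registered component, sized automatically from the pack.
    using mask_t = std::bitset<sizeof...(Components)>;

    // Builds the mask from bit numbers the compiler already knows.
    template <typename... Cs>
    static mask_t mask_for() {
        static_assert(((index_of<Cs, Components...>() < sizeof...(Components)) && ...),
                      "Component type was not registered");
        mask_t m;
        (m.set(index_of<Cs, Components...>()), ...);
        return m;
    }

    // Calls f(entity_id) for every entity that has all of Cs...; no heap allocation.
    template <typename... Cs, typename F>
    static void each_entity(const std::vector<mask_t>& entity_masks, F&& f) {
        const mask_t wanted = mask_for<Cs...>();
        for (std::size_t id = 0; id < entity_masks.size(); ++id)
            if ((entity_masks[id] & wanted) == wanted) f(static_cast<int>(id));
    }
};
```

The per-entity test is a single AND and compare over a fixed-size bitset, which is why a query over several component types costs about the same as a query over one.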

The net result is a big speed increase, one I really wasn’t expecting!
