Still working on moving office (it’ll be a while - we have a lot of stuff!), but I’m still finding time to get some things done. I put off a couple of large pieces of feature work and mostly focused on foundational items, because I’ve had lots of small time blocks rather than much uninterrupted time. (There’s one exception: a new ECS!) There’s a new pre-alpha build on itch with these changes in it.
- The back-end code that handles shadows is a LOT more efficient now. It uses a geometry shader to emit to all the cubemap layers at once, and I optimized the living daylights out of it, so it’s quite a lot faster. It also produces somewhat better shadows, even on low shadow-map detail levels.
- Implemented a “world flags” SSBO (shader storage buffer object, an OpenGL 4.3 feature). When the terrain changes, it copies the map’s “flags” vector into GPU memory (it’s an int32 bitset). It’s awesome to be able to map it and do a simple memcpy rather than all the stupid fiddling with vertex buffers and attributes. This is used by other shaders.
- Reworked the terrain cube generation code slightly, fixing some “winding” issues; cubes are now ALWAYS wound correctly, so enabling GL_CULL_FACE will correctly remove only the invisible cube faces. This gave a good performance boost, particularly on my low-end laptop.
- Rewrote the sunlight code (big dark shadows from trees were annoying me, and while it looked good it wasn’t good to play). The sun/moon shader polls the new “world flags” bitset to determine if a tile is indoors or outdoors, and lights outdoor tiles with the sun/moon. I removed sun/moon shadows, they were too confusing (and slow). It looks pretty decent right now.
- Added an exposure-based tonemapper, and some primitive code that determines exposure based on the average lightness of the screen. It needs work, but it already looks better on dark scenes.
- Re-enabled the bloom code (smudging bright lights), with less stupid blur code. It’s very subtle (I hate the overdone bloom in some games - the blurriness gives me a headache), and I’m reasonably happy with it.
- The engine now knows how to load compute shaders.
- Ran into some horrible issues with game saving/loading. I ended up rewriting my ECS. See below.
- Re-enabled the “wish” command sploosh. It sets the top of the sky to be full of water, which then stress-tests the fluids system by having it all fall and make a mess.
- Savegames now use Cereal’s binary format. I thought XML would be helpful for debugging, but it really wasn’t. Starting savegames went from 1.5 MB to 120 KB!
- Optimized the living daylights out of the fluid dynamics code. I ended up not using a compute shader for now (I’ll almost certainly try it later), after my “let’s write this for the CPU to see how it should work” pass was successful for anything less than a truly extreme amount of water. Had a lot of fun playing with various scenarios in which people drowned themselves. Also identified a few problems with terrain generation giving me a river that wasn’t water-tight. Spent some fun time adjusting worldgen to produce rivers that don’t leak (I seriously love worldgen).
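The “world flags” upload from the list above can be sketched on the CPU side roughly as follows. This is a minimal sketch, not the engine’s code: the flag names (FLAG_SOLID, FLAG_OUTDOORS) and the helpers upload_flags and is_outdoors are hypothetical, and the mapped pointer is assumed to come from mapping the SSBO (for example with glMapBufferRange); here any writable memory stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flag bits; the post does not give the real layout.
enum tile_flag : std::uint32_t {
    FLAG_SOLID    = 1u << 0,
    FLAG_OUTDOORS = 1u << 1,
};

// Copy one 32-bit flag word per tile into a mapped buffer in a single memcpy.
// `mapped` stands in for the pointer obtained by mapping the SSBO.
inline std::size_t upload_flags(const std::vector<std::uint32_t>& flags,
                                void* mapped, std::size_t mapped_bytes) {
    const std::size_t bytes = flags.size() * sizeof(std::uint32_t);
    if (bytes == 0) return 0;       // nothing to copy
    assert(bytes <= mapped_bytes);  // the buffer must be large enough
    std::memcpy(mapped, flags.data(), bytes);
    return bytes;
}

// The same test a shader would do on the uploaded words: mask out one bit.
inline bool is_outdoors(const std::uint32_t* flags, std::size_t tile) {
    return (flags[tile] & FLAG_OUTDOORS) != 0;
}
```

The point of the sketch is that the whole map state moves in one copy, with no per-vertex attribute setup.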
The New Entity-Component System (ECS)
I’ve had a pretty solid ECS going for 2.5 years now (closer to 3, my ECS actually predates the game). It’s worked fine in various other games, and has always been a good performer. It was based on EntityX, which always impressed me. I ran into problems with saving/loading games that have been going for a while (new games work every time), and after some very frustrating debugging realized that the ECS wasn’t consistent in the ID numbers it assigned to component types. Rather than force you to register every type upfront, it used a static to determine a family_id by component type on registration (and also looked these up when it saw a component). In a long-running game that introduces new components that aren’t seen during world-gen, it becomes increasingly unlikely that they will always get the same family_id. That leads to really confusing bugs, and things generally falling apart after save/load. Oops. I spent several hours trying to fix this, before concluding that some of my underlying designs in the ECS were incorrect for this type of game (they work fine for more predictable games). I also identified some inefficiencies I could fix while I was at it.
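To make the failure mode concrete, here is a minimal sketch of a static-counter ID scheme of the kind described above. It is an illustration under assumptions, not the old ECS itself: next_family_id and this family_id function are hypothetical names, and the two component structs are stand-ins.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical component types, for illustration only.
struct position_t { int x, y; };
struct health_t { int hp; };

// One shared counter; each component type claims the next value on first use.
inline std::size_t next_family_id() {
    static std::size_t counter = 0;
    return counter++;
}

// The function-local static is initialized once per type C, the first time
// family_id<C>() runs, so the ID depends on run-time call order.
template <typename C>
std::size_t family_id() {
    static const std::size_t id = next_family_id();
    return id;
}
```

If one session touches health_t first and another touches position_t first, the two types swap IDs, and any ID written into a save file from the first session means something different in the second.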
I’m a big believer in forcing myself to use interfaces, so fortunately every access to the ECS in-game goes through one interface. So I created a test project, implemented the same interface and rebuilt the entity/component store and query mechanisms from the ground up - writing tests as I went. Once it passed every test I could throw at it, I put it into NF - compiled and ran it. Other than having to clean up after deciding to use int rather than size_t in a couple of places, it ran really well. Faster than before, and load/save worked every time. :-) (This took a total of about 6 hours; I didn’t sleep much that night…)
So first up, I made the ECS require that you declare all the components with which it can interact up-front. NF uses a lot of components (98 last time I counted); this led to a truly horrific statement:
using my_ecs_t = bengine::ecs_t<position_t, designations_t, farming_designations_t, ai_tag_work_farm_plant, ai_tag_work_guarding, ai_mode_idle_t, ai_settler_new_arrival_t, ai_tag_leisure_shift_t, ai_tag_my_turn_t, ai_tag_sleep_shift_t, ai_tag_work_architect, ai_tag_work_building, ai_tag_work_butcher, ai_tag_work_farm_clear, ai_tag_work_farm_fertilize, ai_tag_work_farm_fixsoil, ai_tag_work_farm_water, ai_tag_work_farm_weed, ai_tag_work_harvest, ai_tag_work_hunting, ai_tag_work_lumberjack, ai_tag_work_miner, ai_tag_work_order, ai_tag_work_pull_lever, ai_tag_work_shift_t, architecture_designations_t, bridge_t, building_t, building_designations_t, construct_container_t, construct_power_t, construct_door_t, construct_provides_sleep_t, entry_trigger_t, receives_signal_t, smoke_emitter_t, turret_t, designated_farmer_t, designated_hunter_t, calendar_t, camera_options_t, claimed_t, corpse_harvestable, corpse_settler, designated_lumberjack_t, explosion_t, falling_t, game_stats_t, grazer_ai, health_t, initiative_t, lever_t, lightsource_t, logger_t, name_t, natural_attacks_t, renderable_t, renderable_composite_t, riding_t, sentient_ai, settler_ai_t, sleep_clock_t, slidemove_t, species_t, stockpile_t, viewshed_t, water_spawner_t, wildlife_group, world_position_t, item_ammo_t, item_bone_t, item_chopping_t, item_digging_t, item_drink_t, item_farming_t, item_fertilizer_t, item_food_t, item_hide_t, item_leather_t, item_melee_t, item_ranged_t, item_seed_t, item_skull_t, item_spice_t, item_topsoil_t, item_t, item_carried_t, item_creator_t, item_quality_t, item_stored_t, item_wear_t, designated_miner_t, mining_designations_t >;
I was afraid that this would kill compile times, but it had the opposite effect: all the component headers get parsed once, so the compiler is able to re-use them a lot. I also worried about size limits on parameter packs, but I haven’t hit them.
The biggest change is in how components are stored/identified. Previously, I didn’t enforce registration up-front, so the ECS did a little dance with a template component_family and a static counter to figure out an ID#. Since I didn’t know the component types at compile-time, but didn’t want to force components to adhere to an interface, a lot of things were dynamic. A component_base class contained is_deleted and the entity_id; a templated component_holder class then inherited from that base to decorate it. Finally, components were stored in a unique_ptr to the template component_holder type (which in turn forced me to register a TON of base-types in Cereal). These were themselves stored in another vector, indexed by component_family index. It worked, was pretty fast, but relied upon a number of compiler optimizations to perform well (vtable elimination, and the unique_ptr being optimized away).
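The shape of that older design might look something like the sketch below. The names component_base, component_holder, is_deleted and entity_id come from the description above; everything else (old_store_t, its add and find members, the exact container nesting) is a guess made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Untyped base, so holders of different component types can share a container.
struct component_base {
    virtual ~component_base() = default;
    std::size_t entity_id = 0;
    bool is_deleted = false;
};

// Typed wrapper that "decorates" a user component with the base fields.
template <typename C>
struct component_holder : component_base {
    C data{};
};

// Hypothetical store: outer vector indexed by family ID, each slot holding
// heap-allocated holders behind base-class pointers.
struct old_store_t {
    std::vector<std::vector<std::unique_ptr<component_base>>> by_family;

    template <typename C>
    void add(std::size_t family, std::size_t entity, C value) {
        if (by_family.size() <= family) by_family.resize(family + 1);
        auto holder = std::make_unique<component_holder<C>>();
        holder->entity_id = entity;
        holder->data = std::move(value);
        by_family[family].push_back(std::move(holder));
    }

    template <typename C>
    C* find(std::size_t family, std::size_t entity) {
        if (family >= by_family.size()) return nullptr;
        for (auto& p : by_family[family]) {
            if (p->entity_id == entity && !p->is_deleted) {
                // Nothing checks at compile time that `family` matches C.
                return &static_cast<component_holder<C>*>(p.get())->data;
            }
        }
        return nullptr;
    }
};
```

Every lookup here pays for a pointer hop per component, and the family index is a run-time value that the type system cannot tie to C.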
The new system determines family_id at compile-time, via an std::index_sequence (and the near-magical std::index_sequence_for alias, which makes a sequential ID # for each type in a parameter pack). So the static incrementer is gone entirely, and each component is guaranteed to get the same ID # each time. I eliminated the inheritance by making component_holder<C> a simple struct containing a C component - so it simply “decorates” your component by adding an int and a bool to the beginning. And since we know all of the types up front, allocating component storage becomes:
std::tuple<std::pair<size_t, std::vector<component_holder<Components>>>...> storage;
That lets me use std::get<std::pair<size_t, std::vector<component_holder<Component>>>> to retrieve the correct storage block at compile time (and allocate the initial vectors at compile time!). The ECS constructor populates the size_t with the family_ids (helpful in some retrieval code). So the new system eliminates the unique_ptr indirection, guarantees that the vector objects (not their contents, which the vector will allocate on its own) are adjacent in memory for happy caches, and eliminates the need for a vtable/virtual lookup completely. When querying, a lot of the type lookups are now purely compile-time, which is more of a performance win than I expected.
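A cut-down sketch of this layout, assuming C++17, might look like the following. The storage tuple and component_holder follow the description above; index_of, slot_t, assign_ids, store and stored_family_id are hypothetical helper names, and the two component structs are stand-ins rather than the game’s real types.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical components, for illustration only.
struct position_t { int x = 0, y = 0; };
struct health_t { int hp = 0; };

// Plain struct: an int and a bool in front of the user's component.
template <typename C>
struct component_holder {
    int entity_id = 0;
    bool deleted = false;
    C data{};
};

// Position of T within Ts..., computed entirely at compile time.
template <typename T, typename... Ts>
constexpr std::size_t index_of() {
    static_assert((std::is_same_v<T, Ts> || ...),
                  "component type is not registered with this ECS");
    std::size_t i = 0;
    bool found = false;
    ((found = found || std::is_same_v<T, Ts>, i += found ? 0 : 1), ...);
    return i;
}

template <typename... Components>
class ecs_t {
    template <typename C>
    using slot_t = std::pair<std::size_t, std::vector<component_holder<C>>>;

    std::tuple<slot_t<Components>...> storage;

    // Write 0, 1, 2, ... into the size_t of each slot, in declaration order.
    template <std::size_t... Is>
    void assign_ids(std::index_sequence<Is...>) {
        ((std::get<Is>(storage).first = Is), ...);
    }

public:
    ecs_t() { assign_ids(std::index_sequence_for<Components...>{}); }

    template <typename C>
    static constexpr std::size_t family_id() {
        return index_of<C, Components...>();
    }

    // Type-based std::get picks the right vector with no run-time lookup.
    template <typename C>
    std::vector<component_holder<C>>& store() {
        static_assert((std::is_same_v<C, Components> || ...),
                      "component type is not registered with this ECS");
        return std::get<slot_t<C>>(storage).second;
    }

    template <typename C>
    std::size_t stored_family_id() const {
        return std::get<slot_t<C>>(storage).first;
    }
};
```

With ecs_t<position_t, health_t>, position_t is always family 0 and health_t always family 1, regardless of which one the running game touches first. The type-based std::get only works because each component type appears exactly once in the list.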
This gives a number of advantages:
- Determining the ID# (family_id) of a given Component is now a compile-time task, and I can return a handy error message via static_assert (rather than 100+ lines of gobbledegook) if a component isn’t registered. So every function that needs to determine the IDs of component types can do so with zero overhead.
- This lets me store components in a big vector<component_holder<C>>. Accessing it via std::get<vector<component_holder<C>>> is also compile time, so there’s zero overhead for finding the right component store. (It previously used a static counter and instantiated an empty component to find the ID, which apparently was slower than I thought!)
- Because family_id is determined at compile time, the bitset of which components an entity has is also sized automatically. I can compile-time generate a list of bit numbers to test given a variadic parameter pack of component types, so a complex query is very, very fast.
- It eliminated any allocation from query calls.
- It really cleaned up my serialization code. No more polymorphic type registrations for Cereal!
- Much to my surprise, it compiles really fast.
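The bitset and query points in the list above can be sketched as follows, again assuming C++17. registry_t, mark and has_all are hypothetical names chosen for this example, and the three empty structs stand in for real components.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical components, for illustration only.
struct position_t {};
struct health_t {};
struct name_t {};

// Position of T within Ts...; an unregistered type trips the static_assert
// and produces one readable error line instead of a template backtrace.
template <typename T, typename... Ts>
constexpr std::size_t index_of() {
    static_assert((std::is_same_v<T, Ts> || ...),
                  "component type is not registered with this ECS");
    std::size_t i = 0;
    bool found = false;
    ((found = found || std::is_same_v<T, Ts>, i += found ? 0 : 1), ...);
    return i;
}

template <typename... Components>
struct registry_t {
    // One bit per registered component; the size follows the type list.
    using mask_t = std::bitset<sizeof...(Components)>;

    template <typename C>
    static constexpr std::size_t family_id() {
        return index_of<C, Components...>();
    }

    // Record that an entity has component C.
    template <typename C>
    static void mark(mask_t& mask) { mask.set(family_id<C>()); }

    // The bit numbers are compile-time constants; only the bit tests
    // themselves happen at run time, and nothing is allocated.
    template <typename... Query>
    static bool has_all(const mask_t& mask) {
        return (mask.test(family_id<Query>()) && ...);
    }
};
```

A query such as has_all<position_t, name_t>(mask) expands to two bit tests against fixed indices, which is why adding more component types to a query stays cheap.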
The net result is a big speed increase, one I really wasn’t expecting!