I'm pretty excited about this whole DOD-ECS concept, since the architecture offers so many benefits, for example, it solves several outstanding problems relating to concurrent programming in games such as thread safety guarantees, and does so without any formal mutexes (lock-free),
but more interestingly I think that this architecture can easily be adapted to non-gaming / business applications, and even to general applications.
That is to say, the wider software development community could benefit from advances in this direction, as much as game developers, or more.
The amount of sourcecode contained in this project gives no indication to the amount of effort and number of iterations required to produce it.
This implementation of DOD-ECS is the product of a long series of failures and wrong turns, many resulting from my own assumptions, my programming habits, and just the way I've been conditioned to think about code in terms of objects instead of functions and data.
Having said all that, the resulting source is not too bad, considering it has to work on MSVC 2010, which has no support for things like variadic templates, things that would have helped me, had I chosen a more modern compiler, and the underlying design is sound in theory.
There were some points that came up late in the development where I realized that I could save on some internal lookups by deliberately allowing some data fields to belong to more than one component - to have some redundant copies of data can actually be good for performance.
I didn't have time to experiment with all of those ideas, so more performance can still be had, I'm sure of that.
There were entire systems omitted from this project that would be must-haves in any game engine, however they were outside of the scope I'd set for myself
within my original proposal: I wasn't setting out to create a game engine, the whole idea was to create a game logic system that can still play nicely within existing engines, while leveraging them for things like resource handling, but not relying on them for frame processing.
In retrospect, my implementation smells suspiciously like the guts of Gamebryo - that engine is highly data-driven, and the data is chock-full of various IDs, I suspect that Gamebryo's internal design shares features with my implementation of DOD-ECS.
I'm glad I chose this project as I've learned a lot about hardware-friendly and data-centric approaches to programming, and furthermore, I can actually see myself using this stuff as an alternative to licensing a full engine. I found that making a DOD-ECS application was a lot like making a sandwich: you have to think about what fillings you want on it, what order you'd prefer them in, and whether any of your ingredients bring their own requirements (maybe some pickled tomato relish). The ability to slap together a very specific game engine with just the needed functionality, and have everything in a game data-driven including the engine itself, is just too enticing to not follow up.
Tuesday, 1 December 2015
Wednesday, 30 September 2015
MRT fails to write to one or more output textures - cause and solution
I had a strange issue when I was trying to write to multiple textures in a single shader pass (MRT), that having activated several targets (glDrawBuffers), only the first one was getting any output, the others were not being changed, other than being cleared correctly to my desired clearcolor.
The problem turned out to be a combination of two things.
Firstly, I had Blending enabled, which meant that my current blendfunc would be applied to output pixels, and secondly, my fragshader was outputting RGB vec3 values, leaving alpha unaltered.
If you're playing around with MRT, and one or more outputs refuse to get written to, doublecheck that you have blending disabled, and/or check your pixel alpha.
Tuesday, 29 September 2015
Sometimes, in order to move forwards, you need to take a few steps backwards.
Here's a few screenshots showing some work from the current refactor of the FireStorm Renderer.
The first two images just show per-pixel (phong) lighting being applied to a 3D textured cube, and rendered courtesy of a regular forwards-rendering setup, with support for a single pass per light.
Here's another view
The third image shows work on the Deferred Lighting pipeline, and is far more interesting to me.
Tex 1 is an input texture being applied to the geometry. The ''geompass' shader writes to three output color textures (2,3 and 4) and also writes to one depth texture (5 - it's quite faint, but it's there)
Tex 2 holds the WorldSpace Position of the 3D surfaces as determined at each pixel.
If we had a software raycaster that was casting 3D rays from each pixel on the near plane of the camera's view frustum until they hit some surface in the world, the position of each intersection point of a cast ray with the world surfaces is what we're recording per pixel, and doing so with no raycasting. The operation is laughably cheap in the geompass fragment shader.
Tex 3 is our regular diffuse texture output, that we'd get from forward-rendering with no lighting.
It carries the texture detail for all the objects in our scene for the current frame.
Tex 4 is similar to Tex2 - but it holds per-pixel WorldSpace Surface Normals, instead of Surface Position.
Tex 5 is a Depth Texture - it's capturing depth info just like the regular depthmap, but in the form of a texture, so it can be read back in another shader pass, where the data from these textures is read back in and the final lighting calculation is performed per pixel.
Tex 6 is waiting to capture the final results of the per-pixel lighting (baking) pass.
The first two images just show per-pixel (phong) lighting being applied to a 3D textured cube, and rendered courtesy of a regular forwards-rendering setup, with support for a single pass per light.
Here's another view
The third image shows work on the Deferred Lighting pipeline, and is far more interesting to me.
Tex 1 is an input texture being applied to the geometry. The ''geompass' shader writes to three output color textures (2,3 and 4) and also writes to one depth texture (5 - it's quite faint, but it's there)
Tex 2 holds the WorldSpace Position of the 3D surfaces as determined at each pixel.
If we had a software raycaster that was casting 3D rays from each pixel on the near plane of the camera's view frustum until they hit some surface in the world, the position of each intersection point of a cast ray with the world surfaces is what we're recording per pixel, and doing so with no raycasting. The operation is laughably cheap in the geompass fragment shader.
Tex 3 is our regular diffuse texture output, that we'd get from forward-rendering with no lighting.
It carries the texture detail for all the objects in our scene for the current frame.
Tex 4 is similar to Tex2 - but it holds per-pixel WorldSpace Surface Normals, instead of Surface Position.
Tex 5 is a Depth Texture - it's capturing depth info just like the regular depthmap, but in the form of a texture, so it can be read back in another shader pass, where the data from these textures is read back in and the final lighting calculation is performed per pixel.
Tex 6 is waiting to capture the final results of the per-pixel lighting (baking) pass.
Friday, 18 September 2015
Single-Pass OmniDirectional Shadows in a Deferred Shading Environment
Until recently, I believed that the only way to achieve omidirectional shadows was to use six passes to generate the faces of a cubemap, or to use two passes (one with MRT), to generate the faces of a dual paraboloid.
It turns out that we can in fact, attach a cubemap to a framebuffer such that we may draw to all six faces in a single pass, without using MRT, and the trick is to use a Geometry Shader (inbetween the VS and FS shader stages) whose job is to transform the shadowcaster geometry we're drawing , via one of six virtual camera views, into one of six output FS executions(!)
Our utilization of a Geometry Shader for 'multiple projections' of the rendered geometry leads to the setting of a built-in GLSL variable called gl_Layer, which is a zero based integer whose value can be read from the FS, indicating which CubeMap Face (or plane) is the target of the FS execution.
Unfortunately, unlike the Lighting stage, we can't use GPU Instancing to help with batching our lights - we can use it for drawing geometry to the shadow cubemap, but not to send the lights themselves.
So we have a choice then, to either perform a shadowing pass per light, or to write several shadowmap shader variants that each cope with a specific number of lights, just like we'd be doing if this was a Forwards rendering environment.
Due to the cost of shadowcasting, and in order to reduce the amount of work involved therein, I've introduced a boolean switch in the Material class, which denotes whether objects made of a given material can cast shadows. This will allow the user to make premium decisions about which elements of a scene may cast shadows, and reduce the average number of potential shadowcasters per frame.
If required, I can put an override in the RenderableComponent which will allow the user to allow or disallow shadowcasting on a per-renderable level and without regard to material settings.
Monday, 14 September 2015
Combining Light and Shadow - Can lighting passes be used to create shadow maps?
Shadow-Casting FragmentShaders are always written relative to a single LightSource.
They replace the usual 'camera view' - the virtual camera origin they choose is the position of the light source, and the direction they choose is the direction from the light to the fragment (or its reverse). That's the usual case, anyway.
Deferred Shaders can support multiple lights cheaply, because most of the lighting equation's required components are output in a preliminary pass, and later, a final pass is used to bake the various components of the lighting equation using a relatively cheap post-process pass, which takes in the lighting equation components and calculates the per-pixel-lighting final values (ignoring anything else we might do to improve the quality of our final image).
Another reason that Deferred Shaders are fast is that most things happen in Camera View Space.
Do you think that we might be able to write a shader that can operate in ViewSpace, which can transform depth information (written during the preliminary pass) into per-output-texture-space for say, 6 output targets (shadow cubemap faces), and also using gpu-instancing, do this for one shape of light emitter, in one shader pass?
http://cg.siomax.ru/index.php/computer-graphics/10-one-pass-rendering-to-cube-map
Apparently, with the help of a Geometry Shader, we can indeed render our shadow cubemap in a single pass, at least for a single Light. This requires the use of a built-in GLSL primitive called gl_Layer, which although technically this was introduced in OpenGL 4.1, it's available on a wide range of OpenGL 3.3 hardware.
I am immensely curious as to how wide a range that is.
Friday, 11 September 2015
Light Components: High numbers of dynamic lights, GPU-Instancing of Multiple Light Volumes in a Deferred Shader
Lighting has been implemented in FireStorm as a Component - any Entity that supports Frustum Culling (i.e. has a Cullable Component) may also have a Light component, and therefore, any cullable entity can potentially be a light-caster.
The Light component adopts the Transform of its owner Entity, which means that light will appear to be cast from the origin of said Entity - if you wish to offset the transform of a light (for example, to position a headlight on the front of a vehicle), you'll need to create a separate (and likely more simple) entity for such a light, and use Transform Parenting to attach the light entity to its parent.
Both pointlights and spotlights are supported, while directional lighting is handled separately (FireStorm only supports one Directional light, but any number of Spot or Point lights).
Not everything in a game engine should necessarily be a Component, however there was a good reason behind the decision to implement Lights as components of entities - assuming there are a high number of possibly dynamic light sources in the game world, we'd like to cull the lights that have no effect on the rendered scene, which is easy if we treat them as cullable entities, given that a mechanism for performing visibility queries had already been implemented: FireStorm uses a Dynamic AABB Tree to track the spatial relationship of all entities in the game world, and uses this structure to accelerate camera visibility queries - while collecting the list of Renderable entities which intersect the camera's view frustum (our list of stuff to draw), we can also collect a separate list of Light-Caster entities which will be applied via deferred-shading, and a subset of which may also be applied during shadow-casting passes.
FireStorm's new render pipe leverages several features of modern (GL3+) graphics cards - Deferred Shading is implemented via Multiple RenderTargets (MRT), Deferred Lighting is implemented via GPU-Instancing in tandem with Stencil-Rejection, and work has begun on some new Shadow-Casting technology.
Friday, 4 September 2015
Some things should not be Components - but they can still benefit from being Data-Oriented!
So today was an interesting day.
I finally had a use-case for keeping Materials by (16-bit) ID, rather than by Pointer.
Although one does not necessarily exclude the other, I was not able to boil down a 'pure pointer' (as allocated by the 'new' operator) into a flat index into my global Materials container, since that container was based on STL map. Basically, my Material class was one of a number of 'self-managing' classes, that provided its own static map, and managed that map internally.
Component Systems are all being based on a (templated) class called ObjectPool - it's how Systems internally manage their data elements (or class objects), the ObjectPool class provides its own 'placement new' based allocation scheme, as well as code to convert between flat indices and pointers.
Changing my global Materials container from STL's map template to my own ObjectPool-based container was not problematic, but it did require that I code up support for STL-LIKE iterators for my ObjectPool implementations. That was actually kind of fun, and productive, my Materials are now sure to occupy contiguous flat memory addresses, and can be referenced at runtime by their ID, which can be stored using less bits than a Pointer (4 times less on a 64 bit system), but is effectively just as fast to access.
More interesting and potentially valuable, the template that powers ECS Systems now supports forwards-iteration of the active Components in any System.
I finally had a use-case for keeping Materials by (16-bit) ID, rather than by Pointer.
Although one does not necessarily exclude the other, I was not able to boil down a 'pure pointer' (as allocated by the 'new' operator) into a flat index into my global Materials container, since that container was based on STL map. Basically, my Material class was one of a number of 'self-managing' classes, that provided its own static map, and managed that map internally.
Component Systems are all being based on a (templated) class called ObjectPool - it's how Systems internally manage their data elements (or class objects), the ObjectPool class provides its own 'placement new' based allocation scheme, as well as code to convert between flat indices and pointers.
Changing my global Materials container from STL's map template to my own ObjectPool-based container was not problematic, but it did require that I code up support for STL-LIKE iterators for my ObjectPool implementations. That was actually kind of fun, and productive, my Materials are now sure to occupy contiguous flat memory addresses, and can be referenced at runtime by their ID, which can be stored using less bits than a Pointer (4 times less on a 64 bit system), but is effectively just as fast to access.
More interesting and potentially valuable, the template that powers ECS Systems now supports forwards-iteration of the active Components in any System.
Monday, 10 August 2015
JSON - Yet Another Markup Language?
I have to admit, I am not sold on JSON - it is no more or less useful than XML in terms of configuring applications or loading assets.
I've implemented support for JSON-based config files, but I wonder if it was worth the effort.
Friday, 17 July 2015
Multi-Pass Shaders
There's no such thing, really, as a multi-pass shader.
To perform multi-pass rendering, we select a shader and optionally a render target surface (FBO), and we draw some subset of our renderables. Then we select another shader, and repeat the process.
Multi-Pass Rendering involves multiple shaders, and drawing stuff more than once.
Not all renderables will be drawn during every 'pass' - for example, a pass might only draw Opaque objects, or it might only draw light volumes.
Materials were used in the days of single-pass rendering to encapsulate all the various requirements for drawing surfaces, mainly texture and light equation variables, which would then be presented, typically, to Uniforms of a very specific 'game geometry shader'.
In order to support complex 'multi-pass' rendering, it appears to make sense that we should further extend the Material class to hold data for a number of shaders, and variables that target a number of passes, which all sounds rather elaborate. It was decided that Materials are in fact not the appropriate place to declare mappings for multiple shader passes.
FireStorm's Renderer implements a 'RenderGraph' made from 'RenderNodes', where each node (except the root node) represents a render target surface, and the root node represents final screen output. The rendergraph is described elsewhere in this blog, but basically this tree of rendertargets can define complex scene compositions - the composition being exactly how the final screen image is built up from zero or more rendertargets, which can be arranged as a dependency tree.
RenderNodes don't interact with a Material class - they instead deal with RenderState and ShaderPass classes, the latter being a container for visible Renderables, which contain a bitkey used to associate a renderable with a subset of shaderpasses held by that renderable's specific target RenderNode.
In this way, each Renderable can declare which RenderNode it will be rendered to, and which ShaderPasses (within that Node) it will be rendered by.
Each shader can register its interest in various Material Attributes, and should it be required, a different material can be associated with each Renderable, for each ShaderPass.
When a RenderNode is processed, it executes a series of ShaderPasses, each with its own RenderState, its own Shader, and its own subset of (references to) Renderables, the results being drawn to that Node's rendertarget (either an FBO, or the screen).
Thursday, 16 July 2015
FireStorm ECS -> RenderGraph
The ultimate aims of the FireStorm ECS are those of any game system: apply behavioral logic, determine what is visible (and implicitly, what isn't), and draw visible stuff to some target surface, using some shader, some camera view and projection, etc.
When we reach the RenderableSystem, we have reached the end of the ECS mechanism: renderables which are visible to some camera will issue a draw request to some draw target.
FireStorm implements a RenderGraph which represents the high-level scene composition tree in terms of RenderTarget Surfaces (FBOs in GL speak) and defines how they are connected.
The RenderGraph can be visualized as a 'reverse tree', where the 'root node' represents the final output to the Screen, while every other 'RenderNode' is a FrameBuffer Object. This arrangement is close to what happens when we 'compose' our scene using multiple rendering stages via arbitrary control logic, and models what we actually want to express as graphics programmers (in my humble opinion) - how each target texture is to be rendered, and how they chain together to form the final image. This is non trivial due to Multiple Render Target technology, and in my opinion is best and most cleanly described in graph terms. Others may hold different opinions, please, chime in!
Please note again, the RenderGraph is *not* part of the ECS, it's not a Component or a System, not everything needs to fit inside the ECS.
The ECS RenderingSystem emits 'virtual draw requests' to specific nodes of the RenderGraph, they are sorted, batched and collected as Messages, awaiting final draw processing, which is realized through 'recursion return'. As usual, it can be multithreaded fairly easily, up to this point.
When it's time to draw the frame, the RenderGraph itself is recursed by the MAIN PROCESS THREAD, which owns the ONLY RENDER CONTEXT.
We start at the output node (the tree root), recurse the tree as we normally might, but only process the Messages captured by each RenderNode as we are RETURNING from recursion.
This ensures that Child rendertarget nodes are drawn ahead of Parent nodes who depend on them (this is a reverse tree remember? the root node is the final output...)
By processing each node only *after* recursing its children, we end up rendering to our rendertargets in such an order that any target surface is always ready for use as an input texture to a higher-order node (where 'higher order' means 'closer to the root node, which is final output')
RenderNodes are containers for render surfaces (output textures, MRT supported), but they are also containers for ShaderPasses. In order to draw a RenderNode, we execute its ShaderPasses. Renderables subscribe to specific Passes via a binary key, so we can draw some things in some passes, and not in others, to a specific surface, and as we've learned, with a specific camera.
DOD-ECS RenderableComponent : this is where the ECS ends, and the Renderer begins.
Today I fleshed out the Lua interface for the RenderableComponent.
Without explaining what FireStorm's Script Bindings are about, here it is:
LUA_CLASS_METHODS(RenderableComponent) =
{
LUA_CLASS_METHOD( getRenderTarget, RenderableComponent::get_targetSurface), // target RenderNode (container for FBO)
LUA_CLASS_METHOD( setRenderTarget, RenderableComponent::set_targetSurface),
LUA_CLASS_METHOD( getCameraID, RenderableComponent::get_CameraID), // which camera entity to use for culling and POV-rendering
LUA_CLASS_METHOD( setCameraID, RenderableComponent::set_CameraID),
LUA_CLASS_METHOD( getGeometry, RenderableComponent::get_geometry), // reference to GLVertexArray
LUA_CLASS_METHOD( setGeometry, RenderableComponent::set_geometry),
LUA_CLASS_METHOD( getPrimitiveType, RenderableComponent::get_primitiveType), // instructions for drawing
LUA_CLASS_METHOD( setPrimitiveType, RenderableComponent::set_primitiveType),
LUA_CLASS_METHOD( getFirstIndex, RenderableComponent::get_firstIndex),
LUA_CLASS_METHOD( setFirstIndex, RenderableComponent::set_firstIndex),
LUA_CLASS_METHOD( getNumIndices, RenderableComponent::get_numIndices),
LUA_CLASS_METHOD( setNumIndices, RenderableComponent::set_numIndices),
{0,0}
};
We can learn a lot about FireStorm's internals by examining the above interface description.
For example, it supports multiple active render cameras within a single frame.
Each renderable can be associated with exactly one camera, and exactly one render target surface.
We can also see that rendering is broken down into virtual draw requests which can be sorted and batched elsewhere. Indexed drawing, and drawing of indexed subsets is supported.
The PrimitiveType can be set per renderable, and we know that FireStorm's ECS supports more than one renderable per entity, should we require it (support for submeshes is implied).
I'm still not sold on the idea of exposing Component classes to the Script Layer (Lua in my case).
It seems to make more sense design-wise to expose System classes, and force any 'setters' to use the Message mechanism rather than doing direct writes. Have to think about it some more.
If I decide that I should be dealing with Systems, and not Components, on the Script side, then so be it, some code moves around, nothing's lost.
Tuesday, 14 July 2015
DOD-ECS: Decoupling Systems is good for Concurrency, and good for Cache
FireStorm ECS implements a kind of distributed messaging service, fed by a central Dispatch mechanism: each System provides a Message Queue to receive Messages sent from other Systems, and addressed to one or more of its Components.
Just before each ECS System is Updated, any Messages sent to that System are Delivered, in terms of actually being Processed.
Messages are the primary way for the ECS Systems to send data to other Systems.
Since each System is updated separately, and has its own Message Queue,
the Message paradigm effectively decouples Systems from one another, which is good for Concurrency - the implications are that it's safe for multiple Threads to update different Components owned by the same System, and it's always safe to Read from Components of Another System, with no need for any kind of locking, other than a 'barrier' to ensure only one System is updated at any moment in time.
System Updates are the primary cause of messages getting sent to other Systems.
Since each System processes its Messages before updating its Components,
we usually want to send Messages to Systems that are of lower priority (are yet to be Updated).
The reason is that Messages sent to higher-priority Systems won't be processed until the next Frame... so the trick is to try to arrange your system updates rationally (which I impose through ordered system ids).
So - why have a Message QUEUE at all? Why not just process ECS events as soon as they occur?
Messages are used to convey data from a component in one System, to a component in another System, usually for Writing locally within that remote component.
They are a means of implementing 'deferred-write' thread safety!
Is concurrency the only benefit to be gained from buffering inter-System messages?
We also benefit in at least one other important way: we get a much better data access pattern,
leading to much improved cache performance, which is perhaps the number one performance killer in modern games.
Think about it: if a System processed ECS events as we generated them, we would be immediately touching data
owned by other Systems in a non-linear fashion, which implies lots of cache misses.
We would later go on to process that System, by which time any cache warming we might have gained has likely been lost.
Rather, if each System first deals with messages sent to its components, then updates its components,
we get the full benefit of cache-warming just before our full linear component update pass.
If we're careful, the overhead we pay for this should be a lot less than the performance gained through
the application of cache-friendly data access patterns.
Just before each ECS System is Updated, any Messages sent to that System are Delivered, in terms of actually being Processed.
Messages are the primary way for the ECS Systems to send data to other Systems.
Since each System is updated separately, and has its own Message Queue,
the Message paradigm effectively decouples Systems from one another, which is good for Concurrency - the implications are that it's safe for multiple Threads to update different Components owned by the same System, and it's always safe to Read from Components of Another System, with no need for any kind of locking, other than a 'barrier' to ensure only one System is updated at any moment in time.
System Updates are the primary cause of messages getting sent to other Systems.
Since each System processes its Messages before updating its Components,
we usually want to send Messages to Systems that are of lower priority (are yet to be Updated).
The reason is that Messages sent to higher-priority Systems won't be processed until the next Frame... so the trick is to try to arrange your system updates rationally (which I impose through ordered system ids).
So - why have a Message QUEUE at all? Why not just process ECS events as soon as they occur?
Messages are used to convey data from a component in one System, to a component in another System, usually for Writing locally within that remote component.
They are a means of implementing 'deferred-write' thread safety!
Is concurrency the only benefit to be gained from buffering inter-System messages?
We also benefit in at least one other important way: we get a much better data access pattern,
leading to much improved cache performance, which is perhaps the number one performance killer in modern games.
Think about it: if a System processed ECS events as we generated them, we would be immediately touching data
owned by other Systems in a non-linear fashion, which implies lots of cache misses.
We would later go on to process that System, by which time any cache warming we might have gained has likely been lost.
Rather, if each System first deals with messages sent to its components, then updates its components,
we get the full benefit of cache-warming just before our full linear component update pass.
If we're careful, the overhead we pay for this should be a lot less than the performance gained through
the application of cache-friendly data access patterns.
Saturday, 11 July 2015
Core Values: Treating strings as reference-counted, shareable objects
FireStorm provides a CORE::Value container class, originally introduced to facilitate interoperability with the Lua script language,.
Value class is basically a variant-type container that supports all the usual suspects (boolean, number, string, object, table, etc), implemented as a unioned structure. Until today, this structure was bloated by a 256-byte buffer whose purpose is to hold data for a short string, while all other possible types require far less memory than that.
This never presented as a problem until I recently implemented an ECS AttributeComponent, which essentially wraps a pair of Values (key, value pair). When the components were aligned in a Pool, it became painfully obvious how much memory was being wasted, so the decision was made to move strings out of the Value container, replacing them with a unique integer identifier whose value is based on a string hashing algorithm.
Values of 'string type' now hold 32-bit hashes, which reference unique strings.
The global container for unique strings is a map whose keys are hashes, and whose values are pairs containing the original string, and a reference count, which is used to unload strings when the last remaining reference is removed. Dynamic strings are not a problem, and modifying a string Value does not affect external references to the old string Value.
The changes to the Value container class seemed trivial, however I did run into a few problems relating to destruction of local value copies. Now it's done, the size of an AttributeComponent has been reduced from 512 bytes to 16 bytes per Attribute, making Values and Attributes a lot more attractive in terms of memory footprint.
These changes removed an artificial limitation on the length of strings stored in Values, which was also the major culprit with respect to wasted memory in Value-based systems such as AttributeSystem. Further, references to the same string are now very cheap, as only one copy of the actual string data is held anywhere in memory.
Arguably, these changes move away from data-oriented-design principles, since strings are no longer plain old data, but instead are references to plain old data, and lookups have been introduced, however the cost is not huge, given that the hashmap is already sorted.
Sparse Arrays meets Parented Transforms: so long, SceneGraph!
FireStorm Systems are derived from an Object Pooling Template.
In simple terms, that means each System stores a specific kind of Data, and we only have one instance of each System - they are effectively singletons.
The base pooling template class implements a custom memory allocator that is very fast for things we create and delete frequently (plain structs, or object classes, both are fine), it allocates a small, flat array of space under the hood, and manages this space in terms of allocated and unallocated space, until forced to grow.
When objects are deleted from a pool, they don't move automatically, they are marked as dead, and their index is stored on a list of 'dead stuff' that is also used as a first option when objects are created.
This is called an object recycling pool mechanism, and it's much faster than your operating system's heap allocator, which can't make guesses about the size of the objects being stored.
So, when we create a new object in a pool, the index of the new or recycled slot is returned.
The new element is identified uniquely by the combination of the pool it lives in (its type), and the index it resides at (its local ID).
This is fine until we start deleting things, or swap things around...
At this point, we create 'holes' in our pool's underlying array space, which although unsightly, they might not present a bother to the cpu cache, and we could in most cases, forgive that they exist.
But some systems like to sort components, or want their components sorted, or maybe we just occasionally want to defragment our pools (eliminate the holes in the arrays), potentially allowing us to dynamically adjust our memory footprint within practical reason.
Transforms are a good example case: we will want our TransformSystem to ensure that our TransformComponents are sorted such that any component that has a parent transform, well, we want the Parent element to appear earlier in the flat array than any of its children - this allows us to get rid of the SceneGraph!
Modifying a SceneGraph becomes simply a matter of modifying the parenting of specific components of a specific system, and the details can be hidden to leave the impression of a conventional OOP scenegraph as far as the user can tell, while still taking full advantage of the hardware and composition benefits that DOD-ECS has to offer.
Once we've sorted our Transforms according to the rule "parents must appear earlier than children", we can linearly process them, there's no need for graph recursion.
In a nutshell: when the TransformSystem is updated, it performs a special kind of bubble-sort pass over the TransformComponents it manages: for each transform component, we check if the component has a parent - if not, we can move on. If it has a parent, we check if the parent is stored at a higher index, and if so, we swap the components in-situ, causing their component ids to be swapped as well, and then we repeat the compare step apon the current transform component, which now holds the parent transform (we're checking if the parent has a parent, and if that parent is at a higher index...) This algorithm has the effect of moving parent nodes to earlier array slot indices.
Defragmenting of component pools is performed similarly, but using a different comparison rule.
In order to defragment a component pool, we iterate the slot indices 'forwards', looking for 'dead' elements, and simultaneously, we iterate the slot indices 'backwards', looking for 'live' elements.
Each time we discover a dead element, we swap it for a living element with the highest known index.
In this way, living elements are moved toward the beginning of the array, while dead elements are moved toward the end of the array. When the operation is complete, there is an opportunity to compact the array to dynamically reduce the memory footprint, if there are more dead elements than live ones (I use a grow/shrink factor of 2).
EDIT (16 July) - I'm looking at the possibility of moving the Sort operation into attach and detach methods, leaving the Update operation to just deal with updating subtrees of transforms that are marked as dirty. It's not a scenegraph, I swear!, but it has similarities with respect to hierarchical transforms - the fundamentals are still the same, but the recursion is gone.
Friday, 10 July 2015
Cameras and Culling in a DOD-ECS
FireStorm ECS provides a Renderable component, managed by the RenderSystem.
In order to facilitate complex rendering scenarios, this component contains references to several NON-ECS objects: a RenderTarget surface, a source of Geometry, a 'ShaderPass Mask', and so on. A less obvious candidate is the Camera.
I spent a lot of time thinking about cameras - the previous engine iteration has a full-blown Camera object that carries its own Projection and View matrices, a Frustum that can be updated on demand, etc. I wanted to support the notion of multiple active cameras, and view-directed renderables.
I recognized that Cameras typically have a Transform of their own, and usually support being 'attached' to other entities (realized through transform parenting).
'Other entities', eh? The clue was there - the ECS Cameras should not be OOP objects, nor should they be Components with their own glorious System to manage them - they should be Entities.
Let's look inside a Camera class, and see what a Camera really needs, in terms of rendering or culling. At minimum, all Cameras need a Projection matrix, and a View matrix, which will be passed (in some form) to our Vertex Shaders - these represent a camera's 'focal aspect' and 'position aspect' - note they can both be expressed as Transform components, since the Camera is now an Entity - and if we follow this path, then 'attaching' a Camera to any other entity in the game is trivial, since that only involves parenting the View transform. Taking this logic to its extreme, the Camera might lead us toward a special kind of Component in the form of a VP (ModelSpace to CameraSpace) Matrix, which has two parents - Proj and View, which is a TransformComponent, and which is Parent to the TransformComponents of any Renderables associated with a given Camera, allowing us to automate the full MVP Baking of Renderables (unless otherwise indicated) but I digress...
What else does a Camera class usually contain?
It might also contain some Bounds derived from the Camera View, aka the Frustum.
The CullableComponent models spatial bounds, so Camera frustums can be treated as first-class citizens of the CullingSystem, potentially simplifying visibility queries per camera.
And it might contain some Direction Vectors, which are easily extracted from the latest View transform.
There seems to be little reason to retain a full-blown OOP model of a Camera, in light of the fact that all the data requirements can be expressed in Component terms - HOWEVER! Camera Transforms need to be baked some time during the rendering procedure, so it would seem there is an argument for a Camera System, even if it does not manage any Components of its own.
This does not sit well with me immediately, given my existing System model presumes that each System manages 'some' kind of data. So I'll eat my words, and find an excuse to create a Camera Component, even if it's just a neat way to locate all Camera Entities.
Here's what I'm proposing, and will be testing over the next few days:
With respect to View-directed renderables (Frustum Culling), I also spent a lot of time thinking about how best to handle this for potentially multiple cameras per Render Target while still shoving everything into one Octree and/or Quadtree, and I decided that since now that Cameras are Entities, I can safely add a Camera Reference (in the form of an Entity ID, which is totally safe from any Component ID Manipulation going on under the hood, so its safer than either a Pointer or a Component Index) to the RenderableComponent, such that each Renderable knows which Camera will be used for culling and drawing it.
The SpatialSystem implements the spatial tree, where entity ids are stored according to knowledge of their Bounds and spatial transforms. The CameraSystem will be responsible for querying the visible set of entities for each Camera based on the values in the RenderableComponent, CullableComponent and TransformComponent (which camera to use, bounds of object, transform of object), resulting in a stream of data for visible graphics objects, including cameraspace z depth and instance transforms, ready to be handed to the RenderGraph for further sorting.
DOD-ECS Memory Management
Data-Oriented software designs can generally be loosely compared to relational databases in terms of function, however they are usually closer to associative arrays in terms of form.
FireStorm's ECS internal data layout is governed by a template class (inherited by all Systems) which is called ObjectPool<T>, indicating that it's a container for data of a specified Type.
The container implements a kind of memory micro-manager, or mini-heap, of objects of Type T - it's a kind of array manager which has several useful features, such as the ability to grow, shrink and defragment the array. The objects it holds can be class object instances or pure structures, and in the case of class objects, high frequency creation and destruction is quite cheap, as the Pool recycles objects from a linkedlist of 'free objects' whenever possible, and only uses Operating System allocation api when 'under duress'. The implication is that the array can contain 'holes' - objects marked as 'dead' remain in the array memory, which can be undesirable, depending on what we're using the array for.
In my case, the class was specifically written to act as a container for Components, and to act as a base for all Systems. I was going to need to detect and handle 'highly fragmented' arrays, but the motivation to do so did not come until I had to handle the tricky situation of Parented Transforms (that deserves a post to itself).
FireStorm identifies unique Components by their Component ID - the flat Storage Index of the element in its Pool. Since elements don't normally move in memory, these don't normally change value, even if the container is reallocated (grows or shrinks), which would not always be true if Pointers were used to identify Components.
On the other hand, Defragmenting or Sorting a Pool does modify Component IDs - but most Systems don't need to do this often (or ever).
In addition to the object pool, each System inherits three other items of interest:
- a message queue mechanism for inter-system messaging
- a multimap that associates a subscriber EntityID with a ComponentID (pool index).
- a multimap that holds the reverse associations, accelerating certain operations
Multimap was chosen because an Entity may own more than one Component of the same Type.
The reversemap was added last, simply because we almost never want to iterate the Components of a given System 'by Entity' - although sometimes, we do.
DOD-ECS implementation details
The abbreviation DOD stands for Data-Oriented Design.
This is a software programming paradigm that emphasizes the importance of data and the layout of data, the notion being that the hardware (especially cpu cache) performance can be improved by adopting hardware-friendly data layouts, certainly an admirable concept.
The DOD Paradigm identifies Data as being primary, and Behavior (Logic) that transforms (manipulates) Data as being secondary, and separates them accordingly.
ECS stands for Entity Component System.
This is a software programming paradigm that offers 'object composition' as an alternative to traditional OOP techniques such as polymorphism and multiple inheritance - you might be familiar with the term 'mix-ins'. Game engines such as Unity are heavily based on ECS concepts.
In abstract terms, entities are comprised of 'Aspects' which may be represented by Components.
An example of an Aspect is 'Transform' - anything you want to draw will need one.
Unity went to the extreme of making Transform a mandatory component of all entities.
When using an ECS for game development, users typically construct and manipulate game entities of whatever kind, but under the hood, the typical ECS does not process or update entities, but rather it processes and updates entire Systems, and all the Components they hold (regardless of which entity they are associated with), thereby updating one 'Aspect' of all entities at a time, rather than updating all 'Aspects' of one entity at a time (implying that Systems are updated in some particular order).
In order for an ECS to be considered as "DOD-Compliant", we simply need to arrange our data to suit that kind of iterating, and respect 'cache line alignment', and having done so, we can easily multithread the Update logic for most kinds of System safely, without needing any mutex mechanisms.
The FireStorm Game Engine provides an Entity Component System which is based strongly on data-oriented design principles. This means it's implementation is quite different to many ECS-based engines such as Unity, although it's user interface offers similar functionality.
If we already have a structured GameObject or Entity class that we want to convert to ECS, we need to analyze the existing structure to decide what 'Aspects' best describe each entity in our game, and for each Aspect, if necessary, define a new Component type, and a System that can Update components of that type.
Any decent ECS will provide 99% of the Components you will ever need for writing games, but adding new kinds of Components should always be possible for special cases.
FireStorm treats Components primarily as Plain Old Data objects that have no associated code methods (that has effect outside that object), while Behavioral Logic for Components resides within the Systems, which directly manipulate Data they own and manage, and may also indirectly (via Messages) manipulate data held by other Systems.
There is no GameObject or Entity class, no container for a 'bag of components'.
Under FireStorm's ECS, entities are merely IDs that can be associated with Components.
FireStorm's Components hold no reference to their owner Entity - those mappings are held by each System. Components can be POD (plain old data), but typically are 'dumb data classes' (like vec3) with methods that only affect member data. Components don't talk - not to other Components, nor to Systems, they are just data containers that might also have getter/setter or similar accessors.
FireStorm's Systems are containers for Components of a particular Type (implementing a pooled allocation scheme) and Messages from other Systems, decoupling Systems from one another during Updates (importantly, this makes multithreading the System Updates a whole lot easier).
Systems contain all the Logic for updating the Typed Components they hold.
They have no direct connection to other Systems, but may post Messages to other Systems, which may be addressed to all Components of a specific Entity in a remote System - that is to say, Messages are not 'broadcast to all Systems and Components', but only to 'all Components of Type X (held by System Y) associated with Entity Z'.
Each System provides at least two methods, which may or may not be used by a given System: Update (the Components owned by this System), and ProcessMessages (sent from other Systems).
Storing your Entity-to-Component Mappings within each System has its benefits, the most important being that each System is intimately aware of which entities own one or more Components held by that System, so introspection can be removed from the entity completely - we query Systems, not entities, although our queries can be broadcast to all systems, and can refer to specific entities.
When a System needs to be Updated, it 'knows' which Components it owns, and which Entities they belong to - if it requires data from other Systems, then it expects to receive that data through Messages from that System or Systems - no external queries are required (to other Systems).
Thursday, 9 July 2015
Foundation
This blog was created to record and share some of my progress during the
development of the FireStorm game engine, which began life roughly a
year ago today.
Recently, I've been working on two related things: software support for hardware concurrency at the game engine level, and data-oriented entity component system technology that can best leverage today's multicore hardware while taking full advantage of the benefits of software composition.
Inspiration has come from many places and people, but special thanks go out to L. Spiro and Niklas
Frykholm whose words have been particularly inspirational.
Dicebat Bernardus Carnotensis nos esse quasi nanos, gigantium humeris insidentes, ut possimus plura eis et remotiora videre, non utique proprii visus acumine, aut eminentia corporis, sed quia in altum subvenimur et extollimur magnitudine gigantea.
Bernard of Chartres used to compare us to puny dwarfs perched on the shoulders of giants.
He pointed out that we see more and farther than our predecessors, not because we have keener vision, or greater height, but because we are lifted up and borne aloft on their gigantic stature.
Recently, I've been working on two related things: software support for hardware concurrency at the game engine level, and data-oriented entity component system technology that can best leverage today's multicore hardware while taking full advantage of the benefits of software composition.
Inspiration has come from many places and people, but special thanks go out to L. Spiro and Niklas
Frykholm whose words have been particularly inspirational.
Dicebat Bernardus Carnotensis nos esse quasi nanos, gigantium humeris insidentes, ut possimus plura eis et remotiora videre, non utique proprii visus acumine, aut eminentia corporis, sed quia in altum subvenimur et extollimur magnitudine gigantea.
Bernard of Chartres used to compare us to puny dwarfs perched on the shoulders of giants.
He pointed out that we see more and farther than our predecessors, not because we have keener vision, or greater height, but because we are lifted up and borne aloft on their gigantic stature.
Subscribe to:
Posts (Atom)