Friday 17 July 2015

Multi-Pass Shaders


There's no such thing, really, as a multi-pass shader.
To perform multi-pass rendering, we select a shader and optionally a render target surface (FBO), and we draw some subset of our renderables. Then we select another shader, and repeat the process.

Multi-Pass Rendering involves multiple shaders, and drawing stuff more than once.
Not all renderables will be drawn during every 'pass' - for example, a pass might only draw Opaque objects, or it might only draw light volumes.

In the days of single-pass rendering, Materials were used to encapsulate all the various requirements for drawing surfaces - mainly textures and lighting-equation variables - which would then typically be presented to the Uniforms of one very specific 'game geometry shader'.

In order to support complex 'multi-pass' rendering, it might seem sensible to further extend the Material class to hold data for a number of shaders, plus variables that target a number of passes - which all sounds rather elaborate. In the end, I decided that Materials are not the appropriate place to declare mappings for multiple shader passes.

FireStorm's Renderer implements a 'RenderGraph' made from 'RenderNodes', where each node (except the root node) represents a render target surface, and the root node represents the final screen output. The RenderGraph is described elsewhere in this blog, but basically this tree of rendertargets can define complex scene compositions - a composition being exactly how the final screen image is built up from zero or more rendertargets, arranged as a dependency tree.

RenderNodes don't interact with a Material class - they instead deal with RenderState and ShaderPass classes. A ShaderPass is a container for visible Renderables, and each Renderable carries a bitkey that associates it with a subset of the ShaderPasses held by that Renderable's specific target RenderNode.
In this way, each Renderable can declare which RenderNode it will be rendered to, and which ShaderPasses (within that Node) it will be rendered by.
Each shader can register its interest in various Material Attributes, and should it be required, a different material can be associated with each Renderable, for each ShaderPass.

When a RenderNode is processed, it executes a series of ShaderPasses, each with its own RenderState, its own Shader, and its own subset of (references to) Renderables, the results being drawn to that Node's rendertarget (either an FBO, or the screen).
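The pass-execution loop might look something like this minimal C++ sketch - the class layouts and member names here are illustrative guesses (with the actual GL binding calls reduced to comments), not FireStorm's real code:

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for the engine's types.
struct Renderable { std::string name; };

struct RenderState { bool depthTest = true; bool blend = false; };

struct ShaderPass {
    int shaderId = 0;                        // which shader this pass binds
    RenderState state;                       // pass-local render state
    std::vector<const Renderable*> visible;  // subset of renderables for this pass
};

struct RenderNode {
    int fbo = 0;  // 0 = default framebuffer (the screen)
    std::vector<ShaderPass> passes;

    // Process the node: run each pass with its own state, shader and renderables.
    std::vector<std::string> process() const {
        std::vector<std::string> drawLog;
        for (const ShaderPass& pass : passes) {
            // bindFramebuffer(fbo); applyState(pass.state); bindShader(pass.shaderId);
            for (const Renderable* r : pass.visible)
                drawLog.push_back("pass" + std::to_string(pass.shaderId) + ":" + r->name);
        }
        return drawLog;
    }
};
```

The key point is that the node, not a Material, owns the pass list, and each pass carries its own state and its own subset of renderable references.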


Thursday 16 July 2015

FireStorm ECS -> RenderGraph


The ultimate aims of the FireStorm ECS are those of any game system: apply behavioral logic, determine what is visible (and implicitly, what isn't), and draw visible stuff to some target surface, using some shader, some camera view and projection, etc.

When we reach the RenderableSystem, we have reached the end of the ECS mechanism: renderables which are visible to some camera will issue a draw request to some draw target.

FireStorm implements a RenderGraph which represents the high-level scene composition tree in terms of RenderTarget Surfaces (FBOs in GL speak) and defines how they are connected.
The RenderGraph can be visualized as a 'reverse tree', where the 'root node' represents the final output to the Screen, while every other 'RenderNode' is a FrameBuffer Object. This arrangement is close to what happens when we 'compose' our scene using multiple rendering stages via arbitrary control logic, and models what we actually want to express as graphics programmers (in my humble opinion): how each target texture is to be rendered, and how the targets chain together to form the final image. This is non-trivial due to Multiple Render Target technology, and in my opinion is best and most cleanly described in graph terms. Others may hold different opinions - please, chime in!

Please note again: the RenderGraph is *not* part of the ECS - it's not a Component or a System. Not everything needs to fit inside the ECS.

The ECS RenderingSystem emits 'virtual draw requests' to specific nodes of the RenderGraph, where they are sorted, batched and collected as Messages, awaiting final draw processing, which is realized through 'recursion return'. As usual, everything up to this point can be multithreaded fairly easily.

When it's time to draw the frame, the RenderGraph itself is recursed by the MAIN PROCESS THREAD, which owns the ONLY RENDER CONTEXT.
We start at the output node (the tree root) and recurse the tree as we normally might, but only process the Messages captured by each RenderNode as we are RETURNING from recursion.
This ensures that child rendertarget nodes are drawn ahead of the parent nodes that depend on them (this is a reverse tree, remember? The root node is the final output...).

By processing each node only *after* recursing its children, we end up rendering to our rendertargets in such an order that any target surface is always ready for use as an input texture to a higher-order node (where 'higher order' means 'closer to the root node, which is the final output').
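A minimal sketch of that post-order traversal - the Node type is hypothetical and the actual message processing is reduced to logging the draw order:

```cpp
#include <string>
#include <vector>

// A node's children are the dependency rendertargets it samples from.
struct Node {
    std::string name;
    std::vector<Node*> children;
};

// Post-order traversal: recurse first, draw on the way back out of recursion,
// so every child target is rendered before any node that depends on it.
void drawGraph(const Node& node, std::vector<std::string>& order) {
    for (const Node* child : node.children)
        drawGraph(*child, order);
    order.push_back(node.name);  // process this node's captured draw messages
}
```

Running this on a small graph shows the screen (root) always drawn last, after every rendertarget it depends on.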

RenderNodes are containers for render surfaces (output textures, MRT supported), but they are also containers for ShaderPasses. In order to draw a RenderNode, we execute its ShaderPasses. Renderables subscribe to specific Passes via a binary key, so we can draw some things in some passes, and not in others, to a specific surface, and as we've learned, with a specific camera.
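Pass subscription via a binary key could be sketched like this - the pass names and mask layout are invented for illustration, one bit per ShaderPass within a node:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pass bits: each bit selects one ShaderPass of the target node.
enum PassBits : uint32_t {
    PASS_DEPTH  = 1u << 0,
    PASS_OPAQUE = 1u << 1,
    PASS_ALPHA  = 1u << 2,
    PASS_GLOW   = 1u << 3,
};

struct RenderableRef {
    int id;
    uint32_t passMask;  // which passes of the target node draw this renderable
};

// Collect the renderables subscribed to a given pass.
std::vector<int> renderablesForPass(const std::vector<RenderableRef>& all, uint32_t passBit) {
    std::vector<int> out;
    for (const RenderableRef& r : all)
        if (r.passMask & passBit)
            out.push_back(r.id);
    return out;
}
```

So a renderable tagged `PASS_DEPTH | PASS_OPAQUE` is drawn by the depth and opaque passes of its node, and skipped by every other pass.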

DOD-ECS RenderableComponent : this is where the ECS ends, and the Renderer begins.





Today I fleshed out the Lua interface for the RenderableComponent.
Without explaining what FireStorm's Script Bindings are about, here it is:

LUA_CLASS_METHODS(RenderableComponent) =
{
    LUA_CLASS_METHOD( getRenderTarget,   RenderableComponent::get_targetSurface),   // target RenderNode (container for FBO)
    LUA_CLASS_METHOD( setRenderTarget,   RenderableComponent::set_targetSurface),
    LUA_CLASS_METHOD( getCameraID,       RenderableComponent::get_CameraID),        // which camera entity to use for culling and POV-rendering
    LUA_CLASS_METHOD( setCameraID,       RenderableComponent::set_CameraID),
    LUA_CLASS_METHOD( getGeometry,       RenderableComponent::get_geometry),        // reference to GLVertexArray
    LUA_CLASS_METHOD( setGeometry,       RenderableComponent::set_geometry),
    LUA_CLASS_METHOD( getPrimitiveType,  RenderableComponent::get_primitiveType),   // instructions for drawing
    LUA_CLASS_METHOD( setPrimitiveType,  RenderableComponent::set_primitiveType),
    LUA_CLASS_METHOD( getFirstIndex,     RenderableComponent::get_firstIndex),
    LUA_CLASS_METHOD( setFirstIndex,     RenderableComponent::set_firstIndex),
    LUA_CLASS_METHOD( getNumIndices,     RenderableComponent::get_numIndices),
    LUA_CLASS_METHOD( setNumIndices,     RenderableComponent::set_numIndices),
    {0,0}
};


We can learn a lot about FireStorm's internals by examining the above interface description.
For example, it supports multiple active render cameras within a single frame.

Each renderable can be associated with exactly one camera, and exactly one render target surface.

We can also see that rendering is broken down into virtual draw requests, which can be sorted and batched elsewhere. Indexed drawing, and drawing of indexed subsets, is supported.
The PrimitiveType can be set per renderable, and we know that FireStorm's ECS supports more than one renderable per entity, should we require it (support for submeshes is implied).

I'm still not sold on the idea of exposing Component classes to the Script Layer (Lua in my case).
It seems to make more sense design-wise to expose System classes, and force any 'setters' to use the Message mechanism rather than doing direct writes. Have to think about it some more.
If I decide that I should be dealing with Systems, and not Components, on the Script side, then so be it, some code moves around, nothing's lost.




Tuesday 14 July 2015

DOD-ECS: Decoupling Systems is good for Concurrency, and good for Cache

FireStorm ECS implements a kind of distributed messaging service, fed by a central Dispatch mechanism: each System provides a Message Queue to receive Messages sent from other Systems, and addressed to one or more of its Components.

Just before each ECS System is Updated, any Messages sent to that System are Delivered, in terms of actually being Processed.
Messages are the primary way for the ECS Systems to send data to other Systems.
Since each System is updated separately, and has its own Message Queue, the Message paradigm effectively decouples Systems from one another, which is good for Concurrency. The implications are that it's safe for multiple Threads to update different Components owned by the same System, and it's always safe to Read from Components of another System, with no need for any kind of locking - other than a 'barrier' to ensure only one System is updated at any moment in time.

System Updates are the primary cause of messages getting sent to other Systems.
Since each System processes its Messages before updating its Components,
we usually want to send Messages to Systems of lower priority (those yet to be Updated).
The reason is that Messages sent to higher-priority Systems won't be processed until the next Frame... so the trick is to arrange your System updates rationally (which I impose through ordered system ids).
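The drain-then-update frame loop might be sketched as follows - the System and Message layouts are illustrative only, with a string log standing in for real component work:

```cpp
#include <string>
#include <vector>

struct Message { std::string payload; };

struct System {
    int id;                        // also the update priority (lower = earlier)
    std::vector<Message> queue;    // inbox, drained just before Update
    std::vector<std::string> log;  // records what happened, for illustration

    void processMessages() {
        for (const Message& m : queue) log.push_back("handled:" + m.payload);
        queue.clear();
    }
    void update() { log.push_back("updated"); }
};

// One frame: each System drains its inbox, then updates its components.
// A message sent 'backwards' (to an already-updated System) sits in the
// inbox until the next frame.
void runFrame(std::vector<System>& systems) {
    for (System& s : systems) {
        s.processMessages();
        s.update();
    }
}
```

Sending to a later (lower-priority) System lands in the same frame; sending to an earlier one incurs a one-frame delay, which is why update order matters.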

So - why have a Message QUEUE at all? Why not just process ECS events as soon as they occur?
Messages are used to convey data from a component in one System, to a component in another System, usually for Writing locally within that remote component.
They are a means of implementing 'deferred-write' thread safety!
Is concurrency the only benefit to be gained from buffering inter-System messages?

We also benefit in at least one other important way: we get a much better data access pattern,
leading to much improved cache performance, which is perhaps the number one performance killer in modern games.

Think about it: if a System processed ECS events as we generated them, we would be immediately touching data
owned by other Systems in a non-linear fashion, which implies lots of cache misses.
We would later go on to process that System, by which time any cache warming we might have gained has likely been lost.

Rather, if each System first deals with messages sent to its components, then updates its components,
we get the full benefit of cache-warming just before our full linear component update pass.
If we're careful, the overhead we pay for this should be a lot less than the performance gained through
the application of cache-friendly data access patterns.

Saturday 11 July 2015

Core Values: Treating strings as reference-counted, shareable objects


FireStorm provides a CORE::Value container class, originally introduced to facilitate interoperability with the Lua scripting language.
The Value class is basically a variant-type container that supports all the usual suspects (boolean, number, string, object, table, etc), implemented as a unioned structure. Until today, this structure was bloated by a 256-byte buffer whose purpose was to hold data for a short string, while all other possible types require far less memory than that.
This never presented a problem until I recently implemented an ECS AttributeComponent, which essentially wraps a pair of Values (a key/value pair). When the components were aligned in a Pool, it became painfully obvious how much memory was being wasted, so the decision was made to move strings out of the Value container, replacing them with a unique integer identifier derived from a string hashing algorithm.
Values of 'string type' now hold 32-bit hashes, which reference unique strings.

The global container for unique strings is a map whose keys are hashes, and whose values are pairs containing the original string, and a reference count, which is used to unload strings when the last remaining reference is removed. Dynamic strings are not a problem, and modifying a string Value does not affect external references to the old string Value.
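A sketch of such a refcounted string table - the hash function used here (32-bit FNV-1a) and all names are assumptions, since the post doesn't specify FireStorm's actual hashing algorithm:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// hash -> (original string, reference count)
struct Entry { std::string text; int refs = 0; };
static std::unordered_map<uint32_t, Entry> g_strings;

static uint32_t hashString(const std::string& s) {  // FNV-1a, 32-bit
    uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

// Intern a string: store it on first reference, bump the refcount otherwise.
uint32_t acquireString(const std::string& s) {
    uint32_t h = hashString(s);
    Entry& e = g_strings[h];
    if (e.refs++ == 0) e.text = s;   // first reference stores the data
    return h;
}

// Drop a reference: the string is unloaded when the last reference goes away.
void releaseString(uint32_t h) {
    auto it = g_strings.find(h);
    if (it != g_strings.end() && --it->second.refs == 0)
        g_strings.erase(it);
}
```

A string-typed Value then just holds the 32-bit hash, and copying Values becomes a refcount bump instead of a 256-byte copy.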

The changes to the Value container class seemed trivial; however, I did run into a few problems relating to the destruction of local value copies. Now that it's done, the size of an AttributeComponent has been reduced from 512 bytes to 16 bytes per Attribute, making Values and Attributes a lot more attractive in terms of memory footprint.

These changes removed an artificial limitation on the length of strings stored in Values, which was also the major culprit with respect to wasted memory in Value-based systems such as AttributeSystem. Further, references to the same string are now very cheap, as only one copy of the actual string data is held anywhere in memory.

Arguably, these changes move away from data-oriented-design principles: strings are no longer plain old data, but references to plain old data, and lookups have been introduced. However, the cost is not huge, given that the hashmap is already sorted.





Sparse Arrays meets Parented Transforms: so long, SceneGraph!


FireStorm Systems are derived from an Object Pooling Template.
In simple terms, that means each System stores a specific kind of Data, and we only have one instance of each System - they are effectively singletons.

The base pooling template class implements a custom memory allocator that is very fast for things we create and delete frequently (plain structs or object classes - both are fine). It allocates a small, flat array of space under the hood, and manages this space in terms of allocated and unallocated slots, until forced to grow.
When objects are deleted from a pool, they don't move - they are marked as dead, and their index is stored on a list of 'dead stuff' that is also used as a first option when objects are created.
This is called an object recycling pool mechanism, and it's much faster than your operating system's heap allocator, which can't make guesses about the size of the objects being stored.

So, when we create a new object in a pool, the index of the new or recycled slot is returned.
The new element is identified uniquely by the combination of the pool it lives in (its type), and the index it resides at (its local ID).
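A minimal sketch of such a recycling pool - the real ObjectPool template surely differs in detail (this version ignores alignment, shrinking and in-place construction):

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class ObjectPool {
public:
    // Create returns the slot index, recycling a dead slot when possible.
    size_t create(const T& value) {
        if (!freeList.empty()) {                 // recycle a dead slot first
            size_t idx = freeList.back();
            freeList.pop_back();
            slots[idx] = value;
            liveFlags[idx] = true;
            return idx;
        }
        slots.push_back(value);                  // otherwise grow the array
        liveFlags.push_back(true);
        return slots.size() - 1;
    }
    // Destroy marks the slot dead; the memory stays put, leaving a 'hole'.
    void destroy(size_t idx) {
        liveFlags[idx] = false;
        freeList.push_back(idx);
    }
    T&     operator[](size_t idx)    { return slots[idx]; }
    bool   isAlive(size_t idx) const { return liveFlags[idx]; }
    size_t capacity() const          { return slots.size(); }
private:
    std::vector<T> slots;
    std::vector<bool> liveFlags;
    std::vector<size_t> freeList;   // indices of 'dead stuff'
};
```

The slot index returned by `create` is exactly the local component ID described above: stable for the object's lifetime, and reused after deletion.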

This is fine until we start deleting things, or swap things around...
At this point, we create 'holes' in our pool's underlying array space. Although unsightly, these might not present a bother to the cpu cache, and in most cases we could forgive that they exist.
But some systems like to sort components, or want their components sorted, or maybe we just occasionally want to defragment our pools (eliminate the holes in the arrays), potentially allowing us to dynamically adjust our memory footprint within practical reason.

Transforms are a good example case: we want our TransformSystem to ensure that TransformComponents are sorted such that, for any component that has a parent transform, the parent element appears earlier in the flat array than any of its children - this allows us to get rid of the SceneGraph!
Modifying a SceneGraph becomes simply a matter of modifying the parenting of specific components of a specific system, and the details can be hidden to leave the impression of a conventional OOP scenegraph as far as the user can tell, while still taking full advantage of the hardware and composition benefits that DOD-ECS has to offer.
Once we've sorted our Transforms according to the rule "parents must appear earlier than children", we can linearly process them, there's no need for graph recursion.

In a nutshell: when the TransformSystem is updated, it performs a special kind of bubble-sort pass over the TransformComponents it manages. For each transform component, we check whether the component has a parent - if not, we can move on. If it has a parent stored at a higher index, we swap the components in-situ, causing their component ids to be swapped as well, and then repeat the compare step upon the current slot, which now holds the parent transform (we're checking whether the parent has a parent, and whether that parent is at a higher index...). This algorithm has the effect of moving parent nodes to earlier array slot indices.
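That sort pass might be sketched as follows, with parents referenced by slot index and -1 marking a root. The patching of parent references after each swap is my own stand-in for the engine's component-id bookkeeping:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Transform {
    int parent = -1;   // slot index of the parent transform, -1 if none
    int entity = 0;    // payload, just to make the swaps observable
};

// Repeatedly pull parents down into earlier slots until the invariant
// 'parents appear earlier than children' holds for the whole array.
void sortParentsFirst(std::vector<Transform>& pool) {
    for (size_t i = 0; i < pool.size(); ++i) {
        while (pool[i].parent > static_cast<int>(i)) {
            int p = pool[i].parent;
            std::swap(pool[i], pool[p]);
            // Patch any parent references to the two swapped slots.
            for (Transform& t : pool) {
                if (t.parent == static_cast<int>(i)) t.parent = p;
                else if (t.parent == p) t.parent = static_cast<int>(i);
            }
        }
    }
}
```

After the pass, a single forward loop can bake every world transform, because each component's parent is guaranteed to have been processed already - no graph recursion needed.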

Defragmenting of component pools is performed similarly, but using a different comparison rule.
In order to defragment a component pool, we iterate the slot indices 'forwards', looking for 'dead' elements, and simultaneously, we iterate the slot indices 'backwards', looking for 'live' elements.
Each time we discover a dead element, we swap it for a living element with the highest known index.
In this way, living elements are moved toward the beginning of the array, while dead elements are moved toward the end of the array. When the operation is complete, there is an opportunity to compact the array to dynamically reduce the memory footprint, if there are more dead elements than live ones (I use a grow/shrink factor of 2).
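The two-ended defragmentation scan could be sketched like so (`Slot` is an illustrative stand-in for a pooled component):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Slot { int value = 0; bool alive = false; };

// Scan forwards for dead slots and backwards for live ones, swapping each
// dead slot with the highest-indexed live one, until the scans meet.
// Returns the number of live elements, now packed at the front.
size_t defragment(std::vector<Slot>& pool) {
    size_t front = 0, back = pool.size();
    while (true) {
        while (front < back && pool[front].alive) ++front;      // find a dead slot
        while (back > front && !pool[back - 1].alive) --back;   // find a live slot
        if (front + 1 >= back) break;
        std::swap(pool[front], pool[back - 1]);                 // dead <-> live
    }
    return front;
}
```

Once the pass completes, everything past the returned count is dead, and the array can be shrunk if the live fraction has fallen far enough.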

EDIT (16 July) - I'm looking at the possibility of moving the Sort operation into the attach and detach methods, leaving the Update operation to deal only with updating subtrees of transforms that are marked as dirty. It's not a scenegraph, I swear! But it has similarities with respect to hierarchical transforms - the fundamentals are still the same, but the recursion is gone.


Friday 10 July 2015

Cameras and Culling in a DOD-ECS


FireStorm ECS provides a Renderable component, managed by the RenderSystem.
In order to facilitate complex rendering scenarios, this component contains references to several NON-ECS objects: a RenderTarget surface, a source of Geometry, a 'ShaderPass Mask', and so on. A less obvious candidate is the Camera.

I spent a lot of time thinking about cameras - the previous engine iteration had a full-blown Camera object that carries its own Projection and View matrices, a Frustum that can be updated on demand, etc. I wanted to support the notion of multiple active cameras, and view-directed renderables.

I recognized that Cameras typically have a Transform of their own, and usually support being 'attached' to other entities (realized through transform parenting).
'Other entities', eh? The clue was there - the ECS Cameras should not be OOP objects, nor should they be Components with their own glorious System to manage them - they should be Entities.

Let's look inside a Camera class, and see what a Camera really needs in terms of rendering or culling. At minimum, all Cameras need a Projection matrix and a View matrix, which will be passed (in some form) to our Vertex Shaders - these represent a camera's 'focal aspect' and 'position aspect'. Note that both can be expressed as Transform components, since the Camera is now an Entity - and if we follow this path, then 'attaching' a Camera to any other entity in the game is trivial, since that only involves parenting the View transform. Taking this logic to its extreme, the Camera might lead us toward a special kind of Component in the form of a VP (ModelSpace to CameraSpace) Matrix, which has two parents - Proj, and View (itself a TransformComponent) - and which is Parent to the TransformComponents of any Renderables associated with a given Camera, allowing us to automate the full MVP Baking of Renderables (unless otherwise indicated). But I digress...

What else does a Camera class usually contain?
It might also contain some Bounds derived from the Camera View, aka the Frustum.
The CullableComponent models spatial bounds, so Camera frustums can be treated as first-class citizens of the CullingSystem, potentially simplifying visibility queries per camera.

And it might contain some Direction Vectors, which are easily extracted from the latest View transform.

There seems to be little reason to retain a full-blown OOP model of a Camera, in light of the fact that all the data requirements can be expressed in Component terms - HOWEVER! Camera Transforms need to be baked some time during the rendering procedure, so it would seem there is an argument for a Camera System, even if it does not manage any Components of its own.

This does not sit well with me immediately, given my existing System model presumes that each System manages 'some' kind of data. So I'll eat my words, and find an excuse to create a Camera Component, even if it's just a neat way to locate all Camera Entities.

Here's what I'm proposing, and will be testing over the next few days:
With respect to View-directed renderables (Frustum Culling), I also spent a lot of time thinking about how best to handle potentially multiple cameras per Render Target while still shoving everything into one Octree and/or Quadtree. I decided that, now that Cameras are Entities, I can safely add a Camera reference to the RenderableComponent - in the form of an Entity ID, which is totally safe from any Component ID manipulation going on under the hood, and thus safer than either a Pointer or a Component Index - such that each Renderable knows which Camera will be used for culling and drawing it.
The SpatialSystem implements the spatial tree, where entity ids are stored according to knowledge of their Bounds and spatial transforms. The CameraSystem will be responsible for querying the visible set of entities for each Camera, based on the values in the RenderableComponent, CullableComponent and TransformComponent (which camera to use, bounds of the object, transform of the object). The result is a stream of data for visible graphics objects, including cameraspace z depth and instance transforms, ready to be handed to the RenderGraph for further sorting.

DOD-ECS Memory Management


Data-Oriented software designs can generally be loosely compared to relational databases in terms of function, however they are usually closer to associative arrays in terms of form.

FireStorm's ECS internal data layout is governed by a template class (inherited by all Systems) which is called ObjectPool<T>, indicating that it's a container for data of a specified Type.
The container implements a kind of memory micro-manager, or mini-heap, of objects of Type T. It's a kind of array manager with several useful features, such as the ability to grow, shrink and defragment the array. The objects it holds can be class object instances or pure structures; in the case of class objects, high-frequency creation and destruction is quite cheap, as the Pool recycles objects from a linked list of 'free objects' whenever possible, and only uses the Operating System allocation api when 'under duress'. The implication is that the array can contain 'holes' - objects marked as 'dead' remain in the array memory, which can be undesirable, depending on what we're using the array for.

In my case, the class was specifically written to act as a container for Components, and to act as a base for all Systems. I was going to need to detect and handle 'highly fragmented' arrays, but the motivation to do so did not come until I had to handle the tricky situation of Parented Transforms (that deserves a post to itself).

FireStorm identifies unique Components by their Component ID - the flat Storage Index of the element in its Pool. Since elements don't normally move in memory, these don't normally change value, even if the container is reallocated (grows or shrinks), which would not always be true if Pointers were used to identify Components.
On the other hand, Defragmenting or Sorting a Pool does modify Component IDs - but most Systems don't need to do this often (or ever).


In addition to the object pool, each System inherits three other items of interest:
- a message queue mechanism for inter-system messaging
- a multimap that associates a subscriber EntityID with a ComponentID (pool index)
- a multimap that holds the reverse associations, accelerating certain operations

Multimap was chosen because an Entity may own more than one Component of the same Type.
The reversemap was added last, simply because we almost never want to iterate the Components of a given System 'by Entity' - although sometimes, we do.
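These associations might be sketched as follows. Note the post describes the reverse map as a multimap too; since a component has exactly one owner, a plain map suffices for this illustration, and all names here are invented:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using EntityID = uint32_t;
using ComponentID = uint32_t;  // flat pool index

struct Associations {
    // One entity may own several components of the same type, hence a multimap.
    std::unordered_multimap<EntityID, ComponentID> byEntity;
    // Reverse lookup: which entity owns a given component.
    std::unordered_map<ComponentID, EntityID> byComponent;

    void link(EntityID e, ComponentID c) {
        byEntity.emplace(e, c);
        byComponent[c] = e;
    }
    std::vector<ComponentID> componentsOf(EntityID e) const {
        std::vector<ComponentID> out;
        auto range = byEntity.equal_range(e);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }
    EntityID ownerOf(ComponentID c) const { return byComponent.at(c); }
};
```

The forward map answers "which components does entity E own here?", while the reverse map accelerates the occasional "whose component is this?" query.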


DOD-ECS implementation details


The abbreviation DOD stands for Data-Oriented Design.
This is a software programming paradigm that emphasizes the importance of data and the layout of data, the notion being that hardware (especially cpu cache) performance can be improved by adopting hardware-friendly data layouts - certainly an admirable concept.
The DOD Paradigm identifies Data as being primary, and Behavior (Logic) that transforms (manipulates) Data as being secondary, and separates them accordingly.

ECS stands for Entity Component System.
This is a software programming paradigm that offers 'object composition' as an alternative to traditional OOP techniques such as polymorphism and multiple inheritance - you might be familiar with the term 'mix-ins'. Game engines such as Unity are heavily based on ECS concepts.
In abstract terms, entities are comprised of 'Aspects' which may be represented by Components.
An example of an Aspect is 'Transform' - anything you want to draw will need one.
Unity went to the extreme of making Transform a mandatory component of all entities.

When using an ECS for game development, users typically construct and manipulate game entities of whatever kind. Under the hood, however, the typical ECS does not process or update entities - rather, it processes and updates entire Systems, and all the Components they hold (regardless of which entity they are associated with), thereby updating one 'Aspect' of all entities at a time, rather than updating all 'Aspects' of one entity at a time (implying that Systems are updated in some particular order).

In order for an ECS to be considered 'DOD-compliant', we simply need to arrange our data to suit that kind of iteration, and respect 'cache line alignment'. Having done so, we can easily and safely multithread the Update logic for most kinds of System, without needing any mutex mechanisms.
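The per-aspect iteration can be illustrated with a trivial (hypothetical) system that updates all of its components in one linear pass over a tightly packed array:

```cpp
#include <vector>

// One 'Aspect' of every entity, stored contiguously for cache-friendly access.
struct PositionComponent { float x = 0, y = 0, vx = 0, vy = 0; };

struct PhysicsSystem {
    std::vector<PositionComponent> components;  // tightly packed pool

    // Update one aspect of ALL entities in a single linear sweep -
    // no per-entity dispatch, no pointer chasing.
    void update(float dt) {
        for (PositionComponent& c : components) {
            c.x += c.vx * dt;
            c.y += c.vy * dt;
        }
    }
};
```

Because each system touches only its own flat array, sweeps like this are also the natural unit for handing out to worker threads.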

The FireStorm Game Engine provides an Entity Component System which is based strongly on data-oriented design principles. This means its implementation is quite different to that of many ECS-based engines such as Unity, although its user interface offers similar functionality.

If we already have a structured GameObject or Entity class that we want to convert to ECS, we need to analyze the existing structure to decide what 'Aspects' best describe each entity in our game, and for each Aspect, if necessary, define a new Component type, and a System that can Update components of that type.
Any decent ECS will provide 99% of the Components you will ever need for writing games, but adding new kinds of Components should always be possible for special cases.


FireStorm treats Components primarily as Plain Old Data objects with no associated methods that have effects outside the object. Behavioral Logic for Components resides within the Systems, which directly manipulate the Data they own and manage, and may also indirectly (via Messages) manipulate data held by other Systems.

There is no GameObject or Entity class, no container for a 'bag of components'.
Under FireStorm's ECS, entities are merely IDs that can be associated with Components.

FireStorm's Components hold no reference to their owner Entity - those mappings are held by each System. Components can be POD (plain old data), but typically are 'dumb data classes' (like vec3) with methods that only affect member data. Components don't talk - not to other Components, nor to Systems, they are just data containers that might also have getter/setter or similar accessors.

FireStorm's Systems are containers for Components of a particular Type (implementing a pooled allocation scheme) and Messages from other Systems, decoupling Systems from one another during Updates (importantly, this makes multithreading the System Updates a whole lot easier).
Systems contain all the Logic for updating the Typed Components they hold.
They have no direct connection to other Systems, but may post Messages to other Systems, which may be addressed to all Components of a specific Entity in a remote System - that is to say, Messages are not 'broadcast to all Systems and Components', but only to 'all Components of Type X (held by System Y) associated with Entity Z'.
Each System provides at least two methods, which may or may not be used by a given System: Update (the Components owned by this System), and ProcessMessages (sent from other Systems).

Storing your Entity-to-Component Mappings within each System has its benefits, the most important being that each System is intimately aware of which entities own one or more Components held by that System, so introspection can be removed from the entity completely - we query Systems, not entities, although our queries can be broadcast to all systems, and can refer to specific entities.

When a System needs to be Updated, it 'knows' which Components it owns, and which Entities they belong to - if it requires data from other Systems, then it expects to receive that data through Messages from that System or Systems - no external queries are required (to other Systems).






Thursday 9 July 2015

Foundation

This blog was created to record and share some of my progress during the development of the FireStorm game engine, which began life roughly a year ago today.
Recently, I've been working on two related things: software support for hardware concurrency at the game engine level, and data-oriented entity component system technology that can best leverage today's multicore hardware while taking full advantage of the benefits of software composition.

Inspiration has come from many places and people, but special thanks go out to L. Spiro and Niklas Frykholm, whose words have been particularly inspirational.

Dicebat Bernardus Carnotensis nos esse quasi nanos, gigantium humeris insidentes, ut possimus plura eis et remotiora videre, non utique proprii visus acumine, aut eminentia corporis, sed quia in altum subvenimur et extollimur magnitudine gigantea.

Bernard of Chartres used to compare us to puny dwarfs perched on the shoulders of giants.
He pointed out that we see more and farther than our predecessors, not because we have keener vision, or greater height, but because we are lifted up and borne aloft on their gigantic stature.