Saturday, 11 July 2015

Core Values: Treating strings as reference-counted, shareable objects


FireStorm provides a CORE::Value container class, originally introduced to facilitate interoperability with the Lua script language,.
Value class is basically a variant-type container that supports all the usual suspects (boolean, number, string, object, table, etc), implemented as a unioned structure. Until today, this structure was bloated by a 256-byte buffer whose purpose is to hold data for a short string, while all other possible types require far less memory than that.
This never presented as a problem until I recently implemented an ECS AttributeComponent, which essentially wraps a pair of Values (key, value pair). When the components were aligned in a Pool, it became painfully obvious how much memory was being wasted, so the decision was made to move strings out of the Value container, replacing them with a unique integer identifier whose value is based on a string hashing algorithm.
Values of 'string type' now hold 32-bit hashes, which reference unique strings.

The global container for unique strings is a map whose keys are hashes, and whose values are pairs containing the original string, and a reference count, which is used to unload strings when the last remaining reference is removed. Dynamic strings are not a problem, and modifying a string Value does not affect external references to the old string Value.

The changes to the Value container class seemed trivial, however I did run into a few problems relating to destruction of local value copies. Now it's done, the size of an AttributeComponent has been reduced from 512 bytes to 16 bytes per Attribute, making Values and Attributes a lot more attractive in terms of memory footprint.

These changes removed an artificial limitation on the length of strings stored in Values, which was also the major culprit with respect to wasted memory in Value-based systems such as AttributeSystem. Further, references to the same string are now very cheap, as only one copy of the actual string data is held anywhere in memory.

Arguably, these changes move away from data-oriented-design principles, since strings are no longer plain old data, but instead are references to plain old data, and lookups have been introduced, however the cost is not huge, given that the hashmap is already sorted.





No comments:

Post a Comment