XDDD under the hood part 1 - data

Apr 23, 2024

To define solutions in data, you need a flexible and secure data store.

People take different approaches to technology. Some people only want to know what it does, and glaze over if you try explaining how it works. I have had that drilled into me countless times while preparing for sales presentations. But this short series is for people who, like me, need to understand how something works before they can believe in it.

In this five-part series I am going to explain how our products implement extensible data-driven development (XDDD). I know that the claim I make about it - that it gives a tenfold increase in development productivity - is hard to believe. Understanding how it works makes it much more believable.

This is not a single, simple idea. It is a combination of several ideas, and we know the combination works because we have built it. It is difficult to implement: it took us about 20 man-years.

As you would expect, XDDD starts with data. If you are going to have a system that can do anything, you need a data layer that can hold any data in any structure.

Most systems are built using a fixed data structure, for example tables in a relational database. We needed something more flexible, and implemented a data layer based on a triple store (or an entity-attribute-value store if you prefer). In this, data is broken down into triples with a subject ("The cat"), a predicate ("sat on") and an object ("the mat").
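As a rough illustration of the triple idea (a toy sketch, not our implementation), here is a minimal in-memory triple store in Python. The class name `TripleStore` and its `add` and `match` methods are invented for this example.

```python
class TripleStore:
    """Toy in-memory triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # Any triple is accepted; nothing constrains the structure yet.
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("The cat", "sat on", "the mat")
store.add("The cat", "chased", "the mouse")
store.add("The dog", "sat on", "the mat")

# Everything known about the cat.
cat_facts = store.match(subject="The cat")

# Everything that sat on the mat.
mat_sitters = [s for s, _, _ in store.match(predicate="sat on", obj="the mat")]
```

Because every fact has the same three-part shape, adding a new kind of fact needs no schema change, only a new predicate.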

The advantage of this approach is that the data structure is not limited. You can hold whatever data you like without breaking the structure.

The disadvantage of this approach is that the data structure is not limited. You can hold whatever data you like, which would quickly lead to chaos.

To address this, we also hold definitions of the data within the same data structure. Each subject ("The cat") links to another subject that defines its type, and the type defines the predicates that subjects of that type can have. In this way the data store becomes self-defining and self-constraining. The definitions also have types, and so on, and everything ends in a single item which defines itself, the "type type".
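The self-constraining idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not our actual data model: the predicate names `"type"` and `"allows"`, and the class `TypedStore`, are hypothetical, and the sketch ignores details such as a subject having more than one type.

```python
TYPE = "type"      # hypothetical predicate linking a subject to its type
ALLOWS = "allows"  # hypothetical predicate listing what a type permits


class TypedStore:
    """Toy triple store whose type definitions live in the same triples."""

    def __init__(self):
        # Bootstrap: the "type type" is its own type, and it lets
        # any type declare the predicates it allows.
        self.triples = {
            ("type type", TYPE, "type type"),
            ("type type", ALLOWS, ALLOWS),
        }

    def type_of(self, subject):
        for s, p, o in self.triples:
            if s == subject and p == TYPE:
                return o
        return None

    def allowed(self, type_name):
        return {o for s, p, o in self.triples if s == type_name and p == ALLOWS}

    def add(self, subject, predicate, obj):
        if predicate == TYPE:
            # A subject may only be typed by something already in the store.
            if self.type_of(obj) is None:
                raise ValueError(f"unknown type: {obj!r}")
        else:
            subject_type = self.type_of(subject)
            if subject_type is None:
                raise ValueError(f"untyped subject: {subject!r}")
            if predicate not in self.allowed(subject_type):
                raise ValueError(f"{subject_type!r} does not allow {predicate!r}")
        self.triples.add((subject, predicate, obj))


store = TypedStore()
store.add("animal", TYPE, "type type")    # define a type
store.add("animal", ALLOWS, "sat on")     # say what animals can do
store.add("The cat", TYPE, "animal")      # create a typed subject
store.add("The cat", "sat on", "the mat")  # permitted by the type

try:
    store.add("The cat", "flew to", "the moon")
    rejected = False
except ValueError:
    rejected = True  # "animal" does not allow "flew to"
```

The point of the sketch is that the rules are ordinary data: adding a new type, or widening what a type allows, is just adding more triples.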

We don't use the terms subject, predicate and object. We use the term "node" for subjects, we call predicates "member types" or "fields", and we represent objects as links to other nodes, as values, or as a combination of both. We call the data store a "node store".

We have implemented optimisations around the basic triple store model, including sequencing for repeated data, version control, ownership and audit information. We also have data about users, user groups and permissions, to constrain who can do what with the data.

We use delimited references for nodes, like file paths, which allows us to partition and view the data as a hierarchy as well as a linked data structure.
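To show how path-like references give a hierarchy for free, here is a small Python sketch. The `/` delimiter, the example references and the helper names `parent` and `in_partition` are assumptions made for illustration; they are not taken from our products.

```python
def parent(ref):
    """Return the reference one level up, or None at the top level."""
    head, sep, _ = ref.rpartition("/")
    return head if sep else None


def in_partition(ref, prefix):
    """True if ref is the prefix itself or sits anywhere beneath it."""
    return ref == prefix or ref.startswith(prefix + "/")


# Nodes keyed by delimited reference; values link to other nodes by reference.
nodes = {
    "acme/pets/cat": {"sat on": "acme/furniture/mat"},
    "acme/pets/dog": {"sat on": "acme/furniture/mat"},
    "acme/furniture/mat": {},
    "acmeworks/pets/fish": {},
}

# Hierarchical view: everything under one branch.
pets = sorted(r for r in nodes if in_partition(r, "acme/pets"))

# Partition view: everything belonging to "acme", but not "acmeworks".
acme = sorted(r for r in nodes if in_partition(r, "acme"))
```

Matching on the prefix plus the delimiter, rather than on the bare prefix, is what keeps "acmeworks" out of the "acme" partition. The links between nodes still work across branches, so the same data can be read as a tree or as a graph.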

What this gives us is a data store that can hold any data, can be queried to understand that data, can be owned and partitioned, and is secured at the data level.

Next week I will describe how we start turning this into a system.

Metrici Technology
