In recent weeks I’ve had a number of discussions with clients around the topic of multi-tenancy for SaaS platforms in the cloud. While the term can mean many things, I’ll use it for a B2B model of service provision where some level of infrastructure sharing between clients (whether compute, storage/database or identity provider) is in place. Similarly, I’ll define client as the direct paying customer of the SaaS provider, likely another business, while using the term user to refer either to an employee of the client or to a client’s own customer (in a B2B2C model).

Software architecture is a melting pot of the environment it lives within. As per Conway, organisational structure and cashflow drive communication flows, which are crystallised into products. Additionally, there are practical and cost drivers behind various architectural choices.

This excellent AWS paper discusses various levels of sharing commonly seen and some strategies to implement sharing, particularly around security. Various combinations of silos and pooling are possible; the right choice will depend on the customer base, application model and technology choices. This is also not a one-time choice but a journey over the lifetime of the system.
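As one concrete illustration of the pooled end of that spectrum, the sketch below scopes a request’s credentials down to a single tenant’s rows in a shared DynamoDB table using an STS session policy. This is a minimal sketch of one such strategy, not a prescription from the paper; the role ARN, table name and key prefix are illustrative assumptions.

```typescript
// Sketch of a pooled-isolation pattern: narrow each request's credentials to a
// single tenant's partition keys via an STS session policy.
// Role ARN, account id, table and key prefix are illustrative placeholders.
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({ region: "eu-west-1" });

export async function tenantScopedCredentials(tenantId: string) {
  const sessionPolicy = {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        Resource: "arn:aws:dynamodb:eu-west-1:123456789012:table/saas-data",
        // Only allow items whose partition key is this tenant's prefix.
        Condition: {
          "ForAllValues:StringEquals": {
            "dynamodb:LeadingKeys": [`TENANT#${tenantId}`],
          },
        },
      },
    ],
  };

  const { Credentials } = await sts.send(
    new AssumeRoleCommand({
      RoleArn: "arn:aws:iam::123456789012:role/tenant-access-role",
      RoleSessionName: `tenant-${tenantId}`,
      // The session policy can only narrow, never widen, the role's permissions.
      Policy: JSON.stringify(sessionPolicy),
    })
  );
  return Credentials;
}
```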

Traditionally the cost model of infrastructure – Capex purchasing of physical hosting and machines – drove a shared-everything model. Both the lumpy nature of buying boxes (for all but the largest of systems) and the lead times to attain capacity created a need to share the cost across multiple customers.

The world moved on. Commodity servers became cheap, and software stacks evolved to cope with many-box deployments and horizontal scale. This evolved into the cloud, where not only could you rent fractional server capacity, but the volume play also allowed zero-commitment purchasing: the birth of serverless.

In tech provisioning, Capex had become Opex.

A significant consequence of shared-everything was a lack of customer (or, in the new nomenclature of B2B(2C), tenant) isolation. All code must be tenant-aware, not only from a functional point of view (always tracking the current tenant for filtering and grouping of data) but also from a noisy-neighbour and blast-radius point of view. Security boundaries play into both aspects.
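As a minimal sketch of what tenant-aware code looks like at the data layer, assuming a pooled DynamoDB table whose partition key carries the tenant id (the table and key names below are mine, not from any particular system), every read goes through a helper that injects the tenant key, so the filter cannot be forgotten:

```typescript
// Tenant-aware data access in a pooled table: every query is forced through the
// current tenant's partition key. Table, key and entity names are illustrative.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const doc = DynamoDBDocumentClient.from(new DynamoDBClient({ region: "eu-west-1" }));

export async function listOrdersForTenant(tenantId: string) {
  const result = await doc.send(
    new QueryCommand({
      TableName: "saas-data",
      KeyConditionExpression: "pk = :tenant AND begins_with(sk, :prefix)",
      ExpressionAttributeValues: {
        ":tenant": `TENANT#${tenantId}`, // the partition key always carries the tenant
        ":prefix": "ORDER#",
      },
    })
  );
  return result.Items ?? [];
}
```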

In today’s cloud, two features are driving a shift in behaviour: the (cloud-defining) availability of API-driven capacity for customers, and the rise of serverless (AWS announced in 2020 that 50% of new workloads are serverless). The latter enables a cost-effective adoption of the former, a new level of dynamism, and a challenge to the tenancy architectures of old.

So rapid is this change that the doyenne of recent years, infrastructure as code (IaC), is being challenged. If the system footprint is dynamic, responding to customer sign-ups and preferences, where does all the YAML / Terraform fit in? In practice, only a subset of the infra is likely to be dynamic, and the benefits of declarative configuration are still real. Perhaps the declaration will just be absorbed into another API layer – as, arguably, Pulumi has done, for example.
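To make that concrete, here is a hedged sketch using Pulumi’s Automation API, where the declarative program is just a function invoked per tenant at sign-up time; the project, stack and resource names are illustrative assumptions rather than a recommended layout.

```typescript
// Declarative infrastructure driven from application code via Pulumi's
// Automation API: a per-tenant stack is created or updated on demand.
import { LocalWorkspace } from "@pulumi/pulumi/automation";
import * as aws from "@pulumi/aws";

export async function provisionTenant(tenantId: string) {
  const stack = await LocalWorkspace.createOrSelectStack({
    projectName: "saas-tenants",
    stackName: `tenant-${tenantId}`,
    // The "IaC" declaration lives inside an ordinary function...
    program: async () => {
      const table = new aws.dynamodb.Table(`data-${tenantId}`, {
        attributes: [{ name: "pk", type: "S" }],
        hashKey: "pk",
        billingMode: "PAY_PER_REQUEST", // serverless pricing: no capacity commitment
      });
      return { tableName: table.name };
    },
  });
  await stack.setConfig("aws:region", { value: "eu-west-1" });
  // ...and is applied per tenant, rather than from a static pipeline run.
  const result = await stack.up({ onOutput: console.log });
  return result.outputs["tableName"].value as string;
}
```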

In some sense this is just evidence of the usual pendulum of technology: instead of compute swinging from client to server and back, this time it is code vs config. Config, as ever, is just code in another, more constrained format. What is configured today as a cloud-scale item (a Kafka cluster and its topics, for example) is the equivalent of a JMS createTopic() call on an app-server cluster of yesteryear.
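A small sketch of that equivalence, using the kafkajs admin client (the broker address and topic naming are illustrative): the “config item” of a topic is one API call away, much as createTopic() was.

```typescript
// "Config is just code": creating a per-tenant Kafka topic via an API call
// rather than a declarative config file. Broker and topic names are illustrative.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "saas-admin", brokers: ["broker-1:9092"] });

export async function createTenantTopic(tenantId: string) {
  const admin = kafka.admin();
  await admin.connect();
  try {
    await admin.createTopics({
      topics: [
        { topic: `tenant-${tenantId}-events`, numPartitions: 3, replicationFactor: 3 },
      ],
    });
  } finally {
    await admin.disconnect();
  }
}
```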

This change in the capability of systems to morph their own footprint opens up many possibilities. From a certain angle, even a modern CI/CD pipeline delivering a static footprint will encourage a specific tenancy design. Responding to new client isolation requirements with a different silo model requires a fleetness of foot few can match. The new charging models and infrastructure management tools mean, however, that the overheads of a stronger silo model might well have evaporated. Of course, any lingering tech debt around SecDevOps management needs a mature discussion (see this excellent piece from Rachel at Redmonk).

Coping operationally with dozens of production environments requires focus and tool support. For industrial-strength blast containment and a reduction in noisy-neighbour impact, flexibility in deployment will become a commercial advantage.