Separating compute and storage

What it means, and why it’s important for databases

If you follow trends in the database industry, you will have noticed that the separation of compute and storage has become a hot topic over the last several years. Separating compute and storage gives database software increased availability and scalability, and has the potential to dramatically reduce costs, benefits which are driving the movement’s momentum. Before diving into those benefits, however, it’s best to first level-set on what is meant by separating compute and storage.

“Separating compute and storage” means designing database systems such that all persistent data is stored on remote, network-attached storage. In this architecture, local storage holds only transient data, which can always be rebuilt (if necessary) from the data on persistent storage.
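The pattern above can be sketched in a few lines of code. This is a deliberately minimal illustration, not any real database's implementation; the class and method names are hypothetical. The key property is that durability comes entirely from the remote store, so a compute node's local state is disposable and can be rebuilt on demand.

```python
# Minimal sketch of separated compute and storage (illustrative only;
# all names here are hypothetical, not from any real database system).
# Persistent data lives in a remote store; local state is a transient
# cache that can always be rebuilt from the remote copy.

class RemoteStore:
    """Stands in for network-attached storage (e.g., an object store)."""
    def __init__(self):
        self._pages = {}

    def write(self, page_id, data):
        self._pages[page_id] = data

    def read(self, page_id):
        return self._pages[page_id]


class ComputeNode:
    """A compute node whose local cache is disposable by design."""
    def __init__(self, store):
        self.store = store
        self.cache = {}  # transient, local-only data

    def put(self, page_id, data):
        self.store.write(page_id, data)  # durability comes from the remote store
        self.cache[page_id] = data

    def get(self, page_id):
        if page_id not in self.cache:  # cache miss: rebuild from remote
            self.cache[page_id] = self.store.read(page_id)
        return self.cache[page_id]


store = RemoteStore()
node_a = ComputeNode(store)
node_a.put("p1", b"row data")

# Suppose node_a fails: a brand-new node attaches to the same storage
# and serves the same data with no local state carried over.
node_b = ComputeNode(store)
assert node_b.get("p1") == b"row data"
```

Because no compute node holds the only copy of anything persistent, nodes can be added, replaced, or scaled independently of the data, which is precisely where the availability and scalability benefits come from.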

Historically, database systems have principally been architected with “tightly coupled compute and storage,” depending on locally attached storage for some, if not all, of their persistent data. This tight coupling is the opposite of separating compute and storage. Why were systems designed this way? Primarily for performance.

Database systems were originally designed in the 1950s for transaction processing. One of the best examples of such a system is SABRE, which was built to automate airline bookings, launched in 1960, and is still in use by the travel industry today, processing more than 57,000 transactions every second. If you’ve ever booked travel, that booking more than likely went through the SABRE system.

In transactional systems, latency is king: transactions must complete in less than a second, often in just a few milliseconds. As a result, these early database systems kept storage as close as possible to the compute so that persisting each transaction (something required to ensure ACID properties) could complete as quickly as possible. If storage had been network-attached, especially at the network speeds of the 1960s, latency would have suffered.
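A back-of-the-envelope calculation makes the constraint concrete. The figures below are rough, illustrative assumptions (not measurements from this article): the point is only that when a commit must fit a millisecond-scale budget, a slow network round trip consumes the budget while a locally attached write does not.

```python
# Illustrative latency budget for a transaction commit. All numbers
# are assumed, order-of-magnitude values for the sake of the example.

BUDGET_MS = 5.0                      # assumed target commit latency
local_disk_write_ms = 0.1            # assumed locally attached write
slow_network_round_trip_ms = 50.0    # assumed early-era network round trip

def fits_budget(persist_ms, budget_ms=BUDGET_MS):
    """Does the persistence step alone fit inside the latency budget?"""
    return persist_ms <= budget_ms

print(fits_budget(local_disk_write_ms))        # True
print(fits_budget(slow_network_round_trip_ms)) # False
```

Once network round trips dropped well below the commit budget, the argument for keeping storage local largely evaporated, which is the hardware shift the next paragraphs describe.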

Fast forward a few decades and a lot has changed. First, databases are no longer used just for transaction processing. Once it was observed that transactional data was a treasure trove of business information, companies built data warehouses that stored processed transactional data, often aggregated across many systems and supplemented with additional data. With these data warehouses, businesses began running complex analytical queries to determine how to drive smarter business practices. To this day, analytical queries are typically long-running (seconds, minutes, hours, sometimes even days), so storage latency is less of a concern.

Additionally, the dynamics of hardware changed. While transistor counts have continued to grow along the curve of Moore’s law, single-threaded performance has leveled off, and memory bandwidth has not kept pace with network and storage bandwidth. Couple that with the fact that human perception of transaction completion time has physical limits, and by the 1990s it became possible to build transactional systems that were predominantly dependent on network-attached storage, albeit only for some workloads and often requiring specialized hardware. More recently these constraints have been lifted (largely thanks to faster networks), and shared-storage transactional systems are becoming mainstream.

With the requirement for tightly coupled storage relaxed, database software architects recognized that separating compute and storage brings additional benefits, chief among them the availability, scalability, and cost advantages noted above.

Unfortunately, software advancements often lag hardware advancements, and even though the technical barriers to separating compute and storage have vanished, retrofitting existing database architectures to leverage remote storage is difficult and time-consuming. That said, several database vendors have devised novel architectures over the last decade that leverage separated compute and storage (IBM Db2 pureScale and Db2 Warehouse on Cloud, AWS Aurora, Snowflake, etc.). I’m of the opinion that in the coming years it will be commonplace for new database systems to embrace the separation of compute and storage, and that the migration of existing systems to this paradigm will accelerate.

Adam has been developing and designing complex software systems for the last two decades. He is also a son, brother, husband and father.