The Lambda Architecture, simplified

Addressing complexity in a decades-old architecture

“Everything should be made as simple as possible, but not simpler.” — Albert Einstein

I loathe complexity. I didn’t always, but as I get older I seem to tolerate it less and less. Whether it’s running up against my brain’s limited working memory (the “magic number” of seven, plus or minus two) or appreciating Occam’s Razor and what it produces, reducing complexity has long been one of my main missions in life. It didn’t hurt that this was drilled into me daily during the first decade of my professional career, as I developed and maintained a sophisticated software system in which complexity was avoided at all costs.

Lambda Architecture as proposed by Nathan Marz

There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. — Mark Twain

The problem faced by the Lambda Architecture is not new; it has been a thorn in the side of large data systems for decades. Consider the interplay between traditional operational data stores and data warehouses. In these “systems”, data is first collected in one or more operational data stores, which are generally ill-suited to analytical queries for two main reasons:

  1. The data is typically in a schema or data format (row-organized) that isn’t well suited to analytical queries.
  2. The analytical data must often be aggregated from multiple operational data stores for a full view of the enterprise.

Traditional Architecture — Operational Data Stores continually feeding a Data Warehouse through a Message Queue and a Continuous Data Ingest process
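As a deliberately toy sketch of that pipeline, the Python below simulates two hypothetical operational stores feeding a warehouse aggregate through a message queue; all names and figures are invented for illustration:

```python
import queue

# Two hypothetical operational data stores, each holding row-organized records.
orders_store = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
]
returns_store = [
    {"return_id": 7, "region": "EU", "amount": -20.0},
]

# A message queue decouples the operational stores from the warehouse.
mq = queue.Queue()
for row in orders_store:
    mq.put(("orders", row))
for row in returns_store:
    mq.put(("returns", row))

# The continuous-ingest process drains the queue, aggregating rows from
# multiple sources into a single analytical view keyed by region.
warehouse = {}
while not mq.empty():
    source, row = mq.get()
    warehouse[row["region"]] = warehouse.get(row["region"], 0.0) + row["amount"]

print(warehouse)  # {'EU': 100.0, 'US': 80.0}
```

Even this toy version shows where the complexity creeps in: every new operational source adds another producer, and the analytical view is only as fresh as the ingest process is fast.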

Those who cannot remember the past are condemned to repeat it. — George Santayana

Why do I bring this up? Curiously enough, right around the time that Lambda emerged (and long before it was widely adopted), the traditional operational data store + data warehouse architecture was being disrupted by Hybrid Transactional/Analytical Processing (HTAP) technology. The idea behind HTAP is to use a single system to handle both transactional and analytical workloads. To make both sides of the house perform (the “real-time” side and the “batch” side), these systems are typically in-memory (or in-memory optimized), employ multiple data formats, and perform some sort of internal data transformation. In the end, however, they appear as a single system from the application’s perspective.
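To sketch why HTAP engines employ multiple data formats, the toy example below (all names invented for illustration) holds the same table in two layouts: row-organized for transactional point lookups, column-organized for analytical scans:

```python
# Row-organized: each record is contiguous -- good for transactional
# point reads and writes (fetch or update one order at a time).
rows = [
    {"id": 1, "product": "A", "qty": 3},
    {"id": 2, "product": "B", "qty": 5},
    {"id": 3, "product": "A", "qty": 2},
]

# Column-organized: each attribute is contiguous -- good for analytical
# scans (aggregate one column without touching the others).
columns = {
    "id": [1, 2, 3],
    "product": ["A", "B", "A"],
    "qty": [3, 5, 2],
}

# Transactional access: look up a single record by key.
order = next(r for r in rows if r["id"] == 2)

# Analytical access: aggregate over a single column.
total_qty = sum(columns["qty"])

print(order["product"], total_qty)  # B 10
```

An HTAP system hides this duality behind one interface: the application issues both kinds of query against what looks like a single table, while the engine maintains (or converts between) the layouts internally.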

The best way to predict the future is to invent it — Alan Kay

So where does this leave us with respect to the Lambda Architecture? Can we replace its complexity with an HTAP solution as well? I think the industry is already moving in this direction, as evidenced by Db2 Event Store.

Db2 Event Store supports common streaming sources, persists data quickly to local storage, and then enriches the data asynchronously in batch through indexing and additional metadata. All data is stored in Apache Parquet format on shared storage, so it can be queried through the Db2 Event Store engine or directly by any Parquet-compatible query engine.

Adam has been developing and designing complex software systems for the last two decades. He is also a son, brother, husband and father.