For over two years I had the chance to work in a team owning a microservice that brought some interesting novelties to my technical landscape: we used Axon Framework to implement event sourcing and CQRS patterns within an event-driven architecture, hand in hand with domain-driven design (so many buzzwords!). I experienced and contributed to the evolution of this service from an early stage until it reached maturity, adding new features and maintaining them, so it was a great opportunity to understand the benefits and drawbacks of those patterns in a practical way.
Event sourcing is a persistence pattern that changes how domain objects (aggregates) are represented in storage. Instead of storing a domain object as a record holding its current state, you store the series of events that represent every change to that object, from its creation until the latest change. To calculate the current state of an aggregate, all of its events must be replayed in chronological order, applying each change in turn up to the latest event.
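To make the replay step concrete, here is a minimal sketch in plain Java (17+, no Axon dependency), assuming a hypothetical bank-account aggregate with three made-up event types. The balance is never stored; `Account.fromHistory` applies the events in order to derive it. In Axon itself, the equivalent logic lives in aggregate methods annotated with `@EventSourcingHandler`, which the framework invokes while replaying the aggregate's events.

```java
import java.util.List;

public class EventSourcingSketch {

    // Hypothetical domain events for a bank-account aggregate.
    sealed interface AccountEvent permits AccountOpened, MoneyDeposited, MoneyWithdrawn {}

    record AccountOpened(String accountId) implements AccountEvent {}

    record MoneyDeposited(long amount) implements AccountEvent {}

    record MoneyWithdrawn(long amount) implements AccountEvent {}

    // The aggregate's current state is never stored directly; it is derived from its events.
    static final class Account {
        private String id;
        private long balance;
        private int version; // number of events applied so far

        // Rebuilds the current state by replaying the whole history in chronological order.
        static Account fromHistory(List<AccountEvent> history) {
            Account account = new Account();
            for (AccountEvent event : history) {
                account.apply(event);
            }
            return account;
        }

        // Applies a single change; each event type updates the state in its own way.
        private void apply(AccountEvent event) {
            if (event instanceof AccountOpened opened) {
                id = opened.accountId();
            } else if (event instanceof MoneyDeposited deposited) {
                balance += deposited.amount();
            } else if (event instanceof MoneyWithdrawn withdrawn) {
                balance -= withdrawn.amount();
            }
            version++;
        }

        String id() { return id; }
        long balance() { return balance; }
        int version() { return version; }
    }

    public static void main(String[] args) {
        // What the event store keeps for this aggregate: an ordered, append-only list of events.
        List<AccountEvent> history = List.of(
                new AccountOpened("acc-1"),
                new MoneyDeposited(100),
                new MoneyWithdrawn(30));

        Account account = Account.fromHistory(history);
        System.out.println(account.id() + " balance=" + account.balance() + " version=" + account.version());
    }
}
```

Running `main` prints `acc-1 balance=70 version=3`: three events replayed to reach the current state.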
Command-Query Responsibility Segregation (CQRS) is another pattern commonly found in conjunction with event sourcing. It separates read (queries) and write (commands) operations into different, independent models. The read model (or projection) can only be updated after the write model has been successfully updated, so for a period of time the read and write models will be inconsistent: this architectural constraint is known as eventual consistency.
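The window of inconsistency can be illustrated with a small, deterministic sketch in plain Java (17+). All names here (`WriteModel`, `ProductNameProjection`, `processPendingEvents`) are hypothetical: the queue stands in for event publication, and `processPendingEvents` stands in for an asynchronous event processor, called explicitly so that the stale read is reproducible.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

public class EventualConsistencySketch {

    record ProductRenamed(String productId, String newName) {}

    // Write side: accepts the command, updates its own state and publishes an event.
    static final class WriteModel {
        private final Map<String, String> names = new HashMap<>(); // write-side state
        private final Queue<ProductRenamed> published;

        WriteModel(Queue<ProductRenamed> published) { this.published = published; }

        void rename(String productId, String newName) {
            names.put(productId, newName);
            published.add(new ProductRenamed(productId, newName)); // handed to event handlers later
        }
    }

    // Read side: a projection that changes only when events are processed.
    static final class ProductNameProjection {
        private final Map<String, String> view = new HashMap<>();

        void on(ProductRenamed event) { view.put(event.productId(), event.newName()); }

        Optional<String> findName(String productId) { return Optional.ofNullable(view.get(productId)); }
    }

    // Stand-in for an asynchronous event processor; it runs only when called.
    static void processPendingEvents(Queue<ProductRenamed> published, ProductNameProjection projection) {
        ProductRenamed event;
        while ((event = published.poll()) != null) {
            projection.on(event);
        }
    }

    public static void main(String[] args) {
        Queue<ProductRenamed> published = new ArrayDeque<>();
        WriteModel writeModel = new WriteModel(published);
        ProductNameProjection projection = new ProductNameProjection();

        writeModel.rename("p-1", "Blue Mug");
        // The command succeeded, but the read model has not caught up yet.
        System.out.println("before: " + projection.findName("p-1").orElse("<not yet visible>"));

        processPendingEvents(published, projection);
        System.out.println("after:  " + projection.findName("p-1").orElse("<not yet visible>"));
    }
}
```

The first query finds nothing even though the command succeeded; only after the pending event is processed does the projection return `Blue Mug`.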
The combination of event sourcing and CQRS drastically changes the way a service handles read and write operations:
- Write operations typically follow these steps:
  - a command is dispatched through the command gateway
  - the command is handled asynchronously, and it either creates a new aggregate or performs a change on an existing one
  - either way, a new event is eventually published and appended to the immutable event store
  - the new event may have multiple handlers reacting to it; for example, an event handler can apply the change to the projection so that the read model becomes consistent with the write model
- Read operations can fetch the relevant projection directly; it has a data schema optimised for the query (including only the necessary fields) and is already denormalised, which avoids joins in the data store.
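The write and read paths above can be sketched end to end in plain Java (17+). This is an illustration under simplifying assumptions, not how Axon implements them: `InMemoryCommandGateway`, `EventStore` and the order records are hypothetical, the command is handled on a `CompletableFuture` thread, and event handlers run synchronously right after each append (in Axon, event processors can also run asynchronously).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class CqrsFlowSketch {

    record PlaceOrder(String orderId, String customer, long totalCents) {}   // command
    record OrderPlaced(String orderId, String customer, long totalCents) {}  // event
    record OrderSummary(String orderId, String customer, String total) {}    // read-model row

    // Append-only event store that notifies every subscribed event handler after each append.
    static final class EventStore {
        private final List<Object> events = new CopyOnWriteArrayList<>();
        private final List<Consumer<Object>> handlers = new CopyOnWriteArrayList<>();

        void subscribe(Consumer<Object> handler) { handlers.add(handler); }

        void append(Object event) {
            events.add(event);
            handlers.forEach(handler -> handler.accept(event));
        }

        List<Object> all() { return List.copyOf(events); }
    }

    // Hypothetical in-memory gateway, NOT Axon's CommandGateway: handles each command on another thread.
    static final class InMemoryCommandGateway {
        private final EventStore store;

        InMemoryCommandGateway(EventStore store) { this.store = store; }

        CompletableFuture<Void> send(PlaceOrder command) {
            return CompletableFuture.runAsync(() -> handle(command));
        }

        // Command handler: validates the command, then records the outcome as an event.
        private void handle(PlaceOrder command) {
            if (command.totalCents() <= 0) {
                throw new IllegalArgumentException("total must be positive");
            }
            store.append(new OrderPlaced(command.orderId(), command.customer(), command.totalCents()));
        }
    }

    static String formatCents(long cents) {
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        Map<String, OrderSummary> summaries = new ConcurrentHashMap<>();     // projection used by queries
        Map<String, Integer> ordersPerCustomer = new ConcurrentHashMap<>();  // a second, independent projection

        // Event handler 1: maintains a denormalised, query-ready row per order.
        store.subscribe(event -> {
            if (event instanceof OrderPlaced placed) {
                summaries.put(placed.orderId(),
                        new OrderSummary(placed.orderId(), placed.customer(), formatCents(placed.totalCents())));
            }
        });
        // Event handler 2: reacts to the same event for a different purpose.
        store.subscribe(event -> {
            if (event instanceof OrderPlaced placed) {
                ordersPerCustomer.merge(placed.customer(), 1, Integer::sum);
            }
        });

        InMemoryCommandGateway gateway = new InMemoryCommandGateway(store);
        // join() only makes the demo deterministic; a real caller would not block here.
        gateway.send(new PlaceOrder("o-1", "alice", 2599)).join();

        // Read side: a direct key lookup on the projection, with no joins and no event replay.
        System.out.println(summaries.get("o-1"));
        System.out.println(ordersPerCustomer);
        System.out.println(store.all().size() + " event(s) stored");
    }
}
```

The two subscribers show that one event can feed several independent handlers, and the query is a single lookup on an already-shaped row rather than a replay or a join.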
At the beginning, when I started working with this architecture, I remember feeling confused by all these new concepts. It required a mindset shift and learning to design solutions in this new setting. The support of my team colleagues was essential to move in the right direction and get up to speed, along with some extensive reading of the framework’s docs. Luckily I also had experience dealing with complex problems, so after the first few weeks most of it was already making sense and I was quickly gaining experience. By the end of my engagement, I was able to design and implement the management of complex transactions using sagas, but that’s another story!
Here are the benefits and drawbacks that I found while working with this architecture using Axon Framework.
Benefits:
- Out-of-the-box ability to audit domain changes and “go back in time” through the event timeline, which makes debugging easier
- Independent scaling of read and write operations, which could even target two different databases, each optimised for its use case
- Improved security, as read operations query a read model that exposes only the fields each query needs, reducing the risk of data leaks
- Domain logic lives close to the domain (aggregate) class, or is decoupled into individual event handlers
- Asynchronous processing of commands and events benefits the performance and scalability of the system
Drawbacks:
- Eventual consistency causes the read model to be temporarily stale
- Increased complexity for handling common problems like transactional operations (the saga pattern may be needed), uniqueness validations (example problem) or event schema evolution over time (requires upcasting existing events)
- The event store can grow quickly, leading to increased database storage consumption and a longer time to calculate the current aggregate state (calculation time can be addressed with a technique like snapshotting)
- Data governance for processes like GDPR’s right to erasure poses the problem of having to delete entries from the event store, against its immutability principle (crypto-shredding may be an option)
- Some code duplication can appear in event handlers
- Usually requires dedicated onboarding and training for new engineers joining the project
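Snapshotting and “going back in time” both come down to choosing where a replay starts and stops. The sketch below, in plain Java (17+), assumes a hypothetical account stream with a single `Deposited` event type and a made-up threshold of one snapshot every 100 events; in Axon, snapshots are configured through a snapshot trigger definition rather than written by hand like this.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSketch {

    record Deposited(long amount) {}                     // the only (hypothetical) event type here
    record AccountState(long balance, int version) {}    // state after `version` events

    static final class AccountStream {
        private static final int SNAPSHOT_EVERY = 100;   // hypothetical snapshot threshold
        private final List<Deposited> events = new ArrayList<>();
        private AccountState latestSnapshot = new AccountState(0, 0);

        void append(Deposited event) {
            events.add(event);
            // Periodically store the computed state so later loads can skip older events.
            if (events.size() % SNAPSHOT_EVERY == 0) {
                latestSnapshot = replay(latestSnapshot, events.size());
            }
        }

        // Current state: start from the latest snapshot and replay only the newer events.
        AccountState loadCurrent() {
            return replay(latestSnapshot, events.size());
        }

        // Number of events loadCurrent() has to replay, instead of the full history.
        int eventsToReplay() {
            return events.size() - latestSnapshot.version();
        }

        // "Go back in time": replay the history from the start up to a past version.
        AccountState stateAt(int version) {
            return replay(new AccountState(0, 0), version);
        }

        private AccountState replay(AccountState from, int toVersion) {
            long balance = from.balance();
            for (int i = from.version(); i < toVersion; i++) {
                balance += events.get(i).amount();
            }
            return new AccountState(balance, toVersion);
        }
    }

    public static void main(String[] args) {
        AccountStream stream = new AccountStream();
        for (int i = 0; i < 250; i++) {
            stream.append(new Deposited(10));
        }
        System.out.println("current=" + stream.loadCurrent() + ", replayed " + stream.eventsToReplay() + " of 250 events");
        System.out.println("at version 10: " + stream.stateAt(10));
    }
}
```

After 250 deposits of 10, `loadCurrent()` reaches a balance of 2500 while replaying only the 50 events recorded since the last snapshot, and `stateAt(10)` rebuilds the balance as it was after the tenth event (100).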
Did I miss anything? E-mail me with your suggestions and I can expand this post :)