The Four Event Sourcing Problems Most Resources Don't Cover Clearly
Four event sourcing problems most resources skip: optimistic concurrency, projection rebuilds, schema evolution, and snapshots.
Most event sourcing articles end exactly where the real work begins. They show you how to append an event to a stream, explain why immutability is powerful, maybe walk you through a shopping cart example, and then leave you at the door. What happens when you take that understanding into a real system, with real concurrency, real query requirements, and events that need to evolve as your product does? That’s what this article is about. Not theory. The things that bite you.
The Append-Only Log Is the Easy Part
The first prototype always feels great. Appending events is simple. Reading a stream back is simple. The aggregate-from-history pattern clicks almost immediately. Load your event stream, fold each event through a reducer, arrive at the current state. It’s elegant and it actually makes sense. For a few days, you feel like you’ve found the thing.
Then you ship it.
The append-only log, it turns out, is the part that works. What comes after it, the read models you need to serve your queries, the concurrency controls you need to protect your streams, the schema changes you’ll need to make once the business evolves, that’s where teams consistently underestimate the scope of what they’ve taken on.
None of this should put you off event sourcing. These problems are solvable, and solving them is worth it. But they deserve to be named clearly, because most resources do not name them at all.
Here are the four that bite the hardest.
Problem 1: Optimistic Concurrency, Rarely Covered in Any Depth
In a traditional relational system, you get row-level locking more or less for free. The database ensures that two writers do not overwrite each other silently. In an event-sourced system, you have an append-only log, and unless you implement concurrency control explicitly, two concurrent commands can load the same aggregate, both compute their changes, and both append their events. The second writer has no idea what the first one did. You’ve silently produced an inconsistent stream.
The fix is optimistic concurrency, and the mechanism is the expected stream version.
Every stream has a version: an integer that increments with every appended event. When you load an aggregate, you note the version you loaded it at. When you append new events, you tell the store to only accept the append if the stream is still at version N. If another writer has appended in the meantime, the store rejects your write with a version conflict, and you retry.
Here is what that looks like in practice:
```csharp
// Load the aggregate at its current version
var (aggregate, currentVersion) = await store.LoadAsync<OrderAggregate>(streamId);

// Apply the command; this produces new events
var newEvents = aggregate.Handle(command);

// Append, but only if the stream has not changed since we loaded.
// Throws OptimisticConcurrencyException if another writer got there first.
await store.AppendAsync(streamId, newEvents, expectedVersion: currentVersion);
```

The critical thing: this check must happen at the storage layer, not in your application code. If you load, check, and then append in three separate operations without the storage layer enforcing the version constraint atomically, you have a race condition. The store has to make this atomic.
Most introductory event sourcing content either skips this entirely or mentions it in a footnote. Get it wrong under load, and you’ll produce corrupted streams with no obvious error at the time.
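The other half of the pattern is the retry. A version conflict is not an error to surface to the user; it means "reload, re-handle the command against the new history, try again." Here is a minimal, self-contained sketch of that loop, using an invented in-memory store whose `Append` enforces the expected-version check atomically. `InMemoryEventStore`, `Retry`, and `OptimisticConcurrencyException` are illustrative names, not a real library API:

```csharp
using System;
using System.Collections.Generic;

public class OptimisticConcurrencyException : Exception { }

public class InMemoryEventStore
{
    private readonly Dictionary<string, List<object>> _streams = new();

    public (List<object> Events, long Version) Load(string streamId)
    {
        var events = _streams.TryGetValue(streamId, out var stream)
            ? new List<object>(stream)
            : new List<object>();
        return (events, events.Count);
    }

    // The version check and the append happen in one operation, so two
    // writers can never both succeed against the same expectedVersion.
    public void Append(string streamId, IEnumerable<object> newEvents, long expectedVersion)
    {
        if (!_streams.TryGetValue(streamId, out var stream))
            _streams[streamId] = stream = new List<object>();
        if (stream.Count != expectedVersion)
            throw new OptimisticConcurrencyException();
        stream.AddRange(newEvents);
    }
}

public static class Retry
{
    // Reload, re-handle the command against the fresh history, and re-append
    // on every conflict, giving up after maxAttempts.
    public static void Execute(InMemoryEventStore store, string streamId,
        Func<IReadOnlyList<object>, IEnumerable<object>> handle, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            var (history, version) = store.Load(streamId);
            try
            {
                store.Append(streamId, handle(history), expectedVersion: version);
                return;
            }
            catch (OptimisticConcurrencyException) when (attempt < maxAttempts)
            {
                // Another writer got there first; loop and rebuild from the new history.
            }
        }
    }
}
```

The important detail is that the command handler runs again inside the loop: you must recompute the new events from the updated history, not blindly re-append the events you computed the first time.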
Problem 2: Projections, Where the Complexity Compounds
Projections are read models built by replaying events. They are the answer to how you query an event-sourced system, because you can’t efficiently query a raw event stream for most real-world use cases. You subscribe to your stream (or streams), fold each event through a handler, and produce a materialised view.
The basic pattern works well. The complexity shows up in a few places.
Read model lag. Projections run asynchronously. There’s a window between when an event is appended and when the projection has processed it. For most use cases this is fine. You accept eventual consistency and design your UI accordingly. But you have to reason about it explicitly. “The user just placed an order; why is it not showing in their order list?” Usually it’s because the projection hasn’t caught up yet. That’s a design decision, not a bug, but it needs to be conscious.
Rebuilding from scratch. When you change the logic of a projection, you need to rebuild the read model from the beginning of history. This is actually a feature. Because you have the full event history, you can always recompute any read model. But it has operational implications. For a large stream, a full rebuild takes time, and you need to manage the transition carefully: run the new projection alongside the old one, swap when it catches up, discard the old one.
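The rebuild-and-swap sequence can be sketched in miniature. Everything below (`ProjectionRebuilder`, the view names, summing integers as a stand-in projection) is an invented in-memory illustration; a real rebuild runs asynchronously against a subscription and must also catch up with events appended while it runs before the swap is safe:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ProjectionRebuilder
{
    private readonly List<int> _log = new();                  // the full event history
    private readonly Dictionary<string, int> _views = new();  // read models by name
    private string _active = "order_totals_v1";               // which view queries hit

    public void Append(int amount)
    {
        _log.Add(amount);
        // The live projection folds each event as it arrives
        _views[_active] = _views.GetValueOrDefault(_active) + amount;
    }

    public int Query() => _views.GetValueOrDefault(_active);

    // Replay the whole history into a fresh read model under a new name,
    // then switch queries over and discard the old model.
    public void RebuildAndSwap(string newName, Func<int, int> newLogic)
    {
        _views[newName] = _log.Sum(newLogic);  // replay from event zero
        var old = _active;
        _active = newName;                     // swap
        _views.Remove(old);                    // discard the old model
    }
}
```

The shape is the point: the old view keeps serving queries for the entire duration of the replay, and the swap is a single pointer change at the end.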
Cross-aggregate projections. This is where naive implementations break. Consider a customer dashboard that needs to show recent order activity: events from an Order aggregate joined with information from a Customer aggregate. Each stream is separate. Your projection needs to subscribe to both, correlate events by customer ID, and produce a combined view.
```csharp
public class CustomerDashboardProjection
{
    private readonly IReadModelStore _store;

    public async Task HandleAsync(OrderPlaced evt)
    {
        var view = await _store.GetAsync<CustomerDashboardView>(evt.CustomerId);
        view.RecentOrders.Add(new OrderSummary(evt.OrderId, evt.Total, evt.PlacedAt));
        await _store.SaveAsync(view);
    }

    public async Task HandleAsync(CustomerNameUpdated evt)
    {
        var view = await _store.GetAsync<CustomerDashboardView>(evt.CustomerId);
        view.CustomerName = evt.NewName;
        await _store.SaveAsync(view);
    }
}
```

This works, but now your projection is processing events from multiple streams, potentially out of order, and needs to handle the case where an OrderPlaced arrives before the CustomerCreated event it depends on. Ordering guarantees within a single stream are straightforward. Across streams, you’re responsible for managing that complexity yourself.
Problem 3: Schema Evolution, Where Your Events Outlive Your Code
Every event you publish is a permanent contract with history. You can’t edit the past. If you published an OrderPlaced event with a ProductId field two years ago, that event still exists in your stream with that shape. Forever.
This matters the moment your requirements change, which is roughly always.
The safe approach to schema evolution rests on a few specific patterns.
Upcasting. When you read an old event from the store, transform it into the new shape before it reaches your application code. The old bytes stay on disk, but at read time an upcaster converts them. This is transparent to reducers and projections; they only ever see the current shape.
```csharp
// Old event shape (v1)
public record OrderPlaced_V1(Guid OrderId, string ProductId, decimal Total);

// New event shape (v2; ProductId replaced by a full ProductDetails object)
public record OrderPlaced(Guid OrderId, ProductDetails Product, decimal Total);

// Upcaster registered in your serialization pipeline
public class OrderPlacedUpcaster : IUpcaster<OrderPlaced_V1, OrderPlaced>
{
    public OrderPlaced Upcast(OrderPlaced_V1 old) =>
        new OrderPlaced(old.OrderId, new ProductDetails(old.ProductId), old.Total);
}
```

Versioned event type names. Name your events with explicit version suffixes when you make breaking changes: use OrderPlaced_V2 rather than mutating OrderPlaced in place. This makes it completely clear in your codebase which handlers are dealing with which version, and removes ambiguity during migration.
Never remove a field that a projection depends on. This is the quiet disaster. You remove a field from an event because it is no longer needed in the command handler, but a projection somewhere was reading that field to build a view. The projection now silently receives null where it expected a value. If you’re lucky you get a null reference exception. If you’re not, you get a corrupted read model.
The discipline is this: treat every published event type as an append-only public API. You can add fields. You can add new event types. You can’t remove fields or change their semantics on existing types without an explicit upcasting strategy in place first.
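What an additive change looks like in practice: a new field with a constructor default, so events serialized before the field existed still deserialize cleanly. This sketch assumes System.Text.Json, which falls back to a constructor parameter's default value when the JSON property is missing; `CouponCode` is an invented example field:

```csharp
using System;
using System.Text.Json;

// v2 of the event adds CouponCode. Old payloads have no such property, so
// deserialization uses the default (null) instead of failing.
public record OrderPlaced(Guid OrderId, decimal Total, string? CouponCode = null);

public static class OldEventReader
{
    public static OrderPlaced ReadOldEvent()
    {
        // JSON written by v1 of the system, before CouponCode existed
        const string oldJson =
            "{\"OrderId\":\"9f8b3c1a-2d4e-4f60-8a71-0b2c3d4e5f60\",\"Total\":42.50}";
        return JsonSerializer.Deserialize<OrderPlaced>(oldJson)!;
    }
}
```

No upcaster needed for a change like this; the versioned-type-name and upcasting machinery is reserved for the breaking cases.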
Problem 4: Snapshots, The Performance Escape Hatch With Sharp Edges
When aggregates have long histories, loading them by replaying every event gets expensive. An order aggregate that has been amended fifty times has fifty events to load and fold. A customer account with years of activity might have thousands. At some point the replay time becomes noticeable.
Snapshots are the standard answer. You periodically persist the current state of the aggregate to a separate store. When loading, you fetch the latest snapshot first, then only replay events that arrived after the snapshot was taken. Instead of replaying the full history, you replay a much shorter tail.
```csharp
public async Task<(TAggregate, long)> LoadWithSnapshotAsync<TAggregate>(string streamId)
    where TAggregate : new()
{
    var snapshot = await _snapshotStore.GetLatestAsync<TAggregate>(streamId);

    var fromVersion = snapshot?.Version ?? 0;
    var aggregate = snapshot?.State ?? new TAggregate();

    // Replay only the events appended after the snapshot was taken
    var events = await _eventStore.LoadFromAsync(streamId, fromVersion);
    return (events.Aggregate(aggregate, ApplyEvent), fromVersion + events.Count);
}
```

But snapshots come with their own complications.
Most teams snapshot too early. The replay of a few hundred events is almost always fast enough. You don’t need snapshots until you genuinely have a performance problem. Adding snapshot logic before you need it adds complexity for no benefit.
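When you do reach that point, a simple cadence keeps the write path cheap: snapshot only when the stream version crosses a multiple of some interval, rather than on every append. A small sketch of that policy (`SnapshotPolicy` and the interval of 100 are illustrative choices, not a prescription):

```csharp
public class SnapshotPolicy
{
    private readonly int _interval;

    public SnapshotPolicy(int everyNEvents) => _interval = everyNEvents;

    // True only when this append pushed the stream version across a
    // multiple of the interval, so most appends skip snapshotting entirely.
    public bool ShouldSnapshot(long previousVersion, long newVersion) =>
        newVersion / _interval > previousVersion / _interval;
}
```

The check runs after a successful append, comparing the version before and after; a batch of events that jumps across a boundary still triggers exactly one snapshot.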
Snapshots couple tightly to your aggregate shape. If your aggregate changes shape because you’ve updated your event handlers to produce a different state structure, existing snapshots become stale or invalid. You need a snapshot versioning strategy that mirrors the event versioning strategy from the previous section. The two problems interact.
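One workable discipline: stamp every snapshot with a schema version, and treat any snapshot with a stale version exactly like a missing one, so the loader silently falls back to full replay. A minimal sketch, with invented names:

```csharp
// Each snapshot records which version of the aggregate shape produced it.
public record Snapshot<TState>(TState State, long StreamVersion, int SchemaVersion);

public static class SnapshotGuard
{
    // Bump this whenever the aggregate's state shape changes; old snapshots
    // then invalidate themselves without any migration step.
    public const int CurrentSchemaVersion = 2;

    // A stale snapshot is returned as null, which the loader treats the same
    // as "no snapshot exists": replay the stream from the beginning.
    public static Snapshot<TState>? Validate<TState>(Snapshot<TState>? snapshot) =>
        snapshot is { SchemaVersion: CurrentSchemaVersion } ? snapshot : null;
}
```

The cost of a stale snapshot is one slow load per aggregate (after which a fresh snapshot gets written), which is usually far cheaper than migrating snapshots in place.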
And the snapshot is not the source of truth. If you find a bug in your aggregate reducer, you need to be able to throw away all snapshots and replay from raw events. Snapshots are an optimisation, not a record. Your system must always be able to function correctly without them.
The Surrounding Infrastructure
Step back from any one of these four problems and you’ll notice the pattern: each one requires a piece of infrastructure you didn’t think about when you wrote your first AppendEvent call.
Optimistic concurrency requires atomic version-checked writes at the storage layer. Projections require a reliable subscription mechanism with at-least-once delivery guarantees, backpressure handling, dead-letter queues for projection failures, and a way to track which events each projection has processed. Schema evolution requires an upcasting pipeline wired into your serialization layer. Snapshots require a secondary store with its own lifecycle management and versioning strategy.
None of this is impossible. Every successful event-sourced system in production has solved all of it. But the scope is genuinely larger than most people expect when they start, and it is worth being honest about that upfront. You’re not just choosing an event log; you’re choosing to build and maintain the surrounding system that makes event sourcing actually work in production.
A Conscious Trade-off
None of these four problems should put you off event sourcing. They are the honest cost of doing it properly, and the systems that get it right earn real benefits: a complete audit history, the ability to derive new read models from existing data without touching the source, easier debugging because you can replay any sequence of events and inspect exactly what happened.
The question is about allocation. Do you want your engineering time spent on optimistic concurrency logic, projection pipelines, snapshot management, and subscription infrastructure, or on the business logic that actually differentiates your product?
That trade-off is worth making consciously, with eyes open to the full scope of what you’re taking on.
If you’d rather not build the infrastructure yourself, Hapnd handles all of this. You push your reducers and projections, the business logic that makes your system unique, and Hapnd takes care of storage, streaming, concurrency, snapshots, and scale. The beta is open at hapnd.dev. No credit card, no sales call.