---
title: 'The Performance Factor in Event Sourcing: What You Need to Know'
date: '2025-01-10'
author: 'daniel-badura'
tags: ['PHP', 'EventSourcing', 'Performance']
contentPreview: 'This article addresses the common concern regarding the performance of event sourcing, particularly the speed at which long-living aggregates with many events are loaded. It explores solutions such as snapshotting and stream splitting to optimize aggregate loading. Furthermore, projections allow for the creation of highly flexible and optimized read models, each of which can be tailored to specific needs.'
---

One of the most frequently asked questions when discussing [event sourcing](/docs/event-sourcing/latest) is:  
Doesn't it take ages to load all the events? This article answers this question from every angle. First, we need to
split the question into two cases: the write side and the read side.

## The Writing Side: Aggregate

When an aggregate records events, they are stored in the event store, which operates as an append-only log. When the
aggregate is loaded, all of its events are queried and replayed to reconstruct its state. This often raises the
question: will loading a long-living aggregate with hundreds or even thousands of events become a performance
bottleneck?
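
To make the replay step concrete, here is a minimal, library-independent sketch (PHP 8.1+): an empty aggregate applies
every stored event in order, and its playhead counts how many events it has seen. The `Profile`, `ProfileCreated` and
`NameChanged` classes are hypothetical and do not use the library's `BasicAggregateRoot`.

```php
<?php

declare(strict_types=1);

// Hypothetical events; in a real application they would be loaded from the event store.
final class ProfileCreated
{
    public function __construct(public readonly string $name) {}
}

final class NameChanged
{
    public function __construct(public readonly string $name) {}
}

// Hypothetical aggregate that rebuilds its state purely from past events.
final class Profile
{
    private string $name = '';
    private int $playhead = 0; // number of events applied so far

    /** @param list<object> $events all events of this aggregate, oldest first */
    public static function fromEvents(array $events): self
    {
        $profile = new self();

        foreach ($events as $event) {
            $profile->apply($event);
        }

        return $profile;
    }

    private function apply(object $event): void
    {
        match (true) {
            $event instanceof ProfileCreated, $event instanceof NameChanged => $this->name = $event->name,
            default => throw new InvalidArgumentException('Unknown event ' . $event::class),
        };

        $this->playhead++;
    }

    public function name(): string
    {
        return $this->name;
    }

    public function playhead(): int
    {
        return $this->playhead;
    }
}

$profile = Profile::fromEvents([
    new ProfileCreated('Alice'),
    new NameChanged('Alicia'),
]);

echo $profile->name(), ' @ ', $profile->playhead(), PHP_EOL; // prints "Alicia @ 2"
```

The loop makes the cost visible: loading time grows linearly with the number of events, which is what the techniques
below address.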

### Append-Only Log

#### Writes

In most cases, writes should initially be much faster than with a traditional normalized table structure, because the
event store table has only a handful of indexes. Normalized tables often carry unique keys and many foreign keys, which
are extra indexes and constraints that must be checked on every write. On the other hand, an event store accumulates a
lot of entries, often millions of rows, so its unique constraint will slow writes down over time. It takes a long time
to reach the point where writes to the event store become noticeably slow, and if you do reach it, you can still shorten
your stream by splitting it.
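
The following in-memory sketch shows how little work an append-only write involves. It assumes, for illustration, that
the only constraint is a unique key on (aggregate id, playhead), a common event store design that also detects
concurrent writers. `AppendOnlyStore` is hypothetical and not the library's store implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical append-only store: no update or delete operation exists.
final class AppendOnlyStore
{
    /** @var list<array{aggregateId: string, playhead: int, event: string}> */
    private array $rows = [];

    /** @var array<string, true> unique key "aggregateId#playhead" */
    private array $uniqueIndex = [];

    public function append(string $aggregateId, int $playhead, string $event): void
    {
        $key = $aggregateId . '#' . $playhead;

        // The single constraint checked on every write.
        if (isset($this->uniqueIndex[$key])) {
            throw new RuntimeException("Playhead $playhead already exists for $aggregateId");
        }

        $this->uniqueIndex[$key] = true;
        $this->rows[] = ['aggregateId' => $aggregateId, 'playhead' => $playhead, 'event' => $event];
    }

    /** @return list<string> events of one aggregate in playhead order */
    public function load(string $aggregateId): array
    {
        $rows = array_values(array_filter(
            $this->rows,
            static fn (array $row): bool => $row['aggregateId'] === $aggregateId,
        ));
        usort($rows, static fn (array $a, array $b): int => $a['playhead'] <=> $b['playhead']);

        return array_column($rows, 'event');
    }
}

$store = new AppendOnlyStore();
$store->append('profile-1', 1, 'ProfileCreated');
$store->append('profile-1', 2, 'NameChanged');

try {
    // A second writer that loaded the same version collides on the unique key.
    $store->append('profile-1', 2, 'EmailChanged');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // prints "Playhead 2 already exists for profile-1"
}
```

Each insert only checks one key, which keeps writes cheap; the index still grows with the table, which is the gradual
slowdown described above.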

#### Updates

Because the event stream is immutable, we normally do not perform any updates on the table. If we did, there would be
no real difference compared to a classical table approach: updates are fast when they can use an index, and otherwise
they can get slow.

#### Deletes

Deleting entries from a database table requires updating its B-tree indexes, the default index structure in most
databases, and can trigger a reorganization of the tree. This costs time, and since we normally do not delete entries
from our event store, it does not affect us when using event sourcing. You could count this as a performance gain
compared to the traditional approach if you like.

#### Read

Now to the most interesting part of this section: reading all events of an aggregate. Reading from the database is fast
by nature, and loading an aggregate's events is typically a single indexed query without any joins. If you ever
encounter slow queries, the cause is usually multiple joins or missing indexes.

But what can we do if an aggregate lives so long and accumulates so many events that it becomes slow to load? Even
though this is rarely the case, there are solutions to that problem. Let's dive into two of them.

### Cache the Aggregate: Snapshots

The most commonly cited solution to this problem is probably snapshotting. This technique is a form of caching: the
current state of the aggregate is serialized and saved to a persistent store. It is important that the state is saved
together with the aggregate's current playhead. When the aggregate is loaded, the snapshot is deserialized, and only the
events that occurred after the snapshot was created are applied to it. For
our [library, the configuration](/docs/event-sourcing/latest/snapshots) can be set up quickly using the
`#[Snapshot]` attribute, which requires the name of the cache pool. You can also configure how many events should
trigger a snapshot renewal by specifying a `batch` size. In the following example, a snapshot is automatically
generated and saved after `1000` events, so you don't need to do this yourself.

```php
use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Snapshot;

#[Aggregate('profile')]
#[Snapshot('default', batch: 1000)]
final class Profile extends BasicAggregateRoot
{
    // ...
}
```
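
The mechanism behind the attribute can be sketched without the library: store the serialized state together with its
playhead, then on load restore it and apply only the newer events. `Counter`, `SnapshotStore` and the renewal rule are
hypothetical illustrations, not the library's internals. For brevity, the sketch receives the full event list and skips
the part covered by the snapshot; a real store would only query events after the snapshot's playhead.

```php
<?php

declare(strict_types=1);

// Hypothetical aggregate state: a counter that every event increments by some amount.
final class Counter
{
    public int $value = 0;
    public int $playhead = 0; // number of events applied so far

    public function apply(int $increment): void
    {
        $this->value += $increment;
        $this->playhead++;
    }
}

// Hypothetical snapshot cache: serialized state plus the playhead it was taken at.
final class SnapshotStore
{
    /** @var array<string, array{playhead: int, state: string}> */
    private array $snapshots = [];

    public function __construct(private readonly int $batch)
    {
    }

    /** @param list<int> $allEvents every stored event of the aggregate, oldest first */
    public function load(string $id, array $allEvents): Counter
    {
        $counter = isset($this->snapshots[$id])
            ? unserialize($this->snapshots[$id]['state'], ['allowed_classes' => [Counter::class]])
            : new Counter();

        // Apply only the events recorded after the snapshot's playhead.
        foreach (array_slice($allEvents, $counter->playhead) as $increment) {
            $counter->apply($increment);
        }

        // Renew the snapshot once `batch` events have accumulated since the last one.
        $snapshotPlayhead = $this->snapshots[$id]['playhead'] ?? 0;
        if ($counter->playhead - $snapshotPlayhead >= $this->batch) {
            $this->snapshots[$id] = ['playhead' => $counter->playhead, 'state' => serialize($counter)];
        }

        return $counter;
    }
}

$events = array_fill(0, 2500, 1);             // 2,500 stored events, each adding 1
$store = new SnapshotStore(batch: 1000);

$first = $store->load('counter-1', $events);  // full replay; snapshot saved at playhead 2500
$events[] = 5;                                // one more event is recorded
$second = $store->load('counter-1', $events); // restores the snapshot, applies 1 event

echo $second->value, PHP_EOL; // prints 2505
```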

As with every cache, there are situations where we need to invalidate it. This happens when we change the aggregate
code, for example by adding or removing a property, because the serialized aggregate stored in the cache is then no
longer in sync with the current class. The cache can be invalidated by updating the `version`, which is also configured
through the attribute. This makes it effortless to invalidate the cache during deployment.

```php
use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Snapshot;

#[Aggregate('profile')]
#[Snapshot('default', version: '2')]
final class Profile extends BasicAggregateRoot
{
    // ...
}
```
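
Conceptually, a version like this can act as part of the cache key: a snapshot written by the old class layout no longer
matches, so loading falls back to a full replay of the events. A minimal sketch of that idea, with a hypothetical
`VersionedSnapshotCache` and a plain array standing in for the cache pool; the library's actual keying scheme may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical snapshot cache keyed by aggregate id *and* snapshot version.
final class VersionedSnapshotCache
{
    /** @var array<string, string> */
    private array $items = [];

    public function __construct(private readonly string $version)
    {
    }

    private function key(string $aggregateId): string
    {
        // A new version changes every key, so old entries are never read again.
        return $aggregateId . '@v' . $this->version;
    }

    public function save(string $aggregateId, array $state): void
    {
        $this->items[$this->key($aggregateId)] = serialize($state);
    }

    public function load(string $aggregateId): ?array
    {
        $item = $this->items[$this->key($aggregateId)] ?? null;

        return $item === null ? null : unserialize($item, ['allowed_classes' => false]);
    }

    // Simulates a deployment: same cache backend, read by code with a new version.
    public function withVersion(string $version): self
    {
        $copy = new self($version);
        $copy->items = $this->items;

        return $copy;
    }
}

$v1 = new VersionedSnapshotCache('1');
$v1->save('profile-1', ['name' => 'Alice', 'playhead' => 42]);

$v2 = $v1->withVersion('2');
var_dump($v2->load('profile-1')); // NULL: cache miss, so the aggregate is rebuilt from all events
```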

To be honest, this technique is rarely used in real applications. Why? In our benchmarks, loading 10,000 events for an
aggregate takes only 50ms. This is already quite fast, and aggregates that accumulate that many events over their
lifetime are rare. For those cases, however, you can use the snapshot cache to improve loading times.

:::note
We benchmark every PR via GitHub Actions using PostgreSQL to ensure no performance degradation slips through. You can
check an example [here](https://github.com/patchlevel/event-sourcing/pull/644#issuecomment-2464173934).
:::

### Aggregate Lifecycle: Split Stream

There is also a more natural way to reduce an aggregate's loading time. In most businesses, certain events mark the
beginning of a new cycle, such as a contract renewal. We can use these events to shorten the aggregate's stream by
marking the event class with the `#[SplitStream]` attribute. When such an event is recorded and saved, all previous
events are marked as archived. As a result, only the events starting from the one that split the stream are loaded.

```php
use Patchlevel\EventSourcing\Attribute\Event;
use Patchlevel\EventSourcing\Attribute\SplitStream;

#[Event('customer.contract_renewed')]
#[SplitStream]
final class ContractRenewed
{
    public function __construct(
        // contract renewal data
        public CustomerId $customerId,
        public \DateTimeImmutable $until,
        // other aggregate data
        public string $name,
    ) {
    }
}
```

As the comments in the code example indicate, the event must carry not only its own data but also all the aggregate
data relevant at that point. This is logical, because the aggregate will now be loaded starting from this event. All
information gathered up to the stream split must therefore be present in this event; otherwise, the aggregate would
operate with incomplete data.

But don't worry, the past events are not lost. They are only marked as `archived`, so the store no longer loads them
when recreating the aggregate state.
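
The archiving behaviour can be sketched in a few lines: appending an event that splits the stream flags every earlier
event of that aggregate as archived, and loading skips archived events. `SplitStreamStore` and its `$splitsStream`
parameter are hypothetical; in the library, this is driven by the `#[SplitStream]` attribute on the event class.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory store illustrating split-stream archiving.
final class SplitStreamStore
{
    /** @var list<array{aggregateId: string, event: string, archived: bool}> */
    private array $rows = [];

    public function append(string $aggregateId, string $event, bool $splitsStream = false): void
    {
        if ($splitsStream) {
            // Past events stay in the store; they are only flagged as archived.
            foreach ($this->rows as $i => $row) {
                if ($row['aggregateId'] === $aggregateId) {
                    $this->rows[$i]['archived'] = true;
                }
            }
        }

        $this->rows[] = ['aggregateId' => $aggregateId, 'event' => $event, 'archived' => false];
    }

    /** @return list<string> only the events needed to rebuild the aggregate */
    public function load(string $aggregateId): array
    {
        $events = [];
        foreach ($this->rows as $row) {
            if ($row['aggregateId'] === $aggregateId && !$row['archived']) {
                $events[] = $row['event'];
            }
        }

        return $events;
    }

    public function count(): int
    {
        return count($this->rows);
    }
}

$store = new SplitStreamStore();
$store->append('customer-1', 'CustomerRegistered');
$store->append('customer-1', 'AddressChanged');
$store->append('customer-1', 'ContractRenewed', splitsStream: true);
$store->append('customer-1', 'EmailChanged');

echo implode(', ', $store->load('customer-1')), PHP_EOL; // prints "ContractRenewed, EmailChanged"
echo $store->count(), PHP_EOL;                             // prints 4: nothing was deleted
```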

This solution achieves faster loading times by reducing the number of events that need to be loaded. The key difference
from snapshotting is that this approach aligns with business logic. Split stream events are an integral part of the
business and reflect how the business operates with its data. You can read more
about [split stream in our documentation](/docs/event-sourcing/latest/split_stream).

### The Application Side: Object Hydration

The next aspect to consider is the application side, in our case PHP. We often want to represent data as objects to
simplify working with it, and for this purpose we commonly use an ORM like `doctrine/orm`. What many people don't know
is that these ORMs also perform a complex and time-intensive process: hydrating data into objects. This can become
time-consuming, especially for complex structures involving multiple joins. Ocramius has written an excellent blog post
on this topic,
titled [Doctrine ORM Hydration Performance Optimization](https://ocramius.github.io/blog/doctrine-orm-optimization-hydration/).

This does not change with event sourcing: the events need to be hydrated as well. However, the structures involved are
typically much simpler, which makes hydration significantly faster and more straightforward. To optimize this further,
we developed a [hydrator](https://github.com/patchlevel/hydrator/) tailored to this use case. It features a modern and
intuitive configuration using `#[Attributes]` and includes built-in GDPR support, leveraging
[crypto shredding](./mastering-sensitive-data-handling-and-gdpr-compliant-secure-data-removal-with-event-sourcing).
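
To show why event hydration tends to be cheap, here is a deliberately naive, reflection-based hydrator for a flat event
(PHP 8.1+): the stored payload maps one-to-one onto constructor parameters, with no joins or object graphs to resolve.
The `hydrate()` function and this `ContractRenewed` variant are hypothetical illustrations, not the API of
`patchlevel/hydrator`, which is configured through attributes and adds features such as the GDPR support mentioned above.

```php
<?php

declare(strict_types=1);

// Hypothetical flat event, similar in shape to the ContractRenewed example above.
final class ContractRenewed
{
    public function __construct(
        public readonly string $customerId,
        public readonly DateTimeImmutable $until,
        public readonly string $name,
    ) {
    }
}

/**
 * Naive hydrator sketch: maps a decoded payload onto constructor parameters.
 * Only DateTimeImmutable needs conversion here; missing keys are not handled.
 *
 * @template T of object
 * @param class-string<T> $class
 * @param array<string, mixed> $payload
 * @return T
 */
function hydrate(string $class, array $payload): object
{
    $constructor = (new ReflectionClass($class))->getConstructor();
    $args = [];

    foreach ($constructor?->getParameters() ?? [] as $parameter) {
        $value = $payload[$parameter->getName()];
        $type = $parameter->getType();

        if ($type instanceof ReflectionNamedType && $type->getName() === DateTimeImmutable::class) {
            $value = new DateTimeImmutable($value);
        }

        $args[$parameter->getName()] = $value;
    }

    return new $class(...$args); // string keys are passed as named arguments (PHP 8.1+)
}

$json = '{"customerId":"c-1","until":"2026-01-31T00:00:00+00:00","name":"ACME"}';
$event = hydrate(ContractRenewed::class, json_decode($json, true, flags: JSON_THROW_ON_ERROR));

echo $event->name, ' until ', $event->until->format('Y-m-d'), PHP_EOL; // prints "ACME until 2026-01-31"
```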

## The Reading Side: Projections

Now to the reading side, which most people agree is inherently more performant and flexible than the traditional
ORM-based approach. With event sourcing, we can create highly optimized read models tailored to our specific needs,
with virtually unlimited possibilities in designing projections. This flexibility is one of the greatest advantages of
event sourcing.

Here is an example of what a projector could look like. This projection displays the number of guests currently
checked in at each hotel.

```php
use Doctrine\DBAL\Connection;
use Patchlevel\EventSourcing\Aggregate\Uuid;
use Patchlevel\EventSourcing\Attribute\Projector;
use Patchlevel\EventSourcing\Attribute\Setup;
use Patchlevel\EventSourcing\Attribute\Subscribe;
use Patchlevel\EventSourcing\Attribute\Teardown;
use Patchlevel\EventSourcing\Subscription\Subscriber\SubscriberUtil;

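// HotelCreated, GuestIsCheckedIn and GuestIsCheckedOut are the hotel aggregate's own event classes, defined elsewhere.
#[Projector('hotel')]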
final class HotelProjector
{
    use SubscriberUtil;

    public function __construct(
        private readonly Connection $db,
    ) {
    }

    /** @return list<array{id: string, name: string, guests: int}> */
    public function getHotels(): array
    {
        return $this->db->fetchAllAssociative("SELECT id, name, guests FROM {$this->table()};");
    }

    #[Subscribe(HotelCreated::class)]
    public function handleHotelCreated(HotelCreated $event, Uuid $aggregateId): void
    {
        $this->db->insert(
            $this->table(),
            [
                'id' => $aggregateId->toString(),
                'name' => $event->hotelName,
                'guests' => 0,
            ],
        );
    }

    #[Subscribe(GuestIsCheckedIn::class)]
    public function handleGuestIsCheckedIn(Uuid $aggregateId): void
    {
        $this->db->executeStatement(
            "UPDATE {$this->table()} SET guests = guests + 1 WHERE id = ?;",
            [$aggregateId->toString()],
        );
    }

    #[Subscribe(GuestIsCheckedOut::class)]
    public function handleGuestIsCheckedOut(Uuid $aggregateId): void
    {
        $this->db->executeStatement(
            "UPDATE {$this->table()} SET guests = guests - 1 WHERE id = ?;",
            [$aggregateId->toString()],
        );
    }

    #[Setup]
    public function create(): void
    {
        $this->db->executeStatement("CREATE TABLE IF NOT EXISTS {$this->table()} (id VARCHAR PRIMARY KEY, name VARCHAR, guests INTEGER);");
    }

    #[Teardown]
    public function drop(): void
    {
        $this->db->executeStatement("DROP TABLE IF EXISTS {$this->table()};");
    }

    private function table(): string
    {
        return 'projection_' . $this->subscriberId();
    }
}
```

Each projector method with a `#[Subscribe]` attribute is called for every recorded event it is subscribed to. This way,
we can create a different read model for each use case, all populated from the same events.
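
Stripped of the database, a projector is a fold over the event stream: each handler updates the read model, and
replaying all events into an empty projection rebuilds it from scratch. The following sketch mirrors the hotel example
with an in-memory array instead of DBAL; the event classes are hypothetical simplifications that carry the hotel id
themselves instead of receiving it as the aggregate id.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified events that carry the hotel id directly.
final class HotelCreated
{
    public function __construct(public readonly string $hotelId, public readonly string $hotelName) {}
}

final class GuestIsCheckedIn
{
    public function __construct(public readonly string $hotelId) {}
}

final class GuestIsCheckedOut
{
    public function __construct(public readonly string $hotelId) {}
}

// In-memory read model: hotel id => name and current guest count.
final class HotelGuestCountProjection
{
    /** @var array<string, array{name: string, guests: int}> */
    private array $hotels = [];

    public function handle(object $event): void
    {
        match (true) {
            $event instanceof HotelCreated => $this->hotels[$event->hotelId] = ['name' => $event->hotelName, 'guests' => 0],
            $event instanceof GuestIsCheckedIn => $this->hotels[$event->hotelId]['guests']++,
            $event instanceof GuestIsCheckedOut => $this->hotels[$event->hotelId]['guests']--,
            default => null, // events this projection does not subscribe to are ignored
        };
    }

    /** @return array<string, array{name: string, guests: int}> */
    public function hotels(): array
    {
        return $this->hotels;
    }
}

$events = [
    new HotelCreated('h1', 'Hotel Alpha'),
    new GuestIsCheckedIn('h1'),
    new GuestIsCheckedIn('h1'),
    new GuestIsCheckedOut('h1'),
];

// Rebuilding = replaying every event into a fresh, empty projection.
$projection = new HotelGuestCountProjection();
foreach ($events as $event) {
    $projection->handle($event);
}

echo $projection->hotels()['h1']['guests'], PHP_EOL; // prints 1
```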

### Pick the Best Tool for the Job

We can choose the most suitable database for each read model based on its specific requirements. This decision can be
made independently for every read model, providing the flexibility to utilize specialized tools tailored to specific use
cases. All factors can be considered in this decision, including performance, special features, or infrastructure
concerns.

If we later realize that a different tool would be a better fit - for example, switching from MySQL to Elasticsearch for
a read model to improve search capabilities - the migration process is straightforward. We update the projector to
accommodate any necessary changes and deploy it. Data migration happens automatically by reprocessing all events and
applying them to the new projection. This eliminates the need for a dedicated migration script to transfer data from one
storage system to another.
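
The migration described above boils down to feeding the same events into a different storage. A minimal sketch,
assuming a hypothetical `HotelReadModelStorage` interface with two in-memory implementations that stand in for a
relational table and a search index; events are plain arrays to keep it short.

```php
<?php

declare(strict_types=1);

// Hypothetical storage abstraction for one read model.
interface HotelReadModelStorage
{
    public function addHotel(string $id, string $name): void;
}

// Stand-in for a relational table: rows keyed by primary key.
final class TableStorage implements HotelReadModelStorage
{
    /** @var array<string, string> */
    public array $rows = [];

    public function addHotel(string $id, string $name): void
    {
        $this->rows[$id] = $name;
    }
}

// Stand-in for a search engine: an inverted index from lowercase words to hotel ids.
final class SearchIndexStorage implements HotelReadModelStorage
{
    /** @var array<string, list<string>> */
    public array $index = [];

    public function addHotel(string $id, string $name): void
    {
        foreach (preg_split('/\s+/', strtolower($name), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $this->index[$word][] = $id;
        }
    }

    /** @return list<string> */
    public function search(string $word): array
    {
        return $this->index[strtolower($word)] ?? [];
    }
}

/**
 * Rebuilding a projection: replay every stored event into the given storage.
 *
 * @param iterable<array{type: string, id: string, name: string}> $events
 */
function rebuild(iterable $events, HotelReadModelStorage $storage): void
{
    foreach ($events as $event) {
        if ($event['type'] === 'hotel.created') {
            $storage->addHotel($event['id'], $event['name']);
        }
    }
}

$events = [
    ['type' => 'hotel.created', 'id' => 'h1', 'name' => 'Grand Hotel Berlin'],
    ['type' => 'hotel.created', 'id' => 'h2', 'name' => 'Berlin Hostel'],
];

// Same events, two storages: no script copies data from one to the other.
$table = new TableStorage();
rebuild($events, $table);

$search = new SearchIndexStorage();
rebuild($events, $search);

echo implode(', ', $search->search('Berlin')), PHP_EOL; // prints "h1, h2"
```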

### Normalization & JOIN-less Queries

If we decide that a relational database is the right choice, we can optimize the table structure for performance.
Typically, joins are a common bottleneck in slow database queries, especially when joining multiple tables. Joins are
often necessary because traditional table designs prioritize reducing data redundancy and contextualizing data across
multiple tables.

However, for read models, we don't need to normalize tables in the same way. This allows us to consolidate all necessary
data into a single, denormalized table if desired. The result is join-less queries, which significantly improve query
performance.
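
To illustrate the join-less idea, the hypothetical projection below copies the hotel name into every guest row at write
time. Listing guests together with their hotel then reads a single flat table, where a normalized schema would join a
guests table with a hotels table. In-memory arrays stand in for database tables, and `GuestStayProjection` is an
illustrative name only.

```php
<?php

declare(strict_types=1);

// Hypothetical denormalizing projection: one flat "table" of guest stays.
final class GuestStayProjection
{
    /** @var array<string, string> hotel id => name, kept only to enrich later rows */
    private array $hotelNames = [];

    /** @var list<array{guest: string, hotel_id: string, hotel_name: string}> */
    public array $stays = [];

    public function onHotelCreated(string $hotelId, string $name): void
    {
        $this->hotelNames[$hotelId] = $name;
    }

    public function onGuestCheckedIn(string $hotelId, string $guest): void
    {
        // The hotel name is copied in at write time, so reads never need a join.
        $this->stays[] = [
            'guest' => $guest,
            'hotel_id' => $hotelId,
            'hotel_name' => $this->hotelNames[$hotelId],
        ];
    }
}

$projection = new GuestStayProjection();
$projection->onHotelCreated('h1', 'Hotel Alpha');
$projection->onGuestCheckedIn('h1', 'Alice');
$projection->onGuestCheckedIn('h1', 'Bob');

// The "query" is a plain scan of one table, like SELECT guest, hotel_name FROM stays.
foreach ($projection->stays as $stay) {
    echo $stay['guest'], ' -> ', $stay['hotel_name'], PHP_EOL;
}
```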

If you're concerned that this approach might make things more complicated, don't worry! You can still design your read
models with multiple tables that can be joined together if that suits your needs better. This approach simply adds more
options without taking anything away.

## Conclusion

Event sourcing offers a robust way to handle data by maintaining an immutable log of events. While concerns about
performance - particularly around reading and writing aggregates - are valid when using long-living aggregates, the
techniques discussed in this article demonstrate how those challenges can be mitigated.

- Aggregate Writing Performance: The append-only nature of
  the [event store](/docs/event-sourcing/latest/store) ensures fast writes, particularly in the
  early stages. The absence of complex constraints like foreign keys further enhances this.
- Aggregate Reading Performance: Advanced techniques
  like [snapshots](/docs/event-sourcing/latest/snapshots)
  and [split streams](/docs/event-sourcing/latest/split-stream) help optimize aggregate loading times,
  even for long-lived aggregates with thousands of events.
- Projections: The flexibility of event sourcing shines through in the reading layer.
  With [projections](/docs/event-sourcing/latest/subscription), you can design read-optimized models
  tailored to specific use cases, leveraging the best storage tools for each scenario. The ability
  to rebuild projections from events ensures adaptability without data migration headaches.

By leveraging these strategies, event sourcing not only retains its immutability and traceability benefits but also
remains performant, even for complex or high-volume applications. Whether you're optimizing for aggregate performance or
read-side flexibility, event sourcing provides powerful tools and patterns to meet your needs.

For more detailed guidance, check out our [documentation](/docs/event-sourcing/latest), and feel free
to share your thoughts or questions in the comments or on [GitHub](https://github.com/patchlevel/event-sourcing/)!
