**Event Sourcing in a Nutshell**
Event sourcing gives us full control and transparency over our data: instead of storing only the current state, we store every change as an immutable event. For example, if we want to know how much money a particular user had two months ago at 1:42 PM, we simply query all transactions up to that moment and sum them. Because the complete history of events is preserved, we can reconstruct exactly what happened and when, which is crucial for compliance.
Accounting systems do something similar, though they don’t call it event sourcing 🙂
---
### Advantages of Event Sourcing
Event sourcing offers several unique benefits:
- **Rebuild State:** You can discard the current application state and rebuild it at any time using the complete history of events.
- **Event Replay:** If a past event was incorrect, you can adjust it, replay events, and rebuild the application state accordingly.
- **Event Replay for Ordering:** Events may arrive out of sequence, especially in asynchronous messaging systems. You can replay them in the correct order to restore the accurate state.
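To make these advantages concrete, here is a minimal sketch of the kind of append-only event table the rest of this post assumes. The column names are illustrative, not the exact production schema:
```sql
-- Illustrative append-only event store (assumed schema).
-- Events are only ever inserted, never updated or deleted.
CREATE TABLE events (
    event_sequence BIGSERIAL     PRIMARY KEY,  -- monotonically increasing order
    account_id     BIGINT        NOT NULL,
    type           TEXT          NOT NULL,     -- e.g. 'deposit', 'withdrawal'
    amount         NUMERIC(18,2) NOT NULL,     -- signed: withdrawals are negative
    timestamp      TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

-- Recording a deposit is a plain insert; the balance is always derived
-- by summing events, never stored by updating a row in place.
INSERT INTO events (account_id, type, amount)
VALUES (123, 'deposit', 100.00);
```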
---
### Alternatives to Event Sourcing
While event sourcing solves auditability and compliance challenges effectively, there are alternatives:
1. **Audit Log Pattern:** Keep your current state tables but add comprehensive audit logs that track all changes (a trigger-based sketch follows this list). This method is simpler but lacks the business-intent detail that event sourcing provides.
2. **Change Data Capture (CDC):** Capture row-level changes at the database level, typically by tailing the database's transaction log (for example, Debezium streaming PostgreSQL WAL changes into a message broker). Like audit logs, CDC records *what* changed but not *why*, so the business intent behind each change is lost.
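For the audit log pattern above, a minimal trigger-based sketch; the table and function names are hypothetical:
```sql
-- Hypothetical current-state table plus audit trail.
CREATE TABLE accounts (
    id      BIGINT        PRIMARY KEY,
    balance NUMERIC(18,2) NOT NULL
);

CREATE TABLE account_audit (
    audit_id    BIGSERIAL     PRIMARY KEY,
    account_id  BIGINT        NOT NULL,
    old_balance NUMERIC(18,2),
    new_balance NUMERIC(18,2),
    changed_at  TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

-- Record the before/after values of every balance update.
CREATE OR REPLACE FUNCTION log_balance_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO account_audit (account_id, old_balance, new_balance)
    VALUES (NEW.id, OLD.balance, NEW.balance);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER account_audit_trg
AFTER UPDATE OF balance ON accounts
FOR EACH ROW EXECUTE FUNCTION log_balance_change();
```
Note that the audit row captures the value change but not the business operation behind it (deposit, fee, correction), which is exactly the detail event sourcing preserves.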
---
### Event Sourcing Write Advantages
Event sourcing also improves write performance due to its append-only nature (a brief illustration follows this list):
- **Write Performance:** Append-only inserts are generally faster than in-place updates.
- **No Lock Contention:** Multiple transactions can write simultaneously without waiting on each other's row locks.
- **Better Concurrency:** There is no need to lock a shared balance row on every update.
- **Optimized for SSDs:** Sequential writes perform very well on modern storage devices.
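A minimal illustration, reusing the hypothetical tables sketched earlier:
```sql
-- Update-in-place: every concurrent deposit to account 123 must wait
-- for the row lock held by the previous writer.
UPDATE accounts SET balance = balance + 100.00 WHERE id = 123;

-- Append-only: concurrent deposits insert independent rows and never
-- contend for a shared balance row.
INSERT INTO events (account_id, type, amount)
VALUES (123, 'deposit', 100.00);
```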
---
### Sample Query: Querying the Balance at a Point in Time
**Regulatory Question:**
*What was Account X’s balance on Date Y at Time Z?*
**Event Sourcing Answer:**
```sql
SELECT COALESCE(SUM(amount), 0) AS balance
FROM events
WHERE account_id = X
  AND timestamp <= 'Date Y Time Z';
```
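For this query to stay fast, the events table needs an index matching the access pattern; a plausible one (an assumption, not a documented detail of the project):
```sql
-- Assumed supporting index for per-account, time-bounded reads.
CREATE INDEX idx_events_account_time ON events (account_id, timestamp);
```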
To optimize this further for accounts with long event histories, we also use snapshots:
```sql
SELECT s.balance
       + COALESCE((SELECT SUM(e.amount)
                   FROM events e
                   WHERE e.account_id = s.account_id
                     AND e.event_sequence > s.last_event_sequence), 0) AS current_balance
FROM account_snapshots AS s
WHERE s.account_id = 123
ORDER BY s.last_event_sequence DESC
LIMIT 1;
```
This approach reduced balance calculation time from 2-5 seconds to 50-200 milliseconds for active accounts.
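For completeness, a sketch of what the snapshot table and its periodic refresh might look like; the schema is inferred from the query above, not the exact production job:
```sql
-- Hypothetical snapshot table, inferred from the query above.
CREATE TABLE account_snapshots (
    account_id          BIGINT        NOT NULL,
    balance             NUMERIC(18,2) NOT NULL,
    last_event_sequence BIGINT        NOT NULL,
    created_at          TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    PRIMARY KEY (account_id, last_event_sequence)
);

-- Periodic refresh: fold all events to date into a new snapshot row.
-- Re-running with no new events is a no-op thanks to ON CONFLICT.
INSERT INTO account_snapshots (account_id, balance, last_event_sequence)
SELECT account_id, SUM(amount), MAX(event_sequence)
FROM events
WHERE account_id = 123
GROUP BY account_id
ON CONFLICT (account_id, last_event_sequence) DO NOTHING;
```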
---
### Storage Growth & Cost Management
Events accumulate rapidly, so we implemented tiered storage:
- **Hot Storage (Azure Premium SSD):** Last 3 months (~2 TB)
- **Warm Storage (Azure Standard SSD):** 3-12 months (~5 TB)
- **Cold Storage (Azure Archive):** 1+ years (~50 TB)
**Cost Comparison:**
Total storage costs are around $800/month using this tiered approach, compared to approximately $15,000/month if everything were stored on premium SSD.
---
### Tradeoffs with Event Sourcing
Event sourcing is not without challenges:
- **Complexity:** It adds significant complexity and requires team training.
- **Query Complexity:** Getting the current state involves aggregation queries, which can be slower than direct reads.
Example query for current balance:
```sql
SELECT account_id, SUM(amount) AS current_balance
FROM events
WHERE account_id = 123
GROUP BY account_id;
```
Versus a traditional query:
```sql
SELECT balance FROM accounts WHERE id = 123;
```
- **Storage Growth:** Managing the growing volume of events requires careful strategies.
---
### Why We Rejected Alternatives
Despite its read performance tradeoffs, event sourcing was our choice because it is vastly superior for auditability — a priority for our customer. We were also able to address performance concerns effectively.
---
### CQRS (Command Query Responsibility Segregation) Deep Dive
Important note: Implementing CQRS with multiple databases introduces complexity and eventual consistency. We initially avoided CQRS but kept it as an option.
Later, we created a proof of concept (POC) in the Portfolio Service, showing:
- Report generation time reduced from 30 seconds to 10 seconds.
- Dashboard load time improved from 1 second to 400 milliseconds.
- Complex query performance improved by about 2x.
The results convinced us to implement CQRS selectively.
---
### CQRS Implementation Details
We implemented CQRS in the Transaction-Portfolio Service with:
- **Write Side:** PostgreSQL database for commands.
- **Read Side:** MongoDB document store for queries.
We chose MongoDB because it offers flexible schema designs and high read throughput.
**Workflow:**
A write request goes to PostgreSQL and emits events to Azure Service Bus. Another service instance processes these events and updates MongoDB with denormalized data optimized for reads.
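One detail worth spelling out: if the PostgreSQL write and the Service Bus publish are separate steps, a crash between them can lose events. A common remedy is a transactional outbox; a minimal sketch under assumed table names (not necessarily how the project did it):
```sql
-- Hypothetical outbox table: a relay process polls unpublished rows
-- and forwards them to Azure Service Bus.
CREATE TABLE outbox (
    id        BIGSERIAL PRIMARY KEY,
    payload   JSONB     NOT NULL,
    published BOOLEAN   NOT NULL DEFAULT FALSE
);

-- The event and its outgoing message commit atomically.
BEGIN;

INSERT INTO events (account_id, type, amount)
VALUES (123, 'deposit', 100.00);

INSERT INTO outbox (payload)
VALUES ('{"account_id": 123, "type": "deposit", "amount": 100.00}');

COMMIT;
```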
We accepted eventual consistency with synchronization delays typically between 100–500 ms.
---
### Why Was CQRS Faster?
- **Denormalized Read Models:** Avoid complex JOINs by using precomputed aggregations.
- **Optimized Indexes:** Tailored indexes on MongoDB collections.
- **Separate Scaling:** Read replicas can scale independently of the write database.
---
### Example: Portfolio Performance Report
**Traditional approach:**
Complex SQL query with multiple JOINs and aggregations, taking up to 30 seconds for active users:
```sql
SELECT u.name, p.symbol, SUM(t.quantity) AS total_shares, AVG(t.price) AS avg_price
FROM users u
JOIN portfolios p ON u.id = p.user_id
JOIN transactions t ON p.id = t.portfolio_id
WHERE u.id = 123
GROUP BY u.name, p.symbol;
```
**CQRS approach:**
Simple document lookup from precomputed projection:
```js
db.portfolio_summaries.findOne({ user_id: 123 });
```
Because the aggregation is done ahead of time, the read is a single document lookup against a precomputed projection.
---
### Major Challenges and Solutions
1. **Debugging Distributed Systems:**
This was initially our biggest challenge. Tracing errors across multiple services and asynchronous queues was complicated.
**Solution:** Implemented distributed tracing using correlation IDs across all services and messages. We used Jaeger for tracing and structured logging for consistent log entries.
2. **Testing Complexity:**
Testing event-sourced and CQRS systems is fundamentally different from testing a traditional CRUD application.
**Solution:** Created integration environments to replay production events against test instances and invested heavily in property-based testing to ensure event sequences produce valid states.
---
### What Would I Change?
I am convinced this architecture was the right choice for our specific requirements. However, there are areas I would approach differently next time to improve maintainability, reduce complexity, and optimize performance further.
---
This summary outlines why event sourcing combined with selective CQRS implementation was the optimal solution for our auditing, performance, and scalability needs while balancing complexity and cost.
https://lukasniessen.medium.com/this-is-a-detailed-breakdown-of-a-fintech-project-from-my-consulting-career-9ec61603709c