Table of contents
The first time a retailer on Afto called to ask "why did this customer's order get marked delivered before it was shipped?", we had no good answer. The order record showed the final state. The intermediate steps were gone.
We fixed that by moving to event sourcing for the entire order lifecycle—every state transition is a logged, immutable event. Here's how it works and what we learned.
The Order State Machine
An order in Afto moves through a defined set of states:
CART_CREATED
│
▼
CHECKOUT_STARTED
│
├──(payment failed)──► PAYMENT_FAILED
│
▼
PAYMENT_CONFIRMED
│
▼
ORDER_PLACED
│
├──(OOS item)──► PARTIALLY_CANCELLED
│
▼
PROCESSING
│
▼
SHIPPED
│
├──(delivery failed)──► DELIVERY_FAILED
│
▼
DELIVERED
Every arrow is a state transition. Every state transition writes an event.
The Events Table
CREATE TABLE order_events (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
order_id UUID NOT NULL REFERENCES orders(id),
retailer_id UUID NOT NULL,
customer_id UUID,
event_type TEXT NOT NULL,
payload JSONB NOT NULL DEFAULT '{}',
actor TEXT NOT NULL, -- 'customer', 'system', 'agent', 'admin'
actor_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ip_address TEXT,
user_agent TEXT
);
CREATE INDEX idx_order_events_order_id ON order_events(order_id, created_at);
CREATE INDEX idx_order_events_retailer_id ON order_events(retailer_id, created_at DESC);The payload JSONB column carries event-specific data. For ITEM_ADDED, it holds product_id, quantity, unit_price. For PAYMENT_CONFIRMED, it holds transaction_id, gateway, amount. The schema is loose by design—we don't want a migration every time we add context to an event type.
Emitting Events
Every operation that modifies order state goes through an OrderService that always writes an event alongside the state update:
class OrderService {
async addToCart(
customerId: string,
productId: string,
quantity: number,
requestMeta: RequestMeta
): Promise<Cart> {
const [cart] = await db.transaction(async (trx) => {
// 1. Update the cart record
const updatedCart = await trx.query(
`INSERT INTO carts (customer_id, retailer_id)
VALUES ($1, $2)
ON CONFLICT (customer_id, retailer_id, status)
DO UPDATE SET updated_at = NOW()
RETURNING *`,
[customerId, requestMeta.retailerId]
);
await trx.query(
`INSERT INTO cart_items (cart_id, product_id, quantity, unit_price)
VALUES ($1, $2, $3, (SELECT price FROM products WHERE id = $2))
ON CONFLICT (cart_id, product_id)
DO UPDATE SET quantity = cart_items.quantity + EXCLUDED.quantity`,
[updatedCart.rows[0].id, productId, quantity]
);
// 2. Write the event atomically with the state change
await trx.query(
`INSERT INTO order_events
(order_id, retailer_id, customer_id, event_type, payload, actor, actor_id, ip_address)
VALUES ($1, $2, $3, 'ITEM_ADDED', $4, 'customer', $3, $5)`,
[
updatedCart.rows[0].id,
requestMeta.retailerId,
customerId,
JSON.stringify({ product_id: productId, quantity }),
requestMeta.ipAddress,
]
);
return updatedCart.rows;
});
return cart;
}
}The database transaction ensures the event and the state change are atomic. If the transaction fails, neither the cart update nor the event are written. This was not the case in our original design, where we wrote events after the main operation—leading to orphaned events and missing events in equal measure.
Event Replay for Debugging
The most valuable capability event sourcing gave us was the ability to reconstruct the exact sequence of events for any order. When a retailer reports an issue, we pull the event stream:
async function getOrderTimeline(orderId: string) {
const events = await db.query(
`SELECT event_type, payload, actor, created_at, ip_address
FROM order_events
WHERE order_id = $1
ORDER BY created_at ASC`,
[orderId]
);
return events.rows.map((e) => ({
type: e.event_type,
at: e.created_at,
by: e.actor,
details: e.payload,
ip: e.ip_address,
}));
}This produces a readable audit trail like:
10:03:21 — CART_CREATED (customer)
10:03:45 — ITEM_ADDED { product: "Blue T-Shirt", qty: 2 } (customer)
10:04:12 — CHECKOUT_STARTED (customer)
10:04:18 — PAYMENT_CONFIRMED { gateway: "stripe", txn: "pi_xxx" } (system)
10:04:19 — ORDER_PLACED (system)
10:06:44 — STATUS_UPDATED { to: "processing" } (system)
10:42:00 — SHIPPED { tracking: "BD1234567IN" } (admin:retailer_123)
15:30:00 — DELIVERED (system via courier webhook)
We've resolved dozens of support tickets with this view in under two minutes.
Analytics: Funnel and Drop-Off
Beyond debugging, the events table powers our conversion funnel analysis:
-- Cart-to-order conversion rate by retailer, last 30 days
SELECT
retailer_id,
COUNT(DISTINCT CASE WHEN event_type = 'CART_CREATED' THEN order_id END) AS carts,
COUNT(DISTINCT CASE WHEN event_type = 'ORDER_PLACED' THEN order_id END) AS orders,
ROUND(
COUNT(DISTINCT CASE WHEN event_type = 'ORDER_PLACED' THEN order_id END)::numeric
/ NULLIF(COUNT(DISTINCT CASE WHEN event_type = 'CART_CREATED' THEN order_id END), 0) * 100,
1
) AS conversion_pct
FROM order_events
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY retailer_id
ORDER BY conversion_pct;This query was previously a multi-join across four tables. With the events table, it's a single aggregation.
Retention and Archival
The events table grows continuously. We set a retention policy:
- Hot tier: events from the last 90 days in PostgreSQL (fast queries)
- Cold tier: events older than 90 days archived to S3 as Parquet files
- Legal hold: payment-related events retained for 7 years regardless
async function archiveOldEvents() {
const cutoff = subDays(new Date(), 90);
const oldEvents = await db.query(
`DELETE FROM order_events
WHERE created_at < $1
AND event_type NOT IN ('PAYMENT_CONFIRMED', 'REFUND_ISSUED')
RETURNING *`,
[cutoff]
);
await writeToS3Parquet(oldEvents.rows, `events/archive/${formatDate(cutoff)}.parquet`);
}Key Takeaways
- Write events inside the same database transaction as the state change—separated writes lead to inconsistency under failure.
- The
payloadJSONB column is intentionally loose—event-specific structure lives in the payload, not in separate columns you'll be migrating forever. - Event sourcing's biggest win isn't rebuilding state; it's debuggability—a readable timeline resolves support tickets instantly.
- Index on
(order_id, created_at)and(retailer_id, created_at DESC)—these are the two most common access patterns. - Plan archival from day one—events are high-volume; the table will grow faster than any other in your system.