A paused subscription receives a customer.subscription.updated webhook from Stripe. The payload says the new status is past_due. The local state machine has eight states and fifteen valid transitions, defined as a single Elixir map literal. paused can transition to active. That is the only outbound edge. There is no paused to past_due entry. The state machine says this transition doesn't exist.
What do you do with the event?
Rejecting the event is the safe choice. Invalid transition, log a warning, drop the payload. Clean, principled, and wrong. Stripe doesn't send a status change in isolation. The same webhook payload contains updated period dates, metadata changes, and potentially a new plan assignment. Rejecting the event to protect the status field means losing everything else in the payload.
Forcing the transition is the pragmatic choice. Override the state machine, accept whatever Stripe says, move on. Also wrong. The state machine exists because downstream business logic depends on it. The dunning engine creates retry sequences when a subscription enters past_due. If a paused subscription suddenly appears as past_due through a forced transition, the dunning engine starts chasing a payment that was intentionally paused. The customer gets escalating "your payment failed" notifications for a subscription they paused on purpose.
But the invalid transition is a symptom, not the disease. Stripe and the local data model can disagree about the current state of a subscription, and neither is wrong. Stripe is the source of truth for payment processing. The local state machine is the source of truth for business logic execution. When they conflict, you need an architecture that handles the disagreement without losing data or breaking invariants.
Two Sources of Truth, Zero Arbitration
This isn't a sync problem. Sync implies one system is behind and needs to catch up. This is a disagreement: Stripe says the state is X, the local model says the transition to X is illegal, and both positions are defensible.
Stripe can put a subscription into states the local model doesn't allow because Stripe's state machine is broader and serves a different purpose. Stripe tracks billing states across all possible payment scenarios, including edge cases around payment method updates, invoice finalization timing, and subscription schedule modifications. The local model tracks lifecycle states for business logic: when to start dunning, when to compute churn, when to notify the customer. These are overlapping but non-identical concerns.
The moment I accepted that these two state machines would diverge, the design became clear. The system needed to handle three cases: agreement (both say the same thing, proceed normally), valid disagreement (the transition is in the map, accept it), and invalid disagreement (the transition is not in the map, accept the data but not the transition).
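A minimal sketch of how those three cases could be told apart with a map literal. The module name, the `classify/2` function, and every entry other than `"paused" => ["active"]` are assumptions made for illustration; the real map has eight states and fifteen transitions that are not listed in this post.

```elixir
defmodule TransitionSketch do
  # Illustrative subset only. The "paused" entry comes from the text;
  # the other entries are assumed so the example has something to check.
  @transitions %{
    "trialing" => ["active", "canceled"],
    "active" => ["past_due", "paused", "canceled"],
    "past_due" => ["active", "canceled"],
    "paused" => ["active"]
  }

  # Same status on both sides: nothing to arbitrate.
  def classify(current, current), do: :agreement

  # Different status: look the edge up in the map.
  def classify(current, incoming) do
    if incoming in Map.get(@transitions, current, []) do
      :valid_disagreement
    else
      :invalid_disagreement
    end
  end
end

IO.inspect(TransitionSketch.classify("paused", "past_due"))
```

With this subset, paused to active is a valid disagreement, and paused to past_due is an invalid one because the only outbound edge from paused is active.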
The Strip, Don't Reject Pattern
The core of this design lives in a single function: maybe_strip_status/3 in the SubscriptionProcessor module.
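    # No previous status: nothing to validate against, accept the payload as-is.
    defp maybe_strip_status(stripe_data, nil, _new_status), do: stripe_data

    # Status unchanged: skip the transition check.
    defp maybe_strip_status(stripe_data, prev, new) when prev == new, do: stripe_data

    defp maybe_strip_status(stripe_data, previous_status, new_status) do
      # Note: despite the bang, transition!/2 is matched on return values here.
      case StateMachine.transition!(previous_status, new_status) do
        :ok ->
          stripe_data

        {:error, :invalid_transition} ->
          Logger.warning(
            "SubscriptionProcessor: invalid transition #{previous_status} -> #{new_status}, " <>
              "keeping current status"
          )

          # Keep every other field from Stripe; restore only the local status.
          Map.put(stripe_data, "status", previous_status)
      end
    end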
When the state machine rejects a transition, the function does not reject the event. It replaces the incoming status with the previous status and returns the rest of the payload untouched. Period dates, trial dates, metadata, plan references: all of it passes through. Only the illegal status change gets stripped.
The first two function clauses handle the cases where there is no transition to validate. When previous_status is nil (a brand new subscription with no prior state to compare against), the function accepts whatever Stripe sends. When the previous and new statuses are identical (Stripe sent an update that didn't change the status), the payload passes through without a lookup in the transition map.
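To make the effect on a payload concrete, here is a self-contained sketch of the same idea. `StripSketch`, `strip/2`, the one-entry `@allowed` list, and the payload fields are stand-ins I am assuming for illustration; they are not the real SubscriptionProcessor or StateMachine.

```elixir
defmodule StripSketch do
  # Stand-in for the transition map. Only paused -> active is known from
  # the text; everything else is treated as invalid in this sketch.
  @allowed [{"paused", "active"}]

  # No previous status: accept the payload unchanged.
  def strip(payload, nil), do: payload

  # Status did not change: nothing to validate.
  def strip(%{"status" => new} = payload, prev) when prev == new, do: payload

  # Status changed: keep it only if the edge is allowed, otherwise
  # put the previous status back and leave all other fields alone.
  def strip(%{"status" => new} = payload, prev) do
    if {prev, new} in @allowed do
      payload
    else
      Map.put(payload, "status", prev)
    end
  end
end

payload = %{
  "status" => "past_due",
  "current_period_end" => 1_700_000_000,
  "metadata" => %{"plan" => "pro"}
}

IO.inspect(StripSketch.strip(payload, "paused"))
```

The returned map carries "paused" as its status, while current_period_end and metadata are exactly what the payload contained.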
This is a deliberate tradeoff. The local status can drift from Stripe's status. A subscription might be past_due in Stripe's records and paused in the local database. I accepted this inconsistency because the local model controls what actually happens: which dunning sequences fire, which notifications send, which metrics count the subscription as churned. If Stripe's status is wrong from the local model's perspective, the local model wins for execution and Stripe wins for billing. Nobody is the universal authority.
Idempotency Under Disagreement
The state disagreement compounds with another problem: event ordering. Stripe doesn't guarantee webhook delivery order. A customer.subscription.updated event from 3:00 PM can arrive after one from 3:05 PM. If the 3:05 PM event was already processed and moved the subscription to active, the 3:00 PM event now carries a stale status that the state machine might reject.
The idempotency model handles this with a three-state check, not the typical two-state (seen/unseen) approach. Every event is stored with a composite key: tenant_id:stripe_event_id. Before processing, the system checks:
- New: No record exists. Process normally.
- Duplicate: A record exists with processed_at set. Skip entirely, return success.
- Processing: A record exists without processed_at. Another Oban worker is handling this event right now. Skip to avoid double-processing.
The third state is the one most implementations miss. Without it, two workers pulling the same event from the Oban queue would both attempt to process it, and both would try to create downstream records (dunning attempts, notifications). The database-level unique constraint on the idempotency key still catches this at the persistence layer, but the three-state check catches it earlier and, in the common case, avoids the exception altogether.
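The three-state check can be sketched as a pure function. This version assumes a plain map in place of the events table, and the names `IdempotencySketch`, `key/2`, and `check/3` are hypothetical. An in-memory map gives no atomicity, so a real implementation would still rely on the unique constraint as the backstop.

```elixir
defmodule IdempotencySketch do
  # `store` stands in for the events table:
  # composite key => %{processed_at: DateTime.t() | nil}

  # Composite key in the tenant_id:stripe_event_id shape.
  def key(tenant_id, stripe_event_id), do: "#{tenant_id}:#{stripe_event_id}"

  def check(store, tenant_id, stripe_event_id) do
    case Map.get(store, key(tenant_id, stripe_event_id)) do
      # No record: first time this event is seen.
      nil -> :new
      # Record without processed_at: another worker is mid-flight.
      %{processed_at: nil} -> :processing
      # Record with processed_at: already handled.
      %{processed_at: _timestamp} -> :duplicate
    end
  end
end

store = %{
  "t1:evt_1" => %{processed_at: nil},
  "t1:evt_2" => %{processed_at: ~U[2024-01-01 00:00:00Z]}
}

IO.inspect(IdempotencySketch.check(store, "t1", "evt_1"))
```

A caller would process the event only on `:new` and return success without side effects for the other two results.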
The Dunning Guard Nobody Asks About
The paused-to-past-due edge case has a second layer of protection deeper in the processing pipeline. Even if the status stripping somehow failed and a paused subscription appeared as past_due, the dunning trigger has its own guard:
    if prev_status == "paused" do
      Logger.warning(
        "SubscriptionProcessor: paused subscription transitioned to past_due, " <>
          "skipping dunning for sub #{subscription.id}"
      )
    end
This is belt-and-suspenders engineering for a specific Stripe behavior I discovered during development. Stripe can emit a past_due status for a paused subscription when a pending invoice from before the pause fails to collect. The subscription was paused intentionally, but Stripe's billing engine still tries to finalize the outstanding invoice. If it fails, Stripe marks the subscription past_due even though the customer explicitly paused it.
Without this guard, the dunning engine would start a 7-day escalation sequence (email at day 1, email at day 3, Telegram at day 5, both channels at day 7, then automatic cancellation) for a subscription the customer paused. The customer experience would be: "I paused my subscription. Days later I got a Telegram message saying my payment failed and I need to act immediately." That is not a recoverable customer relationship.
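One way to express the guard as a decision function, so the rule can be tested in isolation. `DunningGuardSketch` and `start_dunning?/2` are hypothetical names, and treating a nil previous status as eligible for dunning is my assumption, not something the pipeline is known to do.

```elixir
defmodule DunningGuardSketch do
  # A paused subscription that shows up as past_due never starts dunning.
  def start_dunning?("paused", "past_due"), do: false

  # Any other entry into past_due starts a sequence
  # (assumption: this includes a nil previous status).
  def start_dunning?(prev, "past_due") when prev != "past_due", do: true

  # Already past_due, or not entering past_due at all: nothing to start.
  def start_dunning?(_prev, _new), do: false
end

IO.inspect(DunningGuardSketch.start_dunning?("paused", "past_due"))
IO.inspect(DunningGuardSketch.start_dunning?("active", "past_due"))
```

Keeping the rule in one small function means the paused case stays protected even if the status stripping upstream is changed or bypassed.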
What I Would Redesign
The status stripping approach works, but it has a gap: the system does not reconcile. If the local status diverges from Stripe's status, it stays diverged until the next valid transition arrives. A reconciliation job that periodically fetches subscription status from the Stripe API and compares it against local state would close this gap. I didn't build it because the current system is event-driven by design (no polling), and adding a reconciliation poller would create a second source of state mutations that the webhook pipeline doesn't expect.
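If such a job were added, one low-risk shape would be report-only: detect drift and log it, without writing to local state. The sketch below assumes both sides have already been loaded into maps of subscription id to status; `ReconcileSketch` and `drift/2` are hypothetical names, and fetching the remote statuses from the Stripe API is deliberately left out.

```elixir
defmodule ReconcileSketch do
  # `local` and `remote` are maps of subscription id => status string.
  # Returns {id, local_status, remote_status} for every subscription
  # present on both sides whose statuses differ, sorted by id.
  def drift(local, remote) do
    local
    |> Enum.filter(fn {id, status} ->
      Map.has_key?(remote, id) and Map.fetch!(remote, id) != status
    end)
    |> Enum.map(fn {id, status} -> {id, status, Map.fetch!(remote, id)} end)
    |> Enum.sort()
  end
end

local = %{"sub_1" => "paused", "sub_2" => "active"}
remote = %{"sub_1" => "past_due", "sub_2" => "active"}

IO.inspect(ReconcileSketch.drift(local, remote))
```

Because the function only reports, it does not become a second source of state mutations; acting on the drift would remain a separate decision.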
The churn calculator has a related bootstrap problem. It computes churn rate as churned_count / active_count_at_period_start, where the denominator comes from the previous day's metrics snapshot. On the first day of operation, there is no previous snapshot. The first churn rate is always 0.0000, regardless of what actually happened. I accepted this because building temporal query infrastructure to count active subscriptions at an arbitrary past timestamp added complexity that the 99.9% case (day 2 and beyond) doesn't need. But the first day's metrics are wrong, and there is no warning in the dashboard about it.
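A small sketch of one alternative: return an explicit "no baseline" result instead of 0.0000 when the previous snapshot is missing, so a dashboard could show "not available" for the first day. `ChurnSketch`, `rate/2`, and the error atoms are assumptions for illustration, not the existing calculator.

```elixir
defmodule ChurnSketch do
  # No previous snapshot (first day of operation): no rate can be computed.
  def rate(_churned, nil), do: {:error, :no_baseline}

  # A snapshot exists but had zero active subscriptions: the rate is undefined.
  def rate(_churned, 0), do: {:error, :zero_baseline}

  # churned_count / active_count_at_period_start, rounded to four decimals.
  def rate(churned, active_at_start) when active_at_start > 0 do
    {:ok, Float.round(churned / active_at_start, 4)}
  end
end

IO.inspect(ChurnSketch.rate(25, 100))
IO.inspect(ChurnSketch.rate(25, nil))
```

The tagged tuple forces the caller to handle the first-day case explicitly rather than storing a rate that looks valid.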
The Lesson Is About Modeling, Not Stripe
Stripe is the specific case, but the pattern applies to any system that processes events from an external authority. Payment providers, shipping APIs, identity providers, government reporting systems: they all emit state changes according to their own transition rules. Your internal model is a subset of their model, tuned for your business logic, and the two will disagree.
The instinct is to treat the external system as the single source of truth. Let Stripe win every argument. The problem is that "letting Stripe win" means your business logic executes against a state model you do not control and cannot predict. The alternative isn't to fight the external system. It is to separate the concerns: accept their data, enforce your transitions, log the disagreements, and build guards at every point where the disagreement could trigger unintended side effects.
Build for disagreement, not consensus. The external world does not know your rules, and it does not care.