Design and Architecture
1. Architectural Principles
The service is implemented with Hexagonal Architecture, CQRS, and DDD aggregate modeling.
- Interface layer (`interfaces/http`): transport contracts, validation, error mapping
- Application layer (`application/*`): command/query orchestration, ports, transactional boundaries
- Domain layer (`domain/order/*`): aggregate behavior and state-transition invariants
- Infrastructure layer (`infrastructure/*`): persistence, messaging, cache, security, resilience, cross-cutting controls
CQRS split:
- Write side: `OrderService`
- Read side: `OrderQueryService`
2. Core Design Decisions
2.1 Transactional Outbox over direct Kafka publish
Reason: commit order state and integration event atomically from the write transaction perspective.
- Order write and outbox row are persisted in one DB transaction
- Kafka publication is handled asynchronously by outbox workers
- Prevents "DB committed, broker send failed" inconsistency
- Worker claim semantics use explicit outbox leasing (`IN_FLIGHT`) with lease-expiry recovery to avoid duplicate concurrent publishers
2.2 State pattern for order lifecycle
Reason: enforce legal transitions in domain code, not controller/service condition chains.
- `Order` delegates behavior to `OrderState`
- Concrete states: `PendingState`, `ProcessingState`, `ShippedState`, `DeliveredState`, `CancelledState`
- Illegal transitions fail with `ConflictException`
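The delegation described above can be sketched as follows. The state class names come from this section; the transition methods and the `ConflictException` shape are illustrative assumptions, not the service's actual API.

```java
// Minimal sketch of the lifecycle State pattern: the aggregate delegates to
// its current state, and every transition is illegal unless a concrete state
// overrides it. Transition methods here are illustrative.
class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

interface OrderState {
    String name();
    default OrderState process() { throw illegal("process"); }
    default OrderState ship()    { throw illegal("ship"); }
    default OrderState deliver() { throw illegal("deliver"); }
    default OrderState cancel()  { throw illegal("cancel"); }
    default ConflictException illegal(String action) {
        return new ConflictException(name() + " does not allow " + action);
    }
}

class PendingState implements OrderState {
    public String name() { return "PENDING"; }
    public OrderState process() { return new ProcessingState(); }
    public OrderState cancel()  { return new CancelledState(); }
}
class ProcessingState implements OrderState {
    public String name() { return "PROCESSING"; }
    public OrderState ship()   { return new ShippedState(); }
    public OrderState cancel() { return new CancelledState(); }
}
class ShippedState implements OrderState {
    public String name() { return "SHIPPED"; }
    public OrderState deliver() { return new DeliveredState(); }
}
class DeliveredState implements OrderState {
    public String name() { return "DELIVERED"; }
}
class CancelledState implements OrderState {
    public String name() { return "CANCELLED"; }
}

// The aggregate holds no transition conditionals of its own.
class Order {
    private OrderState state = new PendingState();
    public void process() { state = state.process(); }
    public void ship()    { state = state.ship(); }
    public void deliver() { state = state.deliver(); }
    public void cancel()  { state = state.cancel(); }
    public String status() { return state.name(); }
}
```

Because each legal move is declared on exactly one state class, adding or forbidding a transition is a one-class change rather than a controller-wide condition audit.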
2.3 Global idempotency lifecycle states
Reason: avoid false completion and data loss during partial failures.
- `IN_PROGRESS`: request accepted, business write not yet finalized
- `COMPLETED`: order commit succeeded and key is mapped to `orderId`
- completion is marked after commit via transaction synchronization
2.4 Active/passive write gating
Reason: prevent unsafe writes during regional dependency instability.
- `RegionalFailoverManager` monitors DB/Redis/Kafka health
- sustained unhealthy thresholds move the region to passive
- write APIs are gated via `allowsWrites()`
2.5 Transaction manager disambiguation
Reason: transactional Kafka producer introduces additional transaction manager beans and can make Spring transaction selection ambiguous.
- DB service methods explicitly use `transactionManager`
- avoids accidental routing of JPA operations through non-JPA transaction managers
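As a hedged illustration of the pinning described above (the service class shape and command type are assumptions; the `transactionManager` bean name is the one this section cites):

```java
@Service
public class OrderService {
    // Explicitly name the JPA transaction manager so Spring never routes
    // this JPA write through the Kafka transaction manager bean that the
    // transactional producer registers.
    @Transactional(transactionManager = "transactionManager")
    public void createOrder(CreateOrderCommand cmd) {
        // persist order + outbox row under the JPA transaction
    }
}
```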
2.6 Scheduled promotion vs Kafka consumer
Reason: separate integration (event log + dedupe) from business scheduling (when pending work becomes processing).
- `PendingToProcessingScheduler`: `@Scheduled` fixed rate; calls `OrderService.promotePendingOrdersScheduled()`; loads `PENDING` rows in bounded pages (`app.scheduling.pending-promotion-batch-size`) and promotes one transaction per order with the same regional checks as writes (repeated page-0 reads until no pending rows remain).
- `OrderCreatedConsumer`: validates the payload, applies the regional gate on the acknowledgment path, writes `processed_events`; does not change the lifecycle to `PROCESSING`.
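A minimal in-memory sketch of the promotion loop above: repeatedly read "page 0" of `PENDING` rows with a bounded batch size and promote each one (one transaction per order in the real service). The `Map` stand-in and names are illustrative.

```java
import java.util.*;
import java.util.stream.*;

// Repeated page-0 reads work because promoted rows drop out of the PENDING
// filter, so the next page-0 read naturally advances through the backlog.
class PendingPromotionSketch {
    static int promoteAll(Map<Long, String> statusById, int batchSize) {
        int promoted = 0;
        while (true) {
            List<Long> page = statusById.entrySet().stream()
                    .filter(e -> e.getValue().equals("PENDING"))
                    .map(Map.Entry::getKey)
                    .sorted()
                    .limit(batchSize)          // bounded batch per iteration
                    .collect(Collectors.toList());
            if (page.isEmpty()) return promoted;
            for (Long id : page) {
                // real service: one transaction + regional write checks here
                statusById.put(id, "PROCESSING");
                promoted++;
            }
        }
    }
}
```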
2.7 Resource-level authorization (reads and cancel)
Reason: role checks on URLs are insufficient; data must be scoped by owner.
- Create binds JWT `sub` to `owner_subject`.
- Query uses owner-scoped repository methods and cache keys for non-admins; admins use global queries and admin cache keys.
- Cancel enforces owner match or `ROLE_ADMIN` (see Security and Authorization).
3. Consistency Model
Strong consistency paths
- application write contracts use persistence-neutral `OrderRecord`
- `OrderEntity` remains infrastructure-only and is mapped in repository adapters
- optimistic locking on updates/cancel (`@Version`)
- idempotent create semantics with durable key lifecycle
Eventual consistency paths
- Automatic `PENDING` -> `PROCESSING`: `PendingToProcessingScheduler` runs on `app.scheduling.pending-to-processing-ms` (not the Kafka consumer; the consumer only records `ORDER_CREATED` in `processed_events`).
- Outbox retries/backoff for asynchronous publication of `ORDER_CREATED` to Kafka
- Kafka consumer retry topics and DLT for consume-side failure isolation (dedupe markers; no aggregate promotion in the consumer)
Delivery contract (explicit)
- producer path is transactional Kafka publish + transactional outbox, not distributed XA across DB and Kafka
- end-to-end delivery guarantee is at-least-once with idempotent consumption
- ordering guarantee is per partition/per order key, not global total ordering across all orders
- active-active conflict handling is version/timestamp/region precedence, not causal consistency
- stale outbox workers are fenced by `leaseOwner` + `leaseVersion` checks on status transitions (`IN_FLIGHT` -> `SENT`/`FAILED`)
4. Write Path Architecture
4.1 Create order flow
- Validate request and authorize endpoint
- Check regional write mode
- Resolve idempotency state
- Persist order + outbox event in one transaction
- Return response
- Mark idempotency key `COMPLETED` after commit
4.2 Update and cancel flow
- retrieve aggregate snapshot
- apply state transition/cancel behavior in domain model
- persist with optimistic version checks
- evict query cache entries
4.3 Bounded read flow
- `GET /orders` executes through a bounded repository page (`app.query.list-max-rows`) to prevent unbounded heap growth
- `GET /orders/page` provides bounded page reads (`page`, `size`) with status filter support; the server caps page size
- Authorization: non-admin callers receive only rows where `owner_subject` matches their JWT `sub`; admins see all (see Security and Authorization)
4.4 Order Creation Idempotency Flow (Detailed)
The create path is intentionally conservative because the system accepts retries from clients, API gateways, and load balancers.
Detailed sequence:
- Normalize `X-Idempotency-Key` (trim; empty -> null).
- Resolve global idempotency state from Redis.
- If state is `COMPLETED` and an `orderId` exists, load and return the persisted order.
- If state is `IN_PROGRESS`, reject with conflict to prevent duplicate concurrent writes.
- Mark key `IN_PROGRESS` (compare-and-set semantics in Redis).
- Perform a second local dedupe lookup by idempotency key in the DB.
- Build domain aggregate from DTO, apply domain invariants, and stamp regional metadata.
- Persist order snapshot in DB transaction.
- Serialize and enqueue outbox event in the same DB transaction.
- Commit DB transaction.
- After-commit callback marks global key `COMPLETED` -> `orderId`.
- Return API response (new or deduped existing order).
```mermaid
flowchart TD
A[POST /orders] --> AW[assertWriteAllowed: allowsWrites + not shouldRejectWrites]
AW --> B[Normalize idempotency key]
B --> C{Key present?}
C -- No --> N1[Create aggregate]
C -- Yes --> D[Resolve Redis idempotency state]
D --> E{State}
E -- COMPLETED --> F[Load order by orderId and return]
E -- IN_PROGRESS --> G[409 conflict]
E -- ABSENT --> H[Mark IN_PROGRESS]
H --> I{CAS success?}
I -- No --> G
I -- Yes --> J[DB lookup by idempotency key]
J --> K{Exists?}
K -- Yes --> L[Return existing order]
K -- No --> N1
N1 --> N2[DB write order]
N2 --> N3[DB write outbox row]
N3 --> N4[Commit DB txn]
N4 --> N5[After-commit mark COMPLETED in Redis]
N5 --> N6[Return 201 Created]
```
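The compare-and-set key lifecycle above can be simulated with a `ConcurrentMap` standing in for Redis (`putIfAbsent` plays the role of `SET NX`; `replace` plays the role of a guarded update). Class and method names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// IN_PROGRESS/COMPLETED lifecycle with CAS semantics: exactly one caller
// wins the IN_PROGRESS claim; completion is only valid from IN_PROGRESS.
class IdempotencySketch {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // true only for the single caller that wins the CAS; losers map to 409
    boolean markInProgress(String key) {
        return store.putIfAbsent(key, "IN_PROGRESS") == null;
    }

    // after-commit callback: IN_PROGRESS -> COMPLETED:<orderId>
    boolean markCompleted(String key, long orderId) {
        return store.replace(key, "IN_PROGRESS", "COMPLETED:" + orderId);
    }

    String state(String key) {
        return store.getOrDefault(key, "ABSENT");
    }
}
```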
10. Priority Hardening (P0/P1/P2)
P0 reliability hardening (implemented)
- outbox publish path is now completion-aware: partition workers wait for async broker completions
- outbox publish concurrency is explicitly bounded by `app.outbox.publisher.max-in-flight`
- consumer flow no longer blocks listener threads with `Thread.sleep`; retries are delegated to retry-topic orchestration
- lease fencing uses both owner and version tokens to reject stale callbacks after reclaim
P1 scalability hardening (implemented)
- list APIs are bounded by design (`/orders/page` and a bounded fallback for `/orders`)
- cache/list semantics remain backward compatible while protecting against high-cardinality full scans
P2 operational hardening (documentation + observability)
- explicit chaos/load test scenarios are now documented and tied to measurable SLO signals
- runtime dashboards map directly to outbox reclaim safety, DLQ drift, and backpressure transitions
5. Outbox Publication Pipeline
Components:
- `OutboxPublisher`: scheduler + backpressure + worker orchestration
- `OutboxFetcher`: partition-aware due-row claim
- `OutboxProcessor`: async publish + success transition
- `OutboxRetryHandler`: retry policy and next-attempt scheduling
Behavioral properties:
- bounded in-flight workers
- bounded batch claim size
- deterministic row ordering within processing batch
- atomic lease transition (`PENDING`/`FAILED` -> `IN_FLIGHT`) before worker release
- lease timeout recovery for crashed workers
- at-least-once delivery with consumer dedupe
Detailed pipeline:
- Scheduler selects only partitions owned by current instance.
- Worker permit is acquired (bounded concurrency).
- Fetcher claims due rows with `FOR UPDATE SKIP LOCKED` semantics.
- Claimed rows are transitioned to `IN_FLIGHT` and assigned `leaseOwner` + `leaseVersion`.
- Processor starts async publish per row (bounded in-flight publish semaphore).
- Success callback attempts `markSentIfLeased(id, owner, version, ...)`.
- If the fenced update fails, the callback is stale and safely ignored.
- Failure callback runs adaptive retry policy and persists `FAILED` + `nextAttemptAt` with lease fencing.
- Worker releases its permit only after the batch completion future resolves.
```mermaid
flowchart TD
P1[OutboxPublisher poll] --> P2[Acquire worker permit]
P2 --> P3[Claim partition batch]
P3 --> P4[Set IN_FLIGHT + leaseOwner + leaseVersion]
P4 --> P5[Async publish]
P5 --> P6{Publish result}
P6 -- Success --> P7[Fenced markSentIfLeased]
P6 -- Failure --> P8[Adaptive retry plan]
P8 --> P9[Fenced markFailedIfLeased]
P7 --> P10[Release permit after completion]
P9 --> P10
```
5.1 Lease Fencing and Stale Worker Protection
- Lease owner is unique per claim cycle and partition worker.
- Lease version increments on every claim.
- Success/failure transitions require both owner and version match.
- Any delayed callback from an expired lease cannot mutate the row.
- This is the core anti-duplication guarantee for asynchronous publication races.
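A compact sketch of the fencing rule, with an in-memory row standing in for the outbox table. Field and method names follow this section, but the shape is illustrative (the real transition is a conditional DB update).

```java
// A success callback may move a row to SENT only if both leaseOwner and
// leaseVersion still match its claim; a reclaim bumps the version, so any
// delayed callback from the expired lease fails the fence and is ignored.
class OutboxRowSketch {
    String status = "PENDING";
    String leaseOwner;
    long leaseVersion;

    void claim(String owner) {          // PENDING/FAILED -> IN_FLIGHT
        status = "IN_FLIGHT";
        leaseOwner = owner;
        leaseVersion++;                 // version increments on every claim
    }

    boolean markSentIfLeased(String owner, long version) {
        if (!status.equals("IN_FLIGHT")
                || !owner.equals(leaseOwner) || version != leaseVersion) {
            return false;               // stale callback: safely ignored
        }
        status = "SENT";
        return true;
    }
}
```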
5.2 Eventual Consistency Contract
- Command write commit and outbox insert are strongly consistent within one DB transaction.
- Kafka publication is asynchronous and eventually consistent.
- Consumer dedup markers enforce idempotent state application.
- End-to-end contract is at-least-once delivery with idempotent consume.
5.3 Regional Failover Decision Tree
```mermaid
flowchart TD
R1[Periodic health probe] --> R2[Check DB]
R2 --> R3[Check Redis]
R3 --> R4[Check Kafka]
R4 --> R5{All healthy?}
R5 -- Yes --> R6[Increment healthy streak]
R5 -- No --> R7[Increment unhealthy streak]
R7 --> R8{Unhealthy threshold crossed?}
R8 -- Yes --> R9[Switch to PASSIVE]
R8 -- No --> R10[Remain current mode]
R6 --> R11{Recovery threshold crossed?}
R11 -- Yes --> R12[Switch to ACTIVE]
R11 -- No --> R10
R9 --> R13[Write gating enabled]
R12 --> R14[Write gating disabled]
```
5.4 Backpressure Propagation and Adaptive Throttling
```mermaid
flowchart LR
B1[kafka.consumer.lag.ms] --> BM[BackpressureManager]
B2[outbox.pending + outbox.failed] --> BM
B3[db.saturation] --> BM
BM --> B4{BackpressureManager.Level}
B4 -->|NORMAL| B0[throttlingFactor 1.0; writes allowed]
B4 -->|ELEVATED| B5[Tighten rate limits; lower throttlingFactor lengthens AdaptiveRetryPolicyStrategy delays]
B4 -->|CRITICAL| B7[shouldRejectWrites + lowest throttlingFactor]
```
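One way the signal-to-level mapping could look. The thresholds and numeric factors below are assumptions for illustration, not the real `BackpressureManager` configuration.

```java
// Fold raw signals into a level, then derive the knobs the diagram names:
// throttlingFactor (lower -> tighter limits, longer adaptive retry delays)
// and shouldRejectWrites (CRITICAL only).
class BackpressureSketch {
    enum Level { NORMAL, ELEVATED, CRITICAL }

    static Level classify(long consumerLagMs, long outboxBacklog) {
        if (consumerLagMs > 60_000 || outboxBacklog > 50_000) return Level.CRITICAL;
        if (consumerLagMs > 10_000 || outboxBacklog > 5_000) return Level.ELEVATED;
        return Level.NORMAL;
    }

    static double throttlingFactor(Level level) {
        switch (level) {
            case CRITICAL: return 0.1;
            case ELEVATED: return 0.5;
            default:       return 1.0;
        }
    }

    static boolean shouldRejectWrites(Level level) {
        return level == Level.CRITICAL;
    }
}
```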
5.5 Cross-Region HLC Conflict Resolution Flow
Hybrid Logical Clock (HLC) hardening replaces pure wall-clock ordering in conflict resolution.
Each candidate update is evaluated as a tuple: (physicalMillis, logicalCounter).
Winning rule:
- incoming wins when `incoming.physicalMillis > current.physicalMillis`
- incoming wins when physical values are equal and `incoming.logicalCounter > current.logicalCounter`
- ties are broken deterministically by region id to prevent non-deterministic outcomes
This avoids false overwrite decisions caused by NTP drift when regions disagree on wall clock by small offsets.
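The winning rule can be expressed as a small comparison; the type shape here is illustrative, not the service's actual HLC class.

```java
// Compare (physicalMillis, logicalCounter) lexicographically, then break
// exact ties deterministically by region id so both regions always agree
// on the winner regardless of evaluation order.
class HlcSketch {
    final long physicalMillis;
    final long logicalCounter;
    final String regionId;

    HlcSketch(long physicalMillis, long logicalCounter, String regionId) {
        this.physicalMillis = physicalMillis;
        this.logicalCounter = logicalCounter;
        this.regionId = regionId;
    }

    // true when 'this' (the incoming update) should overwrite 'current'
    boolean winsOver(HlcSketch current) {
        if (physicalMillis != current.physicalMillis)
            return physicalMillis > current.physicalMillis;
        if (logicalCounter != current.logicalCounter)
            return logicalCounter > current.logicalCounter;
        return regionId.compareTo(current.regionId) >= 0; // deterministic tie-break
    }
}
```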
```mermaid
flowchart TD
H1[Incoming regional update] --> H2[Build incoming HLC from occurredAt + version]
H2 --> H3[Build current HLC from persisted lastUpdatedTimestamp + version]
H3 --> H4{incoming HLC > current HLC?}
H4 -- Yes --> H5[Apply update]
H4 -- No --> H6{incoming HLC == current HLC?}
H6 -- No --> H7[Reject as stale]
H6 -- Yes --> H8[Region-id tie breaker]
H8 --> H9{Incoming region >= current region?}
H9 -- Yes --> H5
H9 -- No --> H7
```
6. Kafka Consume and Dedupe Architecture
OrderCreatedConsumer provides:
- schema validation/deserialization
- transactional dedupe marker write (automatic `PENDING` -> `PROCESSING` is a separate scheduled job, not the consumer)
- retry topic routing for transient conditions
- DLT handling with structured diagnostics
The `processed_events` table is the idempotent-consume anchor.
7. Redis Usage Model
Query cache
- adapter: `RedisCacheProvider`
- cache-aside read strategy
- soft-fail fallback to DB
- local circuit-breaker behavior during repeated Redis faults
- TTL jitter for synchronized-expiry reduction
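TTL jitter in miniature: spread expiry by a random fraction of the base TTL so keys written together do not all expire in the same instant. The +/-10% band used below is an assumed example, not the service's configured value.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Returns base TTL shifted by a uniform random delta in
// [-jitterRatio * base, +jitterRatio * base].
class TtlJitterSketch {
    static Duration jittered(Duration base, double jitterRatio) {
        long baseMs = base.toMillis();
        long band = (long) (baseMs * jitterRatio);
        long delta = ThreadLocalRandom.current().nextLong(-band, band + 1);
        return Duration.ofMillis(baseMs + delta);
    }
}
```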
Rate limiting
- filter: `RateLimitingFilter`
- token bucket in Lua for atomic refill/consume
- distributed key scope (`user:path:ip`)
- fail-open during Redis degradation to preserve availability
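An in-process Java analogue of the refill-then-consume semantics; the real implementation executes this logic atomically as a single Lua script in Redis, and the capacity/refill parameters here are illustrative.

```java
// Token bucket: each call first refills proportionally to elapsed time
// (capped at capacity), then tries to consume one token. `synchronized`
// stands in for the atomicity Lua provides on the Redis side.
class TokenBucketSketch {
    private final long capacity;
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    TokenBucketSketch(long capacity, double refillPerMs, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerMs;
        this.tokens = capacity;          // start full
        this.lastRefillMs = nowMs;
    }

    synchronized boolean tryConsume(long nowMs) {
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens < 1.0) return false;  // over the limit: reject request
        tokens -= 1.0;
        return true;
    }
}
```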
8. Security Model
- resource server JWT validation
- role claim conversion to Spring authorities
- endpoint-level access controls in `SecurityConfig`
- normalized API error envelope with correlation id
9. Failure Matrix
| Failure mode | Defensive mechanism | Observable outcome |
|---|---|---|
| create transaction fails | transactional rollback | no order, no outbox row |
| broker unavailable | outbox lease + adaptive retries/backoff | pending/in-flight/failed backlog grows then drains |
| duplicate Kafka delivery | processed-event dedupe | idempotent consume, no repeated transition |
| Redis cache failure | fail-soft cache adapter | DB fallback, increased cache error metrics |
| rate limiter Redis failure | fail-open policy | API remains available |
| stale concurrent write | optimistic locking | conflict response |
| prolonged dependency outage | active/passive failover | writes rejected in passive region |