Design and Architecture

1. Architectural Principles

The service is implemented with Hexagonal Architecture, CQRS, and DDD aggregate modeling.

Interface layer (interfaces/http): transport contracts, validation, error mapping
Application layer (application/*): command/query orchestration, ports, transactional boundaries
Domain layer (domain/order/*): aggregate behavior and state transition invariants
Infrastructure layer (infrastructure/*): persistence, messaging, cache, security, resilience, crosscutting controls

CQRS split:

Write side: OrderService
Read side: OrderQueryService

2. Core Design Decisions

2.1 Transactional Outbox over direct Kafka publish

Reason: commit the order state and its integration event atomically within the write transaction.

Order write and outbox row are persisted in one DB transaction
Kafka publication is handled asynchronously by outbox workers
Prevents “DB committed, broker send failed” inconsistency
Workers claim rows through explicit outbox leasing (IN_FLIGHT) with lease-expiry recovery, so two workers do not publish the same row concurrently

2.2 State pattern for order lifecycle

Reason: enforce legal transitions in domain code, not controller/service condition chains.

Order delegates behavior to OrderState
Concrete states: PendingState, ProcessingState, ShippedState, DeliveredState, CancelledState
Illegal transitions fail with ConflictException
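
A minimal sketch of this delegation, assuming a simple allowed-transition table. The table itself, the OrderStatus enum, the OrderStates factory, and the moveTo method are illustrative assumptions, not the service's actual code; only the state class names and ConflictException come from this document.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for the service's ConflictException. */
class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

/** Lifecycle statuses; which transitions are legal is an assumption of this sketch. */
enum OrderStatus { PENDING, PROCESSING, SHIPPED, DELIVERED, CANCELLED }

/** Each concrete state declares the targets it may move to. */
interface OrderState {
    OrderStatus status();
    Set<OrderStatus> allowedTargets();

    default OrderState transitionTo(OrderStatus target) {
        if (!allowedTargets().contains(target)) {
            throw new ConflictException("Illegal transition " + status() + " -> " + target);
        }
        return OrderStates.of(target);
    }
}

final class PendingState implements OrderState {
    public OrderStatus status() { return OrderStatus.PENDING; }
    public Set<OrderStatus> allowedTargets() { return EnumSet.of(OrderStatus.PROCESSING, OrderStatus.CANCELLED); }
}

final class ProcessingState implements OrderState {
    public OrderStatus status() { return OrderStatus.PROCESSING; }
    public Set<OrderStatus> allowedTargets() { return EnumSet.of(OrderStatus.SHIPPED, OrderStatus.CANCELLED); }
}

final class ShippedState implements OrderState {
    public OrderStatus status() { return OrderStatus.SHIPPED; }
    public Set<OrderStatus> allowedTargets() { return EnumSet.of(OrderStatus.DELIVERED); }
}

final class DeliveredState implements OrderState {
    public OrderStatus status() { return OrderStatus.DELIVERED; }
    public Set<OrderStatus> allowedTargets() { return EnumSet.noneOf(OrderStatus.class); }
}

final class CancelledState implements OrderState {
    public OrderStatus status() { return OrderStatus.CANCELLED; }
    public Set<OrderStatus> allowedTargets() { return EnumSet.noneOf(OrderStatus.class); }
}

/** Hypothetical factory mapping a status to its state object. */
final class OrderStates {
    private OrderStates() {}

    static OrderState of(OrderStatus status) {
        switch (status) {
            case PENDING: return new PendingState();
            case PROCESSING: return new ProcessingState();
            case SHIPPED: return new ShippedState();
            case DELIVERED: return new DeliveredState();
            case CANCELLED: return new CancelledState();
            default: throw new IllegalArgumentException("Unknown status " + status);
        }
    }
}

/** The aggregate holds no transition conditionals; it only delegates to its state. */
final class Order {
    private OrderState state = new PendingState();

    OrderStatus status() { return state.status(); }

    void moveTo(OrderStatus target) { state = state.transitionTo(target); }
}
```

With this shape, adding a lifecycle rule means touching one state class rather than a condition chain in a controller or service.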

2.3 Global idempotency lifecycle states

Reason: avoid false completion and data loss during partial failures.

IN_PROGRESS: request accepted, business write not yet finalized
COMPLETED: order commit succeeded and key is mapped to orderId
completion is marked after commit via transaction synchronization

2.4 Active/passive write gating

Reason: prevent unsafe writes during regional dependency instability.

RegionalFailoverManager monitors DB/Redis/Kafka health
sustained unhealthy thresholds move region to passive
write APIs are gated via allowsWrites()

2.5 Transaction manager disambiguation

Reason: a transactional Kafka producer introduces additional transaction manager beans, which can make Spring's transaction manager selection ambiguous.

DB service methods explicitly use transactionManager
avoids accidental routing of JPA operations through non-JPA transaction managers

2.6 Scheduled promotion vs Kafka consumer

Reason: separate integration (event log + dedupe) from business scheduling (when pending work becomes processing).

PendingToProcessingScheduler: runs on a @Scheduled fixed rate and calls OrderService.promotePendingOrdersScheduled(). That method loads PENDING rows in bounded pages (app.scheduling.pending-promotion-batch-size) and promotes each order in its own transaction, applying the same regional checks as writes. It re-reads page 0 until no pending rows remain, because promoted rows drop out of the PENDING result set.
OrderCreatedConsumer: validates payload, applies regional gate for acknowledgment path, writes processed_events; does not change lifecycle to PROCESSING.

2.7 Resource-level authorization (reads and cancel)

Reason: role checks on URLs are insufficient; data must be scoped by owner.

Create binds JWT sub to owner_subject.
Query uses owner-scoped repository methods and cache keys for non-admins; admins use global queries and admin cache keys.
Cancel enforces owner match or ROLE_ADMIN (see Security and Authorization).
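
The owner-or-admin rule and the split cache keys can be sketched as below. The Caller record, the OrderAccessPolicy class, and the cache key formats are hypothetical names chosen for illustration; only the ROLE_ADMIN authority and the owner_subject binding come from this document.

```java
import java.util.Set;

/** Hypothetical caller view: the JWT subject plus granted authorities. */
record Caller(String subject, Set<String> roles) {
    boolean isAdmin() { return roles.contains("ROLE_ADMIN"); }
}

final class OrderAccessPolicy {
    private OrderAccessPolicy() {}

    /** Cancel is allowed for the order's owner or for any admin. */
    static boolean canCancel(Caller caller, String ownerSubject) {
        return caller.isAdmin() || caller.subject().equals(ownerSubject);
    }

    /**
     * Non-admin cache keys embed the caller's subject, so a cached entry can
     * never be served to a different user. The key layout is an assumption.
     */
    static String cacheKey(Caller caller, String orderId) {
        return caller.isAdmin()
                ? "orders:admin:" + orderId
                : "orders:owner:" + caller.subject() + ":" + orderId;
    }
}
```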

3. Consistency Model

Strong consistency paths

application write contracts use persistence-neutral OrderRecord
OrderEntity remains infrastructure-only and is mapped in repository adapters
optimistic locking on updates/cancel (@Version)
idempotent create semantics with durable key lifecycle

Eventual consistency paths

Automatic PENDING → PROCESSING: PendingToProcessingScheduler runs on app.scheduling.pending-to-processing-ms (not the Kafka consumer; the consumer only records ORDER_CREATED in processed_events).
Outbox retries/backoff for asynchronous publication of ORDER_CREATED to Kafka
Kafka consumer retry topics and DLT for consume-side failure isolation (dedupe markers; no aggregate promotion in the consumer)

Delivery contract (explicit)

producer path is transactional Kafka publish + transactional outbox, not distributed XA across DB and Kafka
end-to-end delivery guarantee is at-least-once with idempotent consumption
ordering guarantee is per partition/per order key, not global total ordering across all orders
active-active conflict handling is version/timestamp/region precedence, not causal consistency
stale outbox workers are fenced by leaseOwner + leaseVersion checks on status transitions (IN_FLIGHT -> SENT/FAILED)

4. Write Path Architecture

4.1 Create order flow

Validate request and authorize endpoint
Check regional write mode
Resolve idempotency state
Persist order + outbox event in one transaction
Mark idempotency key COMPLETED after commit
Return response

4.2 Update and cancel flow

retrieve aggregate snapshot
apply state transition/cancel behavior in domain model
persist with optimistic version checks
evict query cache entries

4.3 Bounded read flow

GET /orders executes through a bounded repository page (app.query.list-max-rows) to prevent unbounded heap growth
GET /orders/page provides bounded page reads (page, size) with status filter support; server caps page size
Authorization: non-admin callers receive only rows where owner_subject matches their JWT sub; admins see all (see Security and Authorization)

4.4 Order Creation Idempotency Flow (Detailed)

The create path is intentionally conservative because the system accepts retries from clients, API gateways, and load balancers.

Detailed sequence:

Normalize X-Idempotency-Key (trim, empty -> null).
Resolve global idempotency state from Redis.
If state is COMPLETED and an orderId exists, load and return the persisted order.
If state is IN_PROGRESS, reject with conflict to prevent duplicate concurrent writes.
Mark key IN_PROGRESS (compare-and-set semantics in Redis).
Perform a second local dedupe lookup by idempotency key in the DB.
Build domain aggregate from DTO, apply domain invariants, and stamp regional metadata.
Persist order snapshot in DB transaction.
Serialize and enqueue outbox event in the same DB transaction.
Commit DB transaction.
An after-commit callback marks the global key COMPLETED and maps it to orderId.
Return API response (new or deduped existing order).

flowchart TD
    A[POST /orders] --> AW[assertWriteAllowed: allowsWrites + not shouldRejectWrites]
    AW --> B[Normalize idempotency key]
    B --> C{Key present?}
    C -- No --> N1[Create aggregate]
    C -- Yes --> D[Resolve Redis idempotency state]
    D --> E{State}
    E -- COMPLETED --> F[Load order by orderId and return]
    E -- IN_PROGRESS --> G[409 conflict]
    E -- ABSENT --> H[Mark IN_PROGRESS]
    H --> I{CAS success?}
    I -- No --> G
    I -- Yes --> J[DB lookup by idempotency key]
    J --> K{Exists?}
    K -- Yes --> L[Return existing order]
    K -- No --> N1
    N1 --> N2[DB write order]
    N2 --> N3[DB write outbox row]
    N3 --> N4[Commit DB txn]
    N4 --> N5[After-commit mark COMPLETED in Redis]
    N5 --> N6[Return 201 Created]
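
The sequence above can be sketched with in-memory stand-ins, assuming ConcurrentHashMap.putIfAbsent plays the role of the Redis compare-and-set. All class and method names here are hypothetical, and the maps replace Redis, the order table, and the outbox table purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class IdempotentCreateSketch {
    enum Status { IN_PROGRESS, COMPLETED }

    /** IN_PROGRESS carries no orderId yet; COMPLETED maps the key to one. */
    record KeyState(Status status, String orderId) {}

    static final class InProgressConflict extends RuntimeException {
        InProgressConflict(String key) { super("Request already in progress for key " + key); }
    }

    private final Map<String, KeyState> redis = new ConcurrentHashMap<>();       // stand-in for Redis
    private final Map<String, String> orderIdByKey = new ConcurrentHashMap<>();  // stand-in for the DB lookup
    private final List<String> outbox = new CopyOnWriteArrayList<>();            // stand-in for outbox rows

    /** Trim the header value; an empty key is treated as absent. */
    static String normalize(String raw) {
        if (raw == null) return null;
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    /** Compare-and-set: succeeds only when no state exists for the key. */
    boolean tryMarkInProgress(String key) {
        return redis.putIfAbsent(key, new KeyState(Status.IN_PROGRESS, null)) == null;
    }

    String create(String rawKey) {
        String key = normalize(rawKey);
        if (key == null) {
            return persistOrderAndOutbox(null);           // no key: plain create
        }
        KeyState state = redis.get(key);
        if (state != null && state.status() == Status.COMPLETED) {
            return state.orderId();                       // replay: return the stored order
        }
        if (state != null || !tryMarkInProgress(key)) {
            throw new InProgressConflict(key);            // maps to HTTP 409
        }
        String existing = orderIdByKey.get(key);          // second, DB-level dedupe lookup
        String orderId = existing != null ? existing : persistOrderAndOutbox(key);
        // In the real flow this runs in an after-commit callback. Marking the key
        // COMPLETED on the "existing order" branch is an assumption of this sketch.
        redis.put(key, new KeyState(Status.COMPLETED, orderId));
        return orderId;
    }

    /** Order row and outbox row are written together, mirroring the single DB transaction. */
    private String persistOrderAndOutbox(String key) {
        String orderId = UUID.randomUUID().toString();
        if (key != null) orderIdByKey.put(key, orderId);
        outbox.add("ORDER_CREATED:" + orderId);
        return orderId;
    }

    int outboxSize() { return outbox.size(); }
}
```

A retry that arrives after completion gets the original order back, while a retry that races the first request is rejected instead of producing a second write.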

10. Principal Hardening (P0/P1/P2)

P0 reliability hardening (implemented)

outbox publish path is now completion-aware: partition workers wait for async broker completions
outbox publish concurrency is explicitly bounded by app.outbox.publisher.max-in-flight
consumer flow no longer blocks listener threads with Thread.sleep; retries are delegated to retry-topic orchestration
lease fencing uses both owner and version tokens to reject stale callbacks after reclaim

P1 scalability hardening (implemented)

list APIs are bounded by design (/orders/page and bounded fallback for /orders)
cache/list semantics remain backward compatible while protecting against high-cardinality full scans

P2 operational hardening (documentation + observability)

explicit chaos/load test scenarios are now documented and tied to measurable SLO signals
runtime dashboards map directly to outbox reclaim safety, DLQ drift, and backpressure transitions

5. Outbox Publication Pipeline

Components:

OutboxPublisher: scheduler + backpressure + worker orchestration
OutboxFetcher: partition-aware due row claim
OutboxProcessor: async publish + success transition
OutboxRetryHandler: retry policy and next-attempt scheduling

Behavioral properties:

bounded in-flight workers
bounded batch claim size
deterministic row ordering within processing batch
atomic lease transition (PENDING/FAILED -> IN_FLIGHT) before worker release
lease timeout recovery for crashed workers
at-least-once delivery with consumer dedupe

Detailed pipeline:

Scheduler selects only partitions owned by current instance.
Worker permit is acquired (bounded concurrency).
Fetcher claims due rows with FOR UPDATE SKIP LOCKED semantics.
Claimed rows are transitioned to IN_FLIGHT and assigned leaseOwner + leaseVersion.
Processor starts async publish per row (bounded in-flight publish semaphore).
Success callback attempts markSentIfLeased(id, owner, version, ...).
If fenced update fails, callback is stale and safely ignored.
Failure callback runs adaptive retry policy and persists FAILED + nextAttemptAt with lease fencing.
Worker releases permit only after batch completion future resolves.

flowchart TD
    P1[OutboxPublisher poll] --> P2[Acquire worker permit]
    P2 --> P3[Claim partition batch]
    P3 --> P4[Set IN_FLIGHT + leaseOwner + leaseVersion]
    P4 --> P5[Async publish]
    P5 --> P6{Publish result}
    P6 -- Success --> P7[Fenced markSentIfLeased]
    P6 -- Failure --> P8[Adaptive retry plan]
    P8 --> P9[Fenced markFailedIfLeased]
    P7 --> P10[Release permit after completion]
    P9 --> P10

5.1 Lease Fencing and Stale Worker Protection

Lease owner is unique per claim cycle and partition worker.
Lease version increments on every claim.
Success/failure transitions require both owner and version match.
Any delayed callback from an expired lease cannot mutate the row.
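This fencing is the core guard against asynchronous publication races: a stale worker cannot overwrite row state, and any duplicate sends that still occur are absorbed by consumer dedupe under the at-least-once contract.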
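
The fencing rules in this section can be sketched with an in-memory row store. The markSentIfLeased and markFailedIfLeased names come from this document; the Row record, the claim method, and the map-based storage are assumptions. In a relational store the same check would typically be an UPDATE whose WHERE clause matches id, lease owner, lease version, and IN_FLIGHT status.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class OutboxLeaseSketch {
    enum Status { PENDING, IN_FLIGHT, SENT, FAILED }

    /** Immutable row snapshot; every claim or transition replaces the whole row. */
    record Row(Status status, String leaseOwner, long leaseVersion) {}

    private final Map<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id) { rows.put(id, new Row(Status.PENDING, null, 0L)); }

    Status statusOf(long id) { return rows.get(id).status(); }

    /**
     * Claims (or reclaims after lease expiry) a row and bumps the lease version.
     * Returns the version token the worker must present on completion.
     */
    long claim(long id, String owner) {
        Row claimed = rows.compute(id, (key, row) -> {
            if (row == null || row.status() == Status.SENT) {
                throw new IllegalStateException("Row " + id + " is not claimable");
            }
            return new Row(Status.IN_FLIGHT, owner, row.leaseVersion() + 1);
        });
        return claimed.leaseVersion();
    }

    boolean markSentIfLeased(long id, String owner, long version) {
        return transitionIfLeased(id, owner, version, Status.SENT);
    }

    boolean markFailedIfLeased(long id, String owner, long version) {
        return transitionIfLeased(id, owner, version, Status.FAILED);
    }

    /** Applies the transition only when the caller still holds the current lease. */
    private boolean transitionIfLeased(long id, String owner, long version, Status target) {
        boolean[] applied = {false};
        rows.computeIfPresent(id, (key, row) -> {
            boolean leaseHeld = row.status() == Status.IN_FLIGHT
                    && owner.equals(row.leaseOwner())
                    && row.leaseVersion() == version;
            if (!leaseHeld) {
                return row;                               // stale callback: leave the row untouched
            }
            applied[0] = true;
            return new Row(target, null, row.leaseVersion());
        });
        return applied[0];
    }
}
```

A worker whose lease was reclaimed gets false back from its late callback and can simply drop the result.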

5.2 Eventual Consistency Contract

Command write commit and outbox insert are strongly consistent within one DB transaction.
Kafka publication is asynchronous and eventually consistent.
Consumer dedupe markers enforce idempotent state application.
End-to-end contract is at-least-once delivery with idempotent consume.

5.3 Regional Failover Decision Tree

flowchart TD
    R1[Periodic health probe] --> R2[Check DB]
    R2 --> R3[Check Redis]
    R3 --> R4[Check Kafka]
    R4 --> R5{All healthy?}
    R5 -- Yes --> R6[Increment healthy streak]
    R5 -- No --> R7[Increment unhealthy streak]
    R7 --> R8{Unhealthy threshold crossed?}
    R8 -- Yes --> R9[Switch to PASSIVE]
    R8 -- No --> R10[Remain current mode]
    R6 --> R11{Recovery threshold crossed?}
    R11 -- Yes --> R12[Switch to ACTIVE]
    R11 -- No --> R10
    R9 --> R13[Write gating enabled]
    R12 --> R14[Write gating disabled]
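
The decision tree above can be read as a small streak-based state machine. This is a sketch under assumptions: the class name, the threshold parameters, and resetting the opposite streak on each probe are illustrative choices, while allowsWrites() mirrors the gate named in section 2.4.

```java
final class FailoverStreakSketch {
    enum Mode { ACTIVE, PASSIVE }

    private final int unhealthyThreshold;
    private final int recoveryThreshold;
    private int unhealthyStreak;
    private int healthyStreak;
    private Mode mode = Mode.ACTIVE;

    FailoverStreakSketch(int unhealthyThreshold, int recoveryThreshold) {
        this.unhealthyThreshold = unhealthyThreshold;
        this.recoveryThreshold = recoveryThreshold;
    }

    /** Feeds one probe result (DB, Redis, Kafka) and returns the resulting mode. */
    synchronized Mode onProbe(boolean dbHealthy, boolean redisHealthy, boolean kafkaHealthy) {
        if (dbHealthy && redisHealthy && kafkaHealthy) {
            healthyStreak++;
            unhealthyStreak = 0;   // assumption: streaks must be consecutive
            if (mode == Mode.PASSIVE && healthyStreak >= recoveryThreshold) {
                mode = Mode.ACTIVE;
            }
        } else {
            unhealthyStreak++;
            healthyStreak = 0;
            if (mode == Mode.ACTIVE && unhealthyStreak >= unhealthyThreshold) {
                mode = Mode.PASSIVE;
            }
        }
        return mode;
    }

    /** Write APIs consult this gate before accepting a command. */
    synchronized boolean allowsWrites() { return mode == Mode.ACTIVE; }
}
```

Requiring consecutive results in both directions keeps a single flapping probe from toggling the region between modes.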

5.4 Backpressure Propagation and Adaptive Throttling

flowchart LR
    B1[kafka.consumer.lag.ms] --> BM[BackpressureManager]
    B2[outbox.pending + outbox.failed] --> BM
    B3[db.saturation] --> BM
    BM --> B4{BackpressureManager.Level}
    B4 -->|NORMAL| B0[throttlingFactor 1.0; writes allowed]
    B4 -->|ELEVATED| B5[Tighten rate limits; lower throttlingFactor lengthens AdaptiveRetryPolicyStrategy delays]
    B4 -->|CRITICAL| B7[shouldRejectWrites + lowest throttlingFactor]
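
One way to read this diagram is "worst signal wins", with the throttling factor dividing the base retry delay. Every threshold and factor value below is an illustrative assumption, as are the class and method names other than shouldRejectWrites; the real BackpressureManager may combine its signals differently.

```java
import java.time.Duration;

final class BackpressureSketch {
    /** Factor values are assumptions; the document only fixes NORMAL at 1.0. */
    enum Level {
        NORMAL(1.0), ELEVATED(0.5), CRITICAL(0.25);

        final double throttlingFactor;

        Level(double throttlingFactor) { this.throttlingFactor = throttlingFactor; }
    }

    private BackpressureSketch() {}

    /** Classifies by the worst of the three signals; thresholds are hypothetical. */
    static Level classify(long consumerLagMs, long outboxBacklog, double dbSaturation) {
        if (consumerLagMs > 60_000 || outboxBacklog > 10_000 || dbSaturation > 0.9) {
            return Level.CRITICAL;
        }
        if (consumerLagMs > 10_000 || outboxBacklog > 1_000 || dbSaturation > 0.7) {
            return Level.ELEVATED;
        }
        return Level.NORMAL;
    }

    static boolean shouldRejectWrites(Level level) { return level == Level.CRITICAL; }

    /** A lower factor lengthens the delay: delay = base / throttlingFactor. */
    static Duration retryDelay(Duration base, Level level) {
        return Duration.ofMillis(Math.round(base.toMillis() / level.throttlingFactor));
    }
}
```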

5.5 Cross-Region HLC Conflict Resolution Flow

Hybrid Logical Clock (HLC) hardening replaces pure wall-clock ordering in conflict resolution. Each candidate update is evaluated as a tuple: (physicalMillis, logicalCounter).

Winning rule:

incoming wins when incoming.physicalMillis > current.physicalMillis
incoming wins when physical values are equal and incoming.logicalCounter > current.logicalCounter
ties are broken deterministically by region id to prevent non-deterministic outcomes

This reduces false overwrite decisions caused by NTP drift when regional wall clocks disagree by small offsets.

flowchart TD
    H1[Incoming regional update] --> H2[Build incoming HLC from occurredAt + version]
    H2 --> H3[Build current HLC from persisted lastUpdatedTimestamp + version]
    H3 --> H4{incoming HLC > current HLC?}
    H4 -- Yes --> H5[Apply update]
    H4 -- No --> H6{incoming HLC == current HLC?}
    H6 -- No --> H7[Reject as stale]
    H6 -- Yes --> H8[Region-id tie breaker]
    H8 --> H9{Incoming region >= current region?}
    H9 -- Yes --> H5
    H9 -- No --> H7
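
The winning rule and tie-breaker can be sketched as a pure comparison. The Hlc and RegionalUpdate records and the HlcConflictResolver class are hypothetical names; the tuple fields and the ">=" region comparison follow the rule and flowchart in this section, with lexicographic string ordering of region ids as an added assumption.

```java
/** HLC tuple ordered by physical time first, then by logical counter. */
record Hlc(long physicalMillis, long logicalCounter) implements Comparable<Hlc> {
    @Override
    public int compareTo(Hlc other) {
        int byPhysical = Long.compare(physicalMillis, other.physicalMillis);
        return byPhysical != 0 ? byPhysical : Long.compare(logicalCounter, other.logicalCounter);
    }
}

/** Hypothetical view of an update: its clock plus the region that produced it. */
record RegionalUpdate(Hlc clock, String regionId) {}

final class HlcConflictResolver {
    private HlcConflictResolver() {}

    /** Returns true when the incoming update should replace the current one. */
    static boolean incomingWins(RegionalUpdate incoming, RegionalUpdate current) {
        int byClock = incoming.clock().compareTo(current.clock());
        if (byClock != 0) {
            return byClock > 0;        // strictly newer applies, strictly older is stale
        }
        // Equal clocks: deterministic tie-break on region id (string order assumed).
        return incoming.regionId().compareTo(current.regionId()) >= 0;
    }
}
```

Because the comparison depends only on the two updates, every region that sees the same pair reaches the same decision.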

6. Kafka Consume and Dedupe Architecture

OrderCreatedConsumer provides:

schema validation/deserialization
transactional dedupe marker write (automatic PENDING → PROCESSING is a separate scheduled job, not the consumer)
retry topic routing for transient conditions
DLT handling with structured diagnostics

processed_events table is the idempotent-consume anchor.
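
A minimal sketch of the dedupe anchor, assuming an atomic set insert stands in for a unique-key insert into processed_events. The class and method names are hypothetical, and in the real consumer the marker write is described above as transactional.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ProcessedEventsSketch {
    // Stand-in for the processed_events table with a unique key on event id.
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger applied = new AtomicInteger();

    /** Returns true when the event was recorded, false for a duplicate delivery. */
    boolean handle(String eventId) {
        // add() is atomic: only the first delivery of an id gets past this line.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        applied.incrementAndGet();
        return true;
    }

    int appliedCount() { return applied.get(); }
}
```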

7. Redis Usage Model

Query cache

adapter: RedisCacheProvider
cache-aside read strategy
soft-fail fallback to DB
local circuit-breaker behavior during repeated Redis faults
TTL jitter for synchronized-expiry reduction
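
TTL jitter can be sketched as a symmetric random offset around the base TTL, so entries written together do not all expire together. The class name, method name, and jitterRatio parameter are assumptions for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class TtlJitterSketch {
    private TtlJitterSketch() {}

    /**
     * Returns the base TTL shifted by a uniform random offset within
     * plus or minus (baseTtl * jitterRatio).
     */
    static Duration jittered(Duration baseTtl, double jitterRatio) {
        if (jitterRatio < 0.0 || jitterRatio >= 1.0) {
            throw new IllegalArgumentException("jitterRatio must be in [0, 1)");
        }
        long baseMs = baseTtl.toMillis();
        long spreadMs = Math.round(baseMs * jitterRatio);
        // nextLong's upper bound is exclusive, hence the + 1.
        long offsetMs = ThreadLocalRandom.current().nextLong(-spreadMs, spreadMs + 1);
        return Duration.ofMillis(baseMs + offsetMs);
    }
}
```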

Rate limiting

filter: RateLimitingFilter
token bucket in Lua for atomic refill/consume
distributed key scope (user:path:ip)
fail-open during Redis degradation to preserve availability
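
The refill/consume step that the Lua script performs atomically can be sketched in Java for readability. This is not the script itself: the class, its parameters, and the explicit clock argument are assumptions, and a synchronized method stands in for Redis's single-threaded script execution.

```java
final class TokenBucketSketch {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMs;

    TokenBucketSketch(long capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // the bucket starts full
        this.lastRefillMs = nowMs;
    }

    /** Key scope from this document: one bucket per user, path, and client IP. */
    static String bucketKey(String user, String path, String ip) {
        return user + ":" + path + ":" + ip;
    }

    /** Refills by elapsed time, then consumes one token if available. */
    synchronized boolean tryConsume(long nowMs) {
        long elapsedMs = Math.max(0L, nowMs - lastRefillMs);
        tokens = Math.min(capacity, tokens + elapsedMs * refillPerSecond / 1000.0);
        lastRefillMs = Math.max(lastRefillMs, nowMs);
        if (tokens < 1.0) {
            return false;                // caller would respond with a rate-limit error
        }
        tokens -= 1.0;
        return true;
    }
}
```

Doing refill and consume in one atomic step is what prevents two concurrent requests from both spending the last token.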

8. Security Model

resource server JWT validation
role claim conversion to Spring authorities
endpoint-level access controls in SecurityConfig
normalized API error envelope with correlation id

9. Failure Matrix

Failure mode	Defensive mechanism	Observable outcome
create transaction fails	transactional rollback	no order, no outbox row
broker unavailable	outbox lease + adaptive retries/backoff	pending/in-flight/failed backlog grows then drains
duplicate Kafka delivery	processed-event dedupe	idempotent consume, no repeated transition
Redis cache failure	fail-soft cache adapter	DB fallback, increased cache error metrics
rate limiter Redis failure	fail-open policy	API remains available
stale concurrent write	optimistic locking	conflict response
prolonged dependency outage	active/passive failover	writes rejected in passive region