Cascade Architecture Breakdown

Generated from the current workspace on 2026-04-09. This reflects the code and docs currently present in /home/leo/workspace and /home/leo/cascade, including gaps between spec and implementation.

Executive Summary

There are six main runtime services, plus Breakwater as the intended machine trust layer, and the public sites:

  • Headwaters = human identity
  • Fabric = control-plane authority
  • Ledger = billing state machine
  • Conduit = tenant-facing panel backend
  • Cascadia = node/runtime agent
  • Weir = operator/admin console BFF
  • Breakwater = intended machine PKI / mTLS trust layer

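The stack uses two internal communication styles on purpose:

  • HTTP for authoritative request/response work, when the caller needs a success/failure answer now.
  • NATS for asynchronous event propagation, when one service announces a state change and several consumers may react later.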

The largest current trust gap is internal machine identity. Headwaters already does native client-certificate identity extraction. Fabric and Ledger currently still depend on forwarded identity in proxy mode, and Cascadia currently identifies to Fabric using x-client-identity instead of presenting a true node certificate.

Top-Level System Map

Public browser
  -> useCascade.io / docs
  -> Headwaters public auth + OIDC
  -> Weir operator console
  -> Conduit tenant/customer panel
  -> Ledger browser billing routes

Internal services
  Weir -> Headwaters / Fabric / Ledger
  Conduit -> Headwaters / Fabric / Ledger / Cascadia
  Ledger -> Fabric
  Headwaters -> Fabric
  Cascadia -> Fabric

Async event bus
  Fabric -> NATS -> Conduit
  Ledger -> NATS -> Conduit
  Conduit -> NATS -> downstream consumers (currently limited)

Machine trust
  Intended: Breakwater issues certs and/or services terminate mTLS natively
  Current: hybrid, with some header-forwarding still present

How HTTP, NATS, and WebSockets Fit

HTTP

Used for authoritative request/response work:

  • Fabric tenant lookup, entitlements, signed actions, provisioning
  • Ledger checkout and subscription reads
  • Headwaters auth, org, OIDC, JWKS
  • Cascadia signed-action endpoints and control-plane sync

Use HTTP when the caller needs a success/failure answer now.

NATS

Used as an internal event bus for async propagation:

  • fabric.tenant.*
  • fabric.node.*
  • fabric.provisioning.*
  • ledger.subscription.*
  • conduit.*

Use NATS when one service wants to announce state changes and multiple consumers may react later.
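The subject lists above use NATS wildcard notation. As a reminder of how NATS matches subjects, here is a small matcher that follows the documented rules (`*` matches exactly one dot-separated token, `>` matches one or more trailing tokens), plus a fan-out helper showing one published event reaching every matching subscriber. The consumer names are hypothetical, and a real service would subscribe through a NATS client library rather than match subjects itself.

```python
from typing import Dict, List


def subject_matches(pattern: str, subject: str) -> bool:
    """Match a NATS subject against a subscription pattern.

    Tokens are separated by '.'; '*' matches exactly one token and '>'
    (valid only as the final token) matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' must be last and must still have at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens before the pattern did
        if token != "*" and token != s_tokens[i]:
            return False
    # All pattern tokens matched; the subject must not have extra tokens.
    return len(p_tokens) == len(s_tokens)


# Hypothetical consumer names and their subscriptions.
SUBSCRIPTIONS: Dict[str, List[str]] = {
    "conduit-tenant-sync": ["fabric.tenant.*"],
    "conduit-billing-sync": ["ledger.subscription.*"],
    "audit-log": ["fabric.>", "ledger.>"],
}


def consumers_for(subject: str) -> List[str]:
    """Every consumer with at least one matching subscription receives the event."""
    return sorted(
        name
        for name, patterns in SUBSCRIPTIONS.items()
        if any(subject_matches(p, subject) for p in patterns)
    )


print(consumers_for("ledger.subscription.updated"))  # ['audit-log', 'conduit-billing-sync']
```

Under these rules `conduit.*` does not match a three-token subject such as `conduit.order.created`, so the entries above read best as subject families; a consumer that wants every Conduit event would subscribe to `conduit.>`.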

WebSockets

Not the same thing as NATS.

  • WebSockets are point-to-point long-lived connections.
  • NATS is a backend messaging fabric / event bus.
  • Cascadia and browser-facing real-time surfaces could use WebSockets or streams.

WebSockets are for live sessions. NATS is for backend event distribution.

Trust and Authority Boundaries

Concern | Authority | Notes
Human identity, sessions, MFA, org membership | Headwaters | Downstream services validate Headwaters JWTs via JWKS.
Tenant existence, state, entitlements, policy, node licensing | Fabric | The main control-plane authority.
Billing state, subscriptions, checkout, enforcement triggers | Ledger | Ledger does not mutate tenant state directly; it calls Fabric.
Tenant-facing product/order/service API | Conduit | Depends on Fabric and Ledger, but owns the tenant panel backend.
Node-local runtime state, logs, files, realized networking, execution | Cascadia | After workload acceptance, runtime truth is node-local.
Operator/admin aggregation UX | Weir | Not authoritative; aggregates Headwaters + Fabric + Ledger + some Conduit.
Machine identity / internal PKI | Intended: Breakwater | Current implementation is incomplete and hybrid.

Service Rundown

Headwaters

Repo: /home/leo/workspace/Headwatersv2

Role: human identity authority.

Main features: signup/login/logout/refresh, password reset, magic links, user profile, orgs, memberships, roles, invites, MFA, machine tokens, OIDC/OAuth, JWKS, internal token introspection.

Exposure:

Native mTLS termination: Yes. Headwaters uses a Rustls acceptor and extracts canonical caller identity from the peer certificate.
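Headwaters does this in Rust with Rustls. The same pattern is sketched here with Python's standard `ssl` module for illustration: the server context rejects any handshake that lacks a client certificate chaining to the internal CA, and the caller identity is read from the verified certificate rather than from a request header. Preferring a single DNS subjectAltName and falling back to the subject commonName is an assumption, not a description of Headwaters' actual extraction rules.

```python
import ssl
from typing import Any, Dict


def make_mtls_server_context(certfile: str, keyfile: str, client_ca: str) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a cert signed by client_ca."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)
    ctx.load_verify_locations(cafile=client_ca)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    return ctx


def identity_from_peercert(cert: Dict[str, Any]) -> str:
    """Canonical caller identity from SSLSocket.getpeercert() output.

    Assumed rule: use the single DNS subjectAltName if present, else the subject CN.
    """
    dns_names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if len(dns_names) > 1:
        raise ValueError("ambiguous identity: multiple DNS subjectAltNames")
    if dns_names:
        return dns_names[0].lower()
    # 'subject' is a tuple of RDNs, each a tuple of (attribute, value) pairs.
    for rdn in cert.get("subject", ()):
        for attribute, value in rdn:
            if attribute == "commonName":
                return value.lower()
    raise ValueError("certificate carries no usable identity")
```

After `ctx.wrap_socket(sock, server_side=True)` completes a handshake, `conn.getpeercert()` on the accepted connection returns a dictionary in the shape `identity_from_peercert` expects.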

Important routes:

Fabric

Repo: /home/leo/workspace/fabric-v2

Role: platform control-plane authority.

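Modules it effectively owns:

  • tenant state
  • entitlements
  • policy bundles
  • node licensing
  • signed actions
  • provisioning
  • tenant JWT keys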

Exposure:

Current inbound machine identity model: Hybrid. Fabric expects a PeerIdentity in request extensions. Today that usually comes from forwarded identity middleware unless a future native TLS listener is added.
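One way to keep the hybrid model safe during the transition is to prefer an identity taken from a verified TLS peer certificate, and to honor the forwarded x-client-identity header only when the directly connected peer is an allowlisted proxy. The function name, the proxy allowlist, and the fields given to PeerIdentity below are hypothetical; Fabric's real PeerIdentity type and middleware may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

FORWARDED_IDENTITY_HEADER = "x-client-identity"


@dataclass(frozen=True)
class PeerIdentity:
    name: str
    source: str  # "tls" when read from a verified cert, "forwarded" when from the header


def resolve_peer_identity(
    tls_identity: Optional[str],
    direct_peer: str,
    headers: Mapping[str, str],
    trusted_proxies: FrozenSet[str],
) -> PeerIdentity:
    """Decide which identity to attach to the request.

    headers is assumed to use lower-cased names.
    """
    # 1. A verified client certificate always wins; any header is ignored.
    if tls_identity is not None:
        return PeerIdentity(tls_identity, "tls")
    # 2. Transition path: trust the header only from an allowlisted proxy.
    forwarded = headers.get(FORWARDED_IDENTITY_HEADER)
    if forwarded and direct_peer in trusted_proxies:
        return PeerIdentity(forwarded, "forwarded")
    # 3. Anything else has no verified machine identity and is rejected.
    raise PermissionError("no verified machine identity")
```

Recording the source alongside the name makes it easy to log, and later forbid, every request that still relies on the forwarded path.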

Important routes:

Ledger

Repo: /home/leo/workspace/ledger

Role: billing state machine and enforcement trigger.

Features: product and plan management, checkout session creation, subscription state, webhook ingestion, scheduler-driven enforcement, internal subscription lookups.
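A minimal sketch of scheduler-driven enforcement, assuming a hypothetical subscription status set and a hypothetical seven-day grace period. The point it illustrates is the boundary in the trust table: Ledger decides which Fabric call to make (suspend or unsuspend) but never edits tenant state itself.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class SubStatus(Enum):
    # Hypothetical status names.
    ACTIVE = "active"
    PAST_DUE = "past_due"


GRACE_PERIOD = timedelta(days=7)  # hypothetical policy value


def enforcement_action(
    status: SubStatus,
    past_due_since: Optional[datetime],
    tenant_suspended: bool,
    now: datetime,
) -> Optional[str]:
    """Return the Fabric call the scheduler should make ("suspend"/"unsuspend"), or None."""
    if status is SubStatus.ACTIVE:
        # Payment recovered: lift a suspension if one is in place.
        return "unsuspend" if tenant_suspended else None
    # PAST_DUE: suspend only once the grace period has fully elapsed.
    overdue = past_due_since is not None and now - past_due_since >= GRACE_PERIOD
    return "suspend" if overdue and not tenant_suspended else None


now = datetime(2026, 4, 9, tzinfo=timezone.utc)
print(enforcement_action(SubStatus.PAST_DUE, now - timedelta(days=8), False, now))  # suspend
```

Because the function returns None when the tenant is already in the right state, the scheduler can rerun it on every tick without issuing duplicate Fabric calls.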

Exposure:

Current inbound machine identity model: Hybrid. As in Fabric, an internal allowlist and a proxy/header mode exist, but native peer-certificate extraction is not wired up the way it is in Headwaters.

Important routes:

Conduit

Repo: /home/leo/workspace/Conduit

Role: tenant-facing panel backend for products, services, customers, staff, templates, nodes, networking, billing projections, and support-ish operations.

Features:

Exposure: public tenant/admin API under /api/v1/*.

Current outbound dependencies: Headwaters, Fabric, Ledger, and Cascadia.

Current machine identity issue: Conduit still injects x-client-identity in its outbound Fabric and Ledger clients. That should go away once proper client cert identity is in place.

Large route families:

Cascadia

Repo: /home/leo/workspace/Cascadia

Role: node-resident sovereign runtime authority.

Features:

Exposure:

Current authentication to Fabric: Not proper node mTLS yet. The current Fabric client sets x-client-identity to cascadia.<node_id>.internal and does not present a true client cert.
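The identity string Cascadia sends has a fixed shape, so Fabric can derive the node ID from it and require that a node only touches its own record. The check is the same whether the identity comes from a header or a certificate; what changes is whether it can be forged. As long as the value is taken from x-client-identity, any caller able to set that header can claim any node ID. The allowed node ID alphabet and the helper names below are assumptions.

```python
import re

# Shape taken from the current Cascadia client; the node_id alphabet is assumed.
NODE_IDENTITY = re.compile(r"cascadia\.([A-Za-z0-9-]{1,64})\.internal")


def node_id_from_identity(identity: str) -> str:
    """Extract <node_id> from cascadia.<node_id>.internal, rejecting anything else."""
    match = NODE_IDENTITY.fullmatch(identity)
    if match is None:
        raise ValueError(f"not a Cascadia node identity: {identity!r}")
    return match.group(1)


def authorize_node_request(identity: str, target_node_id: str) -> None:
    """A node may act only on its own record (heartbeat, policy sync, cert renew)."""
    if node_id_from_identity(identity) != target_node_id:
        raise PermissionError("caller identity does not match the target node")
```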

Anti-copy licensing state: Partially real.

But: the node certificate lifecycle is not complete. Fabric renew_certificate currently ignores the CSR and returns a generated serial string as the "certificate". This is not a real PKI implementation yet.
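Before a real renew_certificate signs anything, it has to decide whether the CSR may be signed at all. A sketch of those policy checks, assuming the CSR has already been parsed into the hypothetical ParsedCsr fields below, that the authenticated identity comes from the node's current certificate, and a hypothetical 30-day maximum lifetime. In Python, parsing and the proof-of-possession check could come from the `cryptography` package (`x509.load_pem_x509_csr` and `CertificateSigningRequest.is_signature_valid`); issuing the signed chain is not shown.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import AbstractSet

MAX_NODE_CERT_LIFETIME = timedelta(days=30)  # hypothetical policy value


@dataclass(frozen=True)
class ParsedCsr:
    """Hypothetical result of parsing and checking a CSR."""
    subject_identity: str       # identity the CSR asks to be certified
    signature_valid: bool       # CSR self-signature verified (proof of key possession)
    requested_lifetime: timedelta


def check_renewal(
    authenticated_identity: str, csr: ParsedCsr, revoked: AbstractSet[str]
) -> timedelta:
    """Return the lifetime to issue, or raise if this CSR must not be signed."""
    if not csr.signature_valid:
        raise ValueError("CSR signature invalid: caller has not proven key possession")
    if csr.subject_identity != authenticated_identity:
        raise PermissionError("CSR requests an identity the caller does not hold")
    if authenticated_identity in revoked:
        raise PermissionError("node identity is revoked")
    # Never issue longer than policy allows, whatever the CSR requests.
    return min(csr.requested_lifetime, MAX_NODE_CERT_LIFETIME)
```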

Weir

Repo: /home/leo/cascade/Weir

Role: operator/admin console backend-for-frontend.

Features: session context, auth/login dance with Headwaters, org views, onboarding/activation, tenant summary, billing summary, infrastructure summary, node summary, activity projection.

Exposure: console/backend API at /api/*.

Current status: useful route shape exists, but upstream integration is still partially mocked/in-memory in dev fallback mode.

Important routes:

Breakwater

Repo: /home/leo/workspace/Breakwater

Intended role: machine identity authority and trust layer.

What exists now:

Conclusion: Breakwater already exists conceptually as the issuer/trust layer, but the actual runtime implementation is still mostly the local gateway configs rather than a finished authority service.

Client-Facing Flows

Operator Signup / Console Flow

  1. Browser authenticates with Headwaters.
  2. Weir exchanges/uses Headwaters identity and org context.
  3. Weir calls Fabric to create or inspect the tenant.
  4. Weir calls Ledger for billing summary or checkout initiation.
  5. Weir may redirect operators into tenant-facing Conduit workflows later.

Tenant Admin / Customer Flow

  1. Browser uses Conduit UI/API.
  2. Conduit validates Headwaters staff JWTs or customer auth context.
  3. Conduit asks Fabric for policy, entitlements, or signed actions.
  4. For node/runtime operations, browser or Conduit calls Cascadia using Fabric-issued authorization.
  5. For billing data, Conduit queries Ledger internal routes or starts checkout via Ledger.
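Step 4 depends on Cascadia accepting only actions Fabric actually authorized, and only once. A sketch of both sides, assuming hypothetical claim names (action, node_id, tenant_id, exp, jti). HMAC-SHA256 with a shared key stands in for Fabric's real signing scheme purely to keep the example dependency-free; the real scheme, which Cascadia's JWKS sync suggests may be public-key based, is not shown.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict, Set


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, body: str) -> str:
    return _b64(hmac.new(key, body.encode(), hashlib.sha256).digest())


def issue_action(key: bytes, claims: Dict[str, Any]) -> str:
    """Issuer side (Fabric's role): encode the claims and sign them."""
    body = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{body}.{_sign(key, body)}"


class ActionVerifier:
    """Runtime side (Cascadia's role): verify, bind to this node, reject replays."""

    def __init__(self, key: bytes, node_id: str) -> None:
        self.key = key
        self.node_id = node_id
        self.seen: Set[str] = set()  # jti values already accepted

    def verify(self, token: str, now: float) -> Dict[str, Any]:
        body, _, sig = token.partition(".")
        # Constant-time comparison on bytes.
        if not hmac.compare_digest(sig.encode(), _sign(self.key, body).encode()):
            raise PermissionError("bad signature")
        claims = json.loads(_unb64(body))
        if claims["node_id"] != self.node_id:
            raise PermissionError("action targets a different node")
        if now >= claims["exp"]:
            raise PermissionError("action expired")
        if claims["jti"] in self.seen:
            raise PermissionError("action replayed")
        self.seen.add(claims["jti"])
        return claims
```

The `seen` set is the replay guard, the local counterpart of Fabric's fabric.signed_action.replayed event. Entries can be discarded once their exp has passed, because expired actions are rejected before the replay check.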

Cascadia Node Install / Activation Flow

  1. Operator obtains a Fabric bootstrap token.
  2. install.sh runs on the target host and calls the hidden _activate-node command.
  3. Cascadia derives a hardware fingerprint locally.
  4. Cascadia attests to Fabric with tenant_id, node_id, fingerprint, bootstrap token, hostname, and transport metadata.
  5. Fabric consumes the single-use bootstrap token, creates a node record, issues a license key, and stores the hardware fingerprint.
  6. Cascadia stores local node identity plus an encrypted activation binding tied to the current host.
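Steps 4 and 5 on the Fabric side can be sketched in memory: bootstrap tokens carry the aft_bt_ prefix and only their SHA-256 hash is stored, a token is consumed by the first successful attestation, and the fingerprint recorded at attestation is enforced on every heartbeat. The class and method names, token length, and license key format are hypothetical.

```python
import hashlib
import secrets
from typing import Dict


def _token_hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


class NodeRegistry:
    """In-memory stand-in for Fabric's bootstrap-token and node tables."""

    def __init__(self) -> None:
        self.bootstrap: Dict[str, str] = {}         # sha256(token) -> tenant_id
        self.nodes: Dict[str, Dict[str, str]] = {}  # node_id -> node record

    def create_bootstrap_token(self, tenant_id: str) -> str:
        token = "aft_bt_" + secrets.token_urlsafe(32)
        self.bootstrap[_token_hash(token)] = tenant_id  # only the hash is kept
        return token  # plaintext is handed to the operator once

    def attest(self, tenant_id: str, node_id: str, fingerprint: str, token: str) -> str:
        digest = _token_hash(token)
        if self.bootstrap.get(digest) != tenant_id:
            raise PermissionError("unknown, already used, or foreign bootstrap token")
        if node_id in self.nodes:
            raise PermissionError("node already attested")
        del self.bootstrap[digest]  # single use: consumed on success
        license_key = secrets.token_urlsafe(24)  # format is hypothetical
        self.nodes[node_id] = {
            "tenant_id": tenant_id,
            "fingerprint": fingerprint,
            "license_key": license_key,
        }
        return license_key

    def heartbeat(self, node_id: str, fingerprint: str) -> None:
        node = self.nodes.get(node_id)
        if node is None or node["fingerprint"] != fingerprint:
            raise PermissionError("unknown node or hardware fingerprint mismatch")
```

In a database-backed implementation the consume step should be a single atomic conditional delete, so two concurrent attestations cannot both succeed with one token.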

Internal Flows

Caller | Callee | Transport | Purpose
Headwaters | Fabric | HTTP | Fetch service policy bundle
Ledger | Fabric | HTTP | Tenant lookup, suspend/unsuspend, deletion, profile migration, policy bundle
Conduit | Fabric | HTTP | Policy, tenant bundle, signed actions, provisioning, bootstrap tokens, revocation
Conduit | Ledger | HTTP | Internal subscription lookup, addon record, checkout orchestration
Conduit | Cascadia | HTTP | Signed runtime actions, telemetry pulls, control-plane audit projection
Cascadia | Fabric | HTTP | Attest, heartbeat, policy sync, JWKS sync, signed-action introspection, cert renew, transport peers
Weir | Headwaters | HTTP | Org membership/auth/session flows
Weir | Fabric | HTTP | Tenant creation/state/entitlements
Weir | Ledger | HTTP | Billing summary and checkout
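One way to express the internal allowlist that Fabric and Ledger keep is a caller-to-callee map transcribed from the Internal Flows table, checked against the service prefix of the caller's verified identity. The assumption that every service identity starts with its service name (as cascadia.<node_id>.internal does) is hypothetical, as are the helper names.

```python
from typing import Dict, FrozenSet

# Caller service -> services it may call, transcribed from the Internal Flows table.
ALLOWED_CALLS: Dict[str, FrozenSet[str]] = {
    "headwaters": frozenset({"fabric"}),
    "ledger": frozenset({"fabric"}),
    "conduit": frozenset({"fabric", "ledger", "cascadia"}),
    "cascadia": frozenset({"fabric"}),
    "weir": frozenset({"headwaters", "fabric", "ledger"}),
}


def service_of(identity: str) -> str:
    """Assumed naming rule: the first label of a machine identity names the service."""
    return identity.split(".", 1)[0]


def is_call_allowed(caller_identity: str, callee: str) -> bool:
    """Default deny: unknown callers and unlisted pairs are refused."""
    return callee in ALLOWED_CALLS.get(service_of(caller_identity), frozenset())
```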

NATS Subjects in Use

Fabric emits

fabric.tenant.created, fabric.tenant.cluster_assigned, fabric.tenant.suspended, fabric.tenant.unsuspended, fabric.tenant.deletion_scheduled, fabric.tenant.wipe_initiated, fabric.tenant.deletion_ready, fabric.tenant.cluster_migrated, fabric.tenant.entitlements_changed, fabric.node.attested, fabric.node.revoked, fabric.config.updated, fabric.signing_key.rotated, fabric.cluster.created, fabric.signed_action.issued, fabric.signed_action.introspected, fabric.signed_action.replayed, fabric.provisioning.reserved, fabric.provisioning.reservation_updated.

Ledger emits / Conduit consumes

ledger.subscription.created and ledger.subscription.updated are consumed by Conduit.

Conduit emits

conduit.order.created, conduit.order.pending_payment, conduit.invoice.paid, conduit.invoice.refunded, conduit.provisioning.started, conduit.provisioning.succeeded, conduit.provisioning.failed, conduit.service.created, conduit.service.terminated.

Cascadia Licensing and Anti-Copy Reality Check

Mechanism | Status | How it works now
Single-use bootstrap token | Implemented | Fabric creates aft_bt_... tokens, stores only a SHA-256 hash, and consumes them on first attestation.
Hardware fingerprint binding in Fabric | Implemented | Fabric stores hardware_fingerprint on the node record and rejects mismatches on attest/heartbeat.
Host-bound local activation seal | Implemented | Cascadia stores an encrypted activation binding under the node state dir and refuses runtime activation if the current hardware fingerprint differs.
Real node client certificate authentication to Fabric | Not implemented correctly yet | Cascadia still identifies to Fabric by sending x-client-identity based on node ID.
Real certificate issuance / CSR signing | Stubbed | Fabric renew_certificate currently ignores the CSR and returns a generated certificate serial string, not an actual signed cert chain.
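The host-bound seal can be sketched with the standard library, with two labeled substitutions: the host attributes fed to the fingerprint are hypothetical, and an HMAC tag over node ID and fingerprint stands in for Cascadia's encrypted binding (it gives tamper detection, not confidentiality). Where the secret comes from is also an assumption.

```python
import hashlib
import hmac
import json
from typing import Dict


def hardware_fingerprint(attributes: Dict[str, str]) -> str:
    """SHA-256 over a canonical (sorted, compact) JSON encoding of host attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _binding_tag(secret: bytes, node_id: str, fingerprint: str) -> str:
    return hmac.new(secret, f"{node_id}|{fingerprint}".encode(), hashlib.sha256).hexdigest()


def seal_activation(secret: bytes, node_id: str, fingerprint: str) -> Dict[str, str]:
    """Binding written at activation time; it stores a tag, not the fingerprint."""
    return {"node_id": node_id, "tag": _binding_tag(secret, node_id, fingerprint)}


def check_activation(secret: bytes, binding: Dict[str, str], current_fingerprint: str) -> None:
    """Refuse to start unless the binding was sealed for this host's fingerprint."""
    expected = _binding_tag(secret, binding["node_id"], current_fingerprint)
    if not hmac.compare_digest(binding["tag"].encode(), expected.encode()):
        raise PermissionError("activation binding does not match this host")
```

Copying the node state directory to another host changes the recomputed fingerprint, so check_activation fails there; Fabric's own fingerprint comparison on attest and heartbeat is the second, independent barrier.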

Conclusion: copying the Cascadia binary alone is not enough to create a valid second node, because the host-bound activation binding and Fabric's fingerprint checks block that path. The machine-certificate side is not complete yet, however, so the node identity lifecycle cannot yet be called production-grade.

Should Fabric Be Split?

Today: keeping Fabric together is reasonable because many of its responsibilities are tightly coupled: tenant state, entitlements, policy bundles, node licensing, signed actions, provisioning, tenant JWT keys.

In an ideal future world: some areas could split if scale or team ownership demands it:

But: splitting too early would add coordination overhead while the trust model is still settling. The highest-value boundary to split first is machine identity into Breakwater, not tenant policy out of Fabric.

Current Recommended Target Architecture

  1. Breakwater becomes the real machine certificate issuer and trust-bundle authority.
  2. Headwaters remains human identity only.
  3. Each internal service terminates client certs natively where feasible.
  4. Fabric consumes verified machine identity directly from TLS, not generic forwarded headers.
  5. Ledger does the same.
  6. Cascadia gets a real node cert lifecycle issued by Breakwater/Fabric-approved flows.
  7. Forwarded identity headers become a tightly scoped transition mechanism only, then disappear.

Companion file: roadmap.html