Cascade Roadmap
Production-readiness backlog generated from the current workspace code, docs, config, and tests on 2026-04-09. This is intentionally biased toward backend gaps, security hardening, operational reality, and frontend-blocking integration work.
Current Read
Cascade is not blocked on product direction anymore. It is blocked on trust-model completion, real service-to-service wiring, and replacing scaffolded behavior with durable production state. Headwaters and Conduit are the closest to frontend-usable. Fabric is still the control-plane center and the biggest dependency. Ledger is structurally solid and now terminates internal mTLS natively. Cascadia has a real hardware-bound activation model and now receives its first node certificate during activation plus real Breakwater-issued renewals, but it still has bootstrap/fallback cleanup and host-binding hardening to close. Weir is still a BFF prototype with in-memory state and dev fallbacks. Breakwater is now a local issuer, but not yet a full production PKI and gateway implementation.
What This File Covers
- global platform tasks
- per-service production gaps
- security issues visible in code
- integration debt and operational debt
- frontend blockers and test blockers
Highest-Risk Themes
- internal trust still depends on forwarded identity headers in multiple paths
- Breakwater is only a partial issuer today, not yet a full production PKI authority
- Cascadia still keeps a legacy identity-header fallback during bootstrap until full node-cert bootstrapping is finished
- Weir still has dev fallbacks and shallow integration coverage, even though its local state is no longer memory-only
- docs and code are not fully aligned on what is actually enforced
Frontend-Unblock Objective
- Headwaters frontend can proceed after contract freeze and basic integration stack setup
- Conduit frontend can proceed once local stack and trust cleanup are stable
- Weir frontend should wait for real downstream wiring and durable sessions
Global Platform Roadmap
These items cut across all services and should be treated as the real dependency order, not optional cleanup.
1. Finish the Machine Trust Model
- Make Breakwater the authoritative machine PKI issuer for service and node certificates.
- Define the real certificate lifecycle: registration, CSR submission, approval, issuance, renewal, revocation, trust bundle publication, and audit history.
- Stop treating `x-client-identity` as a normal application protocol mechanism.
- Port native inbound client-certificate termination into Fabric and Ledger, following the Headwaters pattern.
- Keep gateway-forwarded identity only as an explicit migration path where direct native mTLS is not yet implemented.
- Define one canonical identity namespace for all services and nodes and verify all cert subjects match it exactly.
- Introduce cross-check rules so if a forwarded identity header exists during transition, it must exactly match the verified cert identity.
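The cross-check rule in this list can be expressed as one resolution function that every internal handler calls. The following is illustrative only. It assumes the verified peer identity has already been extracted from the client certificate (as in the Headwaters pattern). It also uses a hypothetical `service:<name>` identity namespace and a hypothetical gateway identity, and it takes the transition switch as an explicit boolean.

```rust
/// Why a caller identity could not be derived for an internal request.
#[derive(Debug, PartialEq)]
pub enum TrustError {
    /// A forwarded identity was present but disagreed with the verified cert.
    ForwardedMismatch { verified: String, forwarded: String },
    /// The gateway forwarded an identity while transition mode is off.
    ForwardingNotAllowed,
    /// The gateway forwarded a request without naming the original caller.
    MissingForwardedIdentity,
}

/// Hypothetical identity of the Breakwater gateway, as it appears in its cert.
pub const GATEWAY_IDENTITY: &str = "service:breakwater-gateway";

/// Resolve the caller identity.
/// * `verified_peer`: identity taken from the verified client certificate.
/// * `forwarded`: value of the `x-client-identity` header, if any.
/// * `transition_mode`: explicit opt-in for gateway-forwarded identity.
pub fn resolve_caller(
    verified_peer: &str,
    forwarded: Option<&str>,
    transition_mode: bool,
) -> Result<String, TrustError> {
    // Gateway-forwarded identity is honoured only in transition mode and
    // only when the direct TLS peer is the gateway itself.
    if transition_mode && verified_peer == GATEWAY_IDENTITY {
        return forwarded
            .map(str::to_owned)
            .ok_or(TrustError::MissingForwardedIdentity);
    }
    match forwarded {
        // Native mTLS with no header: the certificate is the whole story.
        None => Ok(verified_peer.to_owned()),
        // Header present on a direct connection: it must match exactly.
        Some(f) if f == verified_peer => Ok(verified_peer.to_owned()),
        // Gateway outside transition mode may not name anyone else.
        Some(_) if verified_peer == GATEWAY_IDENTITY => Err(TrustError::ForwardingNotAllowed),
        Some(f) => Err(TrustError::ForwardedMismatch {
            verified: verified_peer.to_owned(),
            forwarded: f.to_owned(),
        }),
    }
}
```

Under this rule a forwarded header can never widen trust. On a direct connection it may only repeat what the certificate already proves. Only the gateway, in an explicitly enabled transition mode, may name another caller.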
2. Build One Real Local Integration Topology
- Create one canonical local stack that starts Headwaters, Fabric, Ledger, Conduit, Weir, Postgres, Redis, NATS, and the required Breakwater components.
- Remove ad hoc service startup patterns and scattered local ports from day-to-day development.
- Define public listener ports, internal listener ports, local DNS names, and certificate trust paths in one place.
- Make local startup fail fast when required certs, env vars, or dependencies are missing.
- Add a smoke command that verifies the full path: auth, policy fetch, tenant fetch, billing lookup, and one signed Cascadia action.
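Fail-fast startup is easiest to enforce when every service runs the same prerequisite check before binding any listener. This is a minimal sketch. It assumes the required variable names and certificate paths come from the canonical stack definition; the names in the test are hypothetical. Environment lookup is passed in as a closure, so the check can be tested without mutating process state.

```rust
use std::path::Path;

/// Collect every missing prerequisite instead of stopping at the first one,
/// so a developer sees the whole list in a single failed boot.
pub fn check_prerequisites<F>(
    lookup_env: F,
    required_vars: &[&str],
    required_files: &[&str],
) -> Result<(), Vec<String>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut missing = Vec::new();
    for &var in required_vars {
        // Treat an empty or whitespace-only value the same as an unset one.
        match lookup_env(var) {
            Some(v) if !v.trim().is_empty() => {}
            _ => missing.push(format!("env var {var} is unset or empty")),
        }
    }
    for &file in required_files {
        // Certificates and keys must exist as regular files.
        if !Path::new(file).is_file() {
            missing.push(format!("file {file} does not exist"));
        }
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

In a service binary the call might be `check_prerequisites(|k| std::env::var(k).ok(), REQUIRED_VARS, REQUIRED_FILES)`. Both constants would be hypothetical per-service lists.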
3. Tighten the HTTP versus NATS Boundary
- Document which actions are authoritative synchronous HTTP calls and which are asynchronous NATS notifications.
- Version all NATS subjects and payload schemas that are expected to be consumed across services.
- Define which subjects are required for correctness versus informational fan-out only.
- Decide whether platform events are durable state propagation or best-effort projections, because current implementations still mix those expectations.
- If any subject family is correctness-critical, move its publishers to an outbox or acknowledged JetStream model instead of plain direct publish calls.
- Add startup checks or alerting for mandatory consumers that are missing or stalled.
- Decide on dead-letter, replay, and retention policy per stream rather than leaving JetStream behavior implied.
- Add idempotency and de-duplication expectations to event-producing services where downstream side effects matter.
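JetStream can already drop publisher-side duplicates that carry the same `Nats-Msg-Id` header within a stream's duplicate window. That does not make a consumer's side effects idempotent, though. The following shows the consumer half under stated assumptions:

- a hypothetical envelope with a unique `event_id` and an explicit `schema_version`;
- a hypothetical subject scheme such as `cascade.tenant.v1.suspended`.

```rust
use std::collections::HashSet;

/// Hypothetical cross-service event envelope: the version appears both in
/// the subject (e.g. `cascade.tenant.v1.suspended`) and in the payload.
#[derive(Debug, Clone)]
pub struct EventEnvelope {
    pub event_id: String,
    pub subject: String,
    pub schema_version: u32,
}

#[derive(Debug, PartialEq)]
pub enum Handled {
    Applied,
    Duplicate,
    UnsupportedVersion(u32),
}

/// Applies each event at most once and refuses schema versions it does not
/// understand instead of guessing at the payload shape.
pub struct IdempotentConsumer {
    supported_versions: Vec<u32>,
    // In production this set would live in the consumer's own database,
    // written in the same transaction as the side effect.
    seen: HashSet<String>,
}

impl IdempotentConsumer {
    pub fn new(supported_versions: Vec<u32>) -> Self {
        Self { supported_versions, seen: HashSet::new() }
    }

    pub fn handle(&mut self, event: &EventEnvelope, apply: impl FnOnce(&EventEnvelope)) -> Handled {
        if !self.supported_versions.contains(&event.schema_version) {
            return Handled::UnsupportedVersion(event.schema_version);
        }
        // `insert` returns false when this id was already recorded.
        if !self.seen.insert(event.event_id.clone()) {
            return Handled::Duplicate;
        }
        apply(event);
        Handled::Applied
    }
}
```

Rejecting an unknown version, rather than parsing it on a best-effort basis, keeps a schema bump from silently corrupting downstream state. The seen-set is held in memory here only for illustration.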
4. Unify Production Gates and Runtime Policy
- Standardize CI across all repos: `fmt`, `clippy -D warnings`, `test --all-features`, `audit`, and migration contract checks.
- Stop relying on old analysis docs that say a repo is fully compliant if the current code has regressed or diverged.
- Add one production-readiness checklist format across services so “done” means the same thing everywhere.
- Raise the shared testing bar beyond route-contract checks and boot smoke tests so auth, trust boundaries, and core state transitions are exercised meaningfully in every service.
- Require explicit health/readiness semantics for all services and document what those checks actually verify.
- Standardize request IDs, structured logging fields, and trace propagation.
- Audit all “development” and “allow insecure” flags and clearly document whether they are build-time only, local-only, or dangerous in runtime config.
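The insecure-flag audit becomes enforceable, rather than purely documentary, with a single boot-time check keyed on a runtime profile. This is similar in spirit to the runtime profile model Weir has now introduced. The profile variants and flag names below are hypothetical.

```rust
/// Hypothetical runtime profile shared by every service.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Dev,
    Test,
    Staging,
    Production,
}

/// Illustrative subset of risky switches a service might expose.
#[derive(Debug, Default)]
pub struct InsecureFlags {
    pub accept_invalid_certs: bool,
    pub allow_plain_http_control_plane: bool,
    pub dev_fallback_clients: bool,
}

/// Refuse to boot if any insecure switch is enabled outside dev/test.
/// Returns the names of the offending flags so the error is actionable.
pub fn validate_insecure_flags(profile: Profile, flags: &InsecureFlags) -> Result<(), Vec<&'static str>> {
    if matches!(profile, Profile::Dev | Profile::Test) {
        return Ok(());
    }
    let offending: Vec<&'static str> = [
        ("accept_invalid_certs", flags.accept_invalid_certs),
        ("allow_plain_http_control_plane", flags.allow_plain_http_control_plane),
        ("dev_fallback_clients", flags.dev_fallback_clients),
    ]
    .iter()
    .filter(|(_, enabled)| *enabled)
    .map(|(name, _)| *name)
    .collect();
    if offending.is_empty() {
        Ok(())
    } else {
        Err(offending)
    }
}
```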
5. Correct the Docs Where They Overstate Reality
- Update service docs where the text says cert-derived identity is enforced but the current client or server path still relies on a forwarded header.
- Correct the Cascadia certificate-renewal story so docs no longer imply fully real signed node certificates when the renew path still returns placeholder certificate data.
- Add an explicit “transition architecture” note across Fabric, Ledger, Conduit, Cascadia, and Breakwater.
- Document dev-only scaffolds in Weir instead of letting them look like finished integrations.
- Correct README and release docs that currently claim GHCR or OCI publication patterns not actually implemented in the active workflows.
6. Normalize Releases, CI, and GHCR Publishing Across All Repos
- Define one shared release contract for all platform repos: tagged release, GitHub Release assets, checksum assets, and GHCR production image publication.
- Standardize release triggers so every repo uses the same tag convention and `workflow_dispatch` input contract.
- Standardize binary packaging names and asset naming so all Linux release tarballs follow the same format.
- Standardize OCI labels, semantic version tags, `sha-` tags, and `latest` rules across all production images.
- Decide whether dual Docker and OCI image flavors are genuinely required everywhere; if not, simplify to one canonical published production image format.
- Standardize CI gates across all repos to include fmt, clippy, full tests, migration checks where applicable, cargo audit, and release build verification.
- Add missing release/CI/build infrastructure to Breakwater and Weir, which currently do not have the same GitHub workflow and packaging setup as the other services.
- Standardize README release/install sections so each repo documents binaries, GHCR images, version tags, and production deployment expectations in the same way.
- Standardize repo metadata files like `SECURITY.md`, `CONTRIBUTING.md`, and `.env.example` where appropriate, so each service repo looks and behaves like part of one platform.
- Write one platform-wide release policy covering versioning strategy, rollback, image retention, release approval, and whether services cut independently or as coordinated platform releases.
Execution Batch 1 Completed
- Restore the Fabric test baseline after the recent mTLS config shape change.
- Fence Weir dev fallbacks to development and test only.
- Add a proper Weir README with truthful release and runtime guidance.
- Bring Breakwater README release/install docs up to the platform standard.
- Add missing GitHub workflow scaffolding to Weir.
- Add missing GitHub workflow scaffolding to Breakwater.
- Add example environment configuration to Weir.
- Add example environment configuration to Breakwater.
- Add standard metadata files to Weir.
- Add standard metadata files to Breakwater.
- Add a production container build path to Weir.
- Add a production container build path to Breakwater.
- Replace Breakwater’s metadata-only authority shell with a first real authority surface.
- Add native inbound client-certificate termination to Fabric and verify it cleanly.
Service-by-Service Roadmap
Headwaters
Headwaters is the strongest reference implementation for native peer-cert extraction and internal route protection. It already has the shape the other services should follow.
- Freeze the internal mTLS identity-extraction model as the platform reference implementation and reuse it in Fabric and Ledger.
- Review all internal route allowlists and consolidate them into clearer caller classes instead of a growing list of env-driven allowlist variants.
- Add an explicit local and production certificate bootstrap guide for Headwaters so downstream services can rely on one expected setup.
- Remove the remaining outbound `x-client-identity` dependency on Fabric policy-bundle fetches once Fabric accepts native peer-cert identity.
- Document JWT rotation operations more concretely for live environments, including overlap windows, rollback, and JWKS cache invalidation expectations.
- Add broader end-to-end OIDC browser-flow tests; current contract coverage is solid, but frontend integration will care about exact redirect and consent behavior.
- Benchmark and profile the slowest auth/session tests so frontend work is not waiting on a sluggish backend test cycle whenever auth contracts are touched.
- Audit NATS usage and either harden publication with explicit delivery guarantees or document that those events are non-authoritative best-effort signals.
- Add a documented compatibility contract for downstream consumers of Headwaters JWTs so Fabric, Ledger, Conduit, and Weir all validate the same issuer/audience behavior.
- Keep Headwaters on the shared platform release template by aligning its README release instructions and GHCR image naming with the final common contract.
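The shared issuer/audience contract could be captured as one claim-check function. Every downstream service would call it after verifying the signature against the Headwaters JWKS. This rests on several assumptions. Signature verification is out of scope. The claim set is reduced to `iss`, `aud`, `exp`, and `nbf` as defined in RFC 7519. The issuer URL and audience names in the test are hypothetical.

```rust
/// Claims after signature verification against the Headwaters JWKS
/// (that verification is assumed to have happened already).
#[derive(Debug)]
pub struct Claims {
    pub iss: String,
    pub aud: Vec<String>,
    pub exp: u64,
    pub nbf: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum ClaimError {
    WrongIssuer,
    WrongAudience,
    Expired,
    NotYetValid,
}

/// One rule set that Fabric, Ledger, Conduit, and Weir could all call,
/// so issuer/audience behaviour cannot drift between services.
pub fn check_claims(
    claims: &Claims,
    expected_issuer: &str,
    expected_audience: &str,
    now_secs: u64,
    leeway_secs: u64,
) -> Result<(), ClaimError> {
    // Exact string match: no prefix, trailing-slash, or case folding.
    if claims.iss != expected_issuer {
        return Err(ClaimError::WrongIssuer);
    }
    // `aud` may list several audiences; this service must be one of them.
    if !claims.aud.iter().any(|a| a == expected_audience) {
        return Err(ClaimError::WrongAudience);
    }
    // Expired once `now` reaches `exp` (plus a small clock-skew leeway).
    if now_secs >= claims.exp.saturating_add(leeway_secs) {
        return Err(ClaimError::Expired);
    }
    if let Some(nbf) = claims.nbf {
        if now_secs.saturating_add(leeway_secs) < nbf {
            return Err(ClaimError::NotYetValid);
        }
    }
    Ok(())
}
```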
Fabric
Fabric currently carries tenant lifecycle, profiles, entitlements, policy bundles, signed actions, provisioning, node licensing, catalog, global config, and signing keys. It is the center of gravity of the platform. It also still has the biggest mismatch between desired trust posture and actual deployment posture.
Main Tasks
- Fix the current test fixture breakage caused by the `MtlsConfig` shape change and get the repo green again.
- Add native inbound client-certificate termination at the listener level instead of relying on proxy/header injection in standard operation.
- Keep Breakwater-proxy mode only as an explicit transition configuration, not as the normal production shape.
- Replace generic trusted-header naming with a Breakwater-owned forwarded-identity contract if a temporary forwarded mode remains.
- Turn the node certificate renewal API into a real CSR validation and certificate issuance flow rather than returning a placeholder string as `certificate`.
- Split licensing approval logic from certificate signing logic so Fabric authorizes node state and Breakwater signs the resulting cert.
- Validate CSRs against expected node identity and tenant binding, then return a real PEM chain plus expiry metadata rather than a synthetic serial string.
- Persist issuer, serial, expiry, and revocation provenance for node certificates so node state is auditable beyond placeholder fields.
- Decide whether initial node attestation should issue the first node certificate instead of leaving newly created nodes with null certificate state.
- Add negative tests for all node-identity-sensitive paths: attestation, heartbeat, renew-cert, confirm-wipe, transport peer fetch, and signed-action introspection.
- Add explicit readiness checks for dependencies that Fabric claims operationally: Postgres, Redis, NATS, and access to required key material.
- Review startup behavior and decide which missing dependencies should fail closed versus allow degraded start.
- Audit tenant, licensing, config, and catalog event publication and decide which flows need durable delivery rather than direct publish-and-flush behavior.
- Review the enormous scope of the repo and carve internal modules more cleanly even before any service split: tenant lifecycle, entitlements/policy, signing, licensing, provisioning, and catalog.
- Review the policy-bundle key-disclosure contract so only the intended Conduit path ever receives tenant private signing material, with explicit tests around caller identity.
- Replace hard-coded environment-specific URL generation with config-owned values before downstream systems depend on them.
- Move global-config key allowlisting and validation into a typed governance module instead of keeping it embedded as a small static handler list.
- Write a bounded “event ownership” doc for Fabric subjects so downstream services know exactly what they can treat as source-of-truth versus hint.
- Audit all public versus internal endpoints and ensure only the intended public endpoints remain broadly exposed.
- Normalize Fabric’s already-strong CI/release setup into the final shared platform template instead of leaving it as one of several slightly different patterns.
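The CSR-binding task could reduce to a pure authorization check in Fabric that runs before anything is handed to Breakwater for signing. This assumes the CSR has already been parsed and its self-signature verified elsewhere. It uses a hypothetical canonical node name of the form `node.<node_id>.<tenant_id>.cascade.internal`; the real namespace is whatever the platform-wide identity task settles on.

```rust
/// Fields assumed to be extracted from a CSR whose self-signature has
/// already been verified (parsing is out of scope here).
#[derive(Debug)]
pub struct ParsedCsr {
    pub subject_cn: String,
    pub dns_sans: Vec<String>,
}

/// Fabric's view of the node, as persisted at attestation time.
#[derive(Debug)]
pub struct NodeRecord {
    pub node_id: String,
    pub tenant_id: String,
    pub license_active: bool,
}

#[derive(Debug, PartialEq)]
pub enum CsrRejection {
    LicenseInactive,
    SubjectMismatch,
    MissingSan,
    UnexpectedSan(String),
}

/// Hypothetical canonical node identity derived from node and tenant ids.
pub fn expected_node_name(node: &NodeRecord) -> String {
    format!("node.{}.{}.cascade.internal", node.node_id, node.tenant_id)
}

/// Fabric authorizes; Breakwater would sign only if this returns Ok.
pub fn authorize_node_csr(csr: &ParsedCsr, node: &NodeRecord) -> Result<(), CsrRejection> {
    if !node.license_active {
        return Err(CsrRejection::LicenseInactive);
    }
    let expected = expected_node_name(node);
    if csr.subject_cn != expected {
        return Err(CsrRejection::SubjectMismatch);
    }
    if csr.dns_sans.is_empty() {
        return Err(CsrRejection::MissingSan);
    }
    // Every requested SAN must be the canonical name; nothing extra may ride along.
    if let Some(extra) = csr.dns_sans.iter().find(|san| **san != expected) {
        return Err(CsrRejection::UnexpectedSan(extra.clone()));
    }
    Ok(())
}
```

Rejecting any extra SAN matters as much as checking the subject. Otherwise a node could request a certificate that also names a service.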
Completed In This Batch
- Add a native peer-certificate extraction primitive to Fabric.
- Require TLS configuration for native Fabric mTLS mode.
- Wire native listener-level mTLS into Fabric startup.
- Keep Fabric tests aligned with the stricter trust and readiness behavior.
Possible Future Split Points
- Move machine PKI and trust-bundle publication into Breakwater as a real boundary first.
- Defer any further split until the trust model is stable and the current repo is green again.
- If split later, the best candidates are catalog, provisioning/reservations, and policy authority.
- Do not split Fabric further before fixing its current trust and test debt.
Ledger
Ledger looks coherent as a billing service, but it is still on the hybrid identity path and some of its runtime posture is softer than Headwaters or Conduit.
- Add native inbound client-certificate termination and verified peer identity extraction so Ledger does not need proxy/header identity injection as a normal mode.
- Stop sending `x-client-identity` from Ledger to Fabric once Fabric supports native peer identity.
- Treat “real client cert loaded in reqwest” and “trusted caller identity derived correctly” as separate requirements; today outbound mTLS exists but identity is still redundantly asserted by header.
- Decide whether startup should fail closed when Fabric policy fetch fails instead of warning and continuing on config defaults.
- Strengthen readiness to cover all dependencies Ledger actually needs for safe operation: database, Redis, Fabric reachability for key enforcement paths, and provider configuration sanity.
- Add more integration tests that cover the real internal route auth path without relying on the old identity header convention.
- Add explicit test coverage for the Breakwater-proxy transition mode so the temporary path is still deliberate and verified.
- Audit webhook logging and scheduler logging to ensure no provider secrets, raw webhook bodies, or sensitive account details can leak in failure traces.
- Define a recovery playbook for subscription drift between Ledger and Fabric tenant state so manual repair is consistent.
- Audit every mutation path that calls Fabric and make retry plus idempotency semantics explicit, especially scheduler-driven suspend, unsuspend, deletion, and profile-migration actions.
- Move NATS publication from warning-only best-effort behavior to an explicit contract: durable business events or clearly non-critical telemetry.
- Document and test NATS optionality more clearly; if events are disabled, confirm what observability and downstream behavior are intentionally absent.
- Review scheduler lock expiry and mid-cycle recovery semantics so long-running enforcement cycles cannot overlap unexpectedly if Redis lock TTL expires under load.
- Decide whether enforcement-side event publication after suspend/wipe actions is authoritative enough to require stronger guarantees than the current best-effort post-commit publish.
- Validate provider configuration more aggressively at startup so the selected provider cannot come up “ready” with empty credentials or a nonsense base URL.
- Bring Ledger’s GHCR image naming, README release notes, and artifact naming into the final shared platform release contract.
- Require explicit TLS configuration and test alignment for native Ledger mTLS mode.
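A Redis TTL lock alone cannot stop an expired holder from continuing its cycle, which is the overlap risk in the scheduler task above. A standard mitigation is fencing tokens. Every lease grant carries a monotonically increasing number, and whatever the scheduler mutates refuses writes that carry an older one. The following simulates both sides in memory. In Ledger the lease would live in Redis. Where the token check would live (for example, alongside Ledger's own enforcement records) is an assumption.

```rust
use std::time::{Duration, Instant};

/// In-memory stand-in for a Redis lease; each grant carries a
/// monotonically increasing fencing token.
pub struct Lease {
    granted_at: Option<Instant>,
    ttl: Duration,
    next_token: u64,
}

impl Lease {
    pub fn new(ttl: Duration) -> Self {
        Self { granted_at: None, ttl, next_token: 0 }
    }

    /// Grant the lease if it is free or its TTL has elapsed.
    pub fn try_acquire(&mut self, now: Instant) -> Option<u64> {
        if let Some(granted) = self.granted_at {
            if now.duration_since(granted) < self.ttl {
                return None;
            }
        }
        self.next_token += 1;
        self.granted_at = Some(now);
        Some(self.next_token)
    }
}

/// Stand-in for the state the scheduler mutates; it rejects any token
/// older than the newest one it has already accepted.
pub struct FencedResource {
    highest_token: u64,
    pub writes: Vec<(u64, String)>,
}

impl FencedResource {
    pub fn new() -> Self {
        Self { highest_token: 0, writes: Vec::new() }
    }

    pub fn write(&mut self, token: u64, action: &str) -> Result<(), u64> {
        if token < self.highest_token {
            // A holder whose lease expired mid-cycle is still running: refuse it.
            return Err(self.highest_token);
        }
        self.highest_token = token;
        self.writes.push((token, action.to_owned()));
        Ok(())
    }
}
```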
Conduit
Conduit is broad and relatively advanced, but it still contains visible transitional trust behavior and some runtime flags that need stronger boundaries before production.
- Remove direct caller-controlled identity header injection for Fabric and Ledger clients once those services support native peer-cert identity.
- Replace the configurable generic service-identity header pattern with a stricter machine-auth abstraction that either uses mTLS directly or a Breakwater-owned forwarded identity contract.
- Eliminate operator role-mutation dependence on a spoofable proxy header and replace it with verified service identity on a locked internal path.
- Audit all use of `danger_accept_invalid_certs` and hard-fence it so it cannot accidentally be enabled outside tightly controlled local/test modes.
- Audit every service-side call to Cascadia and ensure the TLS and hostname requirements align with the stated node trust model.
- Remove generic `service_identity_header` configuration once trust derivation is native; the caller should not choose the identity transport primitive.
- Add contract tests for all high-value integrations: Fabric policy fetch, signed action issuance, provisioning resolve/finalize, Ledger subscription fetch, addon recording, and checkout creation.
- Document which NATS subjects Conduit requires for correctness and what state remains correct if NATS is unavailable.
- Replace plain ephemeral NATS subscribe loops with an explicit durability model if tenant/node state convergence depends on those events being seen.
- Review read-only mode behavior during policy refresh failures and define which APIs should degrade and which must hard-fail.
- Decide whether boot should fail closed if global or tenant policy bootstrap fails, rather than allowing the service to enter a partially informed read-only runtime.
- Review the current path-based read-only exemptions and prove they cannot leave unsafe write paths available when Fabric policy is stale or unreachable.
- Verify that customer JWT issuance and validation behavior is fully documented, because the current docs emphasize Headwaters authority but Conduit also has its own customer auth paths.
- Add one end-to-end frontend contract fixture per major panel family so the future UI work has stable payload examples.
- Review the large migrations/import handlers separately for timeout control, remote trust boundaries, and safe failure semantics before calling the service production-ready.
- Encrypt stored migration credentials for external platforms instead of persisting raw JSON strings in the database behind an `_enc` field name.
- Replace the migration SFTP helper’s “accept any server key” behavior with real host-key verification or an explicit audited trust-on-first-use model.
- Revisit the tenant customisation threat model, because tenant-supplied `custom_js` and landing-page HTML currently rely on light content filtering rather than a stronger isolation model.
- Audit node-admin and telemetry routes for origin binding, idempotency, and tenancy checks one more time after trust-model changes.
- Expand tests beyond router/spec parity and narrow smoke coverage so billing, trust, migration, and node-admin flows are exercised with realistic state and failure cases.
- Normalize Conduit’s release workflow, GHCR naming, and artifact naming with the shared platform release contract.
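The read-only exemption review could start from a default-deny rule. While Fabric policy is stale, safe methods pass, and any write must match an explicitly exempted path exactly. The paths are hypothetical. The rule treats `GET`, `HEAD`, and `OPTIONS` as safe per HTTP semantics, which only holds if no Conduit handler mutates state on those methods.

```rust
/// Whether a request may proceed while Conduit is in read-only mode because
/// Fabric policy is stale or unreachable. Writes are denied by default.
pub fn allowed_in_read_only(method: &str, path: &str, write_exemptions: &[&str]) -> bool {
    match method {
        // Safe methods by HTTP semantics stay available.
        "GET" | "HEAD" | "OPTIONS" => true,
        // Writes pass only on an exact path match, never a prefix, so an
        // exemption for "/auth/login" cannot also unlock "/auth/login-admin".
        _ => write_exemptions.iter().any(|exempt| *exempt == path),
    }
}

/// Full gate: fresh policy allows everything the normal auth layer allows;
/// stale policy falls back to the read-only rule.
pub fn gate(policy_fresh: bool, method: &str, path: &str, write_exemptions: &[&str]) -> bool {
    policy_fresh || allowed_in_read_only(method, path, write_exemptions)
}
```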
Cascadia
Cascadia now has a real hardware-bound activation story, CSR-backed node issuance, and certificate verification before accepting new materials. The remaining gaps are revocation/startup hard-fail behavior and a few operator/recovery workflows rather than the old claimed-identity runtime path.
Licensing and Anti-Copy Tasks
- Keep the current hardware-fingerprint binding model and document it as the canonical anti-copy baseline.
- Review hardware fingerprint derivation for spoofability, hardware replacement tolerance, and operator support workflows when legitimate hardware changes occur.
- Add explicit operator workflows for license transfer, hardware fault replacement, and forced rebind with audit evidence.
- Record and expose more node activation state for support and fraud analysis: activation time, prior fingerprints, revocation reason, and recent heartbeat mismatches.
- Add replay and duplication tests around bootstrap-token consumption under concurrency and flaky networks.
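The property the bootstrap-token concurrency tests should assert is that exactly one of many simultaneous redemptions succeeds. The following is a minimal in-memory store that satisfies it; the token values are hypothetical. In a Postgres-backed implementation, the equivalent would be a single conditional `UPDATE ... WHERE consumed = false` on a hypothetical `consumed` column, whose affected-row count decides the winner.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
pub enum ConsumeError {
    Unknown,
    AlreadyConsumed,
}

/// Bootstrap tokens that can be redeemed exactly once. The check and the
/// state change happen under one lock, so concurrent retries cannot both win.
pub struct BootstrapTokens {
    // token -> whether it has already been consumed
    inner: Mutex<HashMap<String, bool>>,
}

impl BootstrapTokens {
    pub fn new(tokens: &[&str]) -> Self {
        let map: HashMap<String, bool> = tokens.iter().map(|t| (t.to_string(), false)).collect();
        Self { inner: Mutex::new(map) }
    }

    pub fn consume(&self, token: &str) -> Result<(), ConsumeError> {
        let mut map = self.inner.lock().expect("token store lock poisoned");
        match map.get_mut(token) {
            None => Err(ConsumeError::Unknown),
            Some(consumed) => {
                if *consumed {
                    return Err(ConsumeError::AlreadyConsumed);
                }
                *consumed = true;
                Ok(())
            }
        }
    }
}
```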
Machine-Cert Tasks
- Stop identifying to Fabric via `x-client-identity` and move to a real client certificate presented by Cascadia.
- Use native node client certificates automatically for post-bootstrap Fabric calls whenever local cert material exists.
- Generate a real node keypair and CSR locally during install/activation.
- Have Fabric authorize the node identity, tenant binding, and license state, then hand issuance off to Breakwater.
- Store real certificate metadata locally and verify expiry/renewal based on actual X.509 data rather than placeholder response shape.
- Validate the returned certificate chain before accepting it into runtime state.
- Add revocation handling and node-disable behavior if renewal fails or the cert is revoked.
- Make startup behavior around missing/expired internal node certs explicit and fail safe.
- Bind local persistent node identity to the issued certificate subject/SAN so copied state cannot simply present a mismatched cert after host migration.
- Review whether install-time activation should refuse to persist “active” node state until certificate issuance completes successfully.
- Replace the current fake CSR builder string with real key generation and X.509 CSR encoding before any certificate lifecycle work is treated as implemented.
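Making startup behavior around missing, expired, or revoked node certificates explicit could start with a pure decision function over the stored certificate metadata. This assumes the daemon has already parsed `notAfter` and knows the revocation status. The renewal window is illustrative policy, not settled behavior. So is the choice to refuse control-plane work rather than try to renew with an expired certificate.

```rust
use std::time::{Duration, SystemTime};

/// Subset of X.509 data the daemon reads from its stored node certificate.
#[derive(Debug, Clone)]
pub struct NodeCert {
    pub not_after: SystemTime,
    pub revoked: bool,
}

#[derive(Debug, PartialEq)]
pub enum StartupMode {
    /// Certificate valid and not near expiry.
    Normal,
    /// Valid, but inside the renewal window: start and renew immediately.
    RenewOnStart,
    /// No usable identity: skip control-plane work and require re-activation.
    RefuseControlPlane(&'static str),
}

pub fn startup_mode(cert: Option<&NodeCert>, now: SystemTime, renew_window: Duration) -> StartupMode {
    let cert = match cert {
        None => return StartupMode::RefuseControlPlane("no node certificate on disk"),
        Some(c) => c,
    };
    if cert.revoked {
        return StartupMode::RefuseControlPlane("node certificate revoked");
    }
    match cert.not_after.duration_since(now) {
        // `duration_since` errs when `not_after` is earlier than `now`: expired.
        Err(_) => StartupMode::RefuseControlPlane("node certificate expired"),
        Ok(remaining) if remaining <= renew_window => StartupMode::RenewOnStart,
        Ok(_) => StartupMode::Normal,
    }
}
```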
Runtime Hardening Tasks
- Review all local privileged hooks, especially ACME hook execution and backend auto-install features, because they are high-risk operator-supplied automation surfaces.
- Add clearer trust separation between public TLS serving material and control-plane node identity material.
- Add stronger observability around sync loops so Fabric policy, JWKS, heartbeat, and cert-renew failures are visible separately.
- Add documented disaster recovery for state directory restoration, key recovery, and what can or cannot be reconstructed from control-plane state.
- Review hardware-fingerprint entropy and fallback behavior on platforms where the current fingerprint inputs are missing or easily cloned.
- Constrain or disable insecure control-plane transport overrides in anything beyond tightly controlled local development, even though config validation already rejects plain HTTP by default.
- Review the root-only auto-install package flow so policy/config cannot silently turn Cascadia into a privileged package manager without explicit operator intent and auditability.
- Constrain ACME hook execution and any downloaded runtime-component install path with stronger provenance checks, integrity validation, and operator acknowledgement.
- Audit which node HTTP endpoints remain reachable after TLS bootstrap completes, especially `/metrics`, browser signed-action routes, and control-plane maintenance routes.
- Replace the current metrics bootstrap placeholder with real structured metrics emission before treating node observability as implemented.
- Continue hardening backend adapters so future runtime implementations cannot bypass the tenant/policy boundary that the current planning-first adapters respect.
- Add the missing production image publication path if Cascadia is meant to ship GHCR runtime images alongside its binary releases; today it only publishes GitHub binary assets.
- Correct Cascadia README and operator docs so they stop claiming Docker/OCI GHCR publication until the release workflow really does it.
- Normalize the multi-binary daemon/CLI/install wrapper release layout so it follows the same release metadata and README conventions as the other services.
Breakwater
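Breakwater started as mostly a spec and a local gateway config set, with the Rust service as little more than a metadata/health shell. The authority side now has a first local issuance, renewal, and trust-bundle surface. However, the gateway and a full production PKI are still unimplemented, and this remains the biggest gap between architecture intent and shipping code.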
- Implement the real authority service: CA/bootstrap, intermediate management, service registry, node registry, CSR validation, issuance policy, renewal, revocation, and trust-bundle publication.
- Implement the real gateway service: inbound mTLS termination, peer identity derivation from certificate, allowlist enforcement, upstream forwarding rules, and audit logging.
- Define storage for certificates, revocation state, trust-bundle versions, and issued identity metadata.
- Define how Breakwater authenticates issuance requests from Fabric and from internal operators.
- Build local issuance tooling so development certs are created deterministically rather than manually or implicitly.
- Replace the current Caddy-only local trust path with a supported Breakwater-owned local topology or formally declare the Caddy layer as the temporary gateway implementation.
- Implement metrics and audit for cert issuance, failed validation, revocation checks, and gateway-auth failures.
- Replace the current always-ready metadata-only HTTP surface with real readiness that fails if key material, storage, or upstream trust state is unavailable.
- Define certificate profile templates for service identities, node identities, gateway-forwarding identities, and any operator-issued certs so SAN/CN usage is not ad hoc.
- Replace the current binary entrypoints that only wrap `/healthz`, `/readyz`, and `/v1/meta` with real authority and gateway request paths before calling the service implemented.
- Write the migration plan for moving a service from gateway-terminated trust to native direct mTLS without service downtime.
- Document the exact forwarded identity header names and stripping behavior if any forwarding remains during transition.
- Add missing repo-delivery basics: CI workflow, tagged release workflow, binary packaging, GHCR publishing strategy, README release/install instructions, and standard metadata files.
- Add real tests for issuance policy, revocation, forwarded-identity stripping, and gateway allowlist enforcement once the implementation exists.
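The stripping behavior the forwarded-identity task asks to document can be stated as a small rule. Every client-supplied identity header is removed, case-insensitively, and the gateway sets the forwarded identity only from the certificate it verified. The header name `x-breakwater-client-identity` is hypothetical. The same rule would apply whether the gateway is a Breakwater service or the current Caddy layer.

```rust
use std::collections::HashMap;

/// Hypothetical header name for a Breakwater-owned forwarded-identity contract.
pub const FORWARDED_IDENTITY: &str = "x-breakwater-client-identity";
/// Legacy header that must never pass through the gateway from a client.
pub const LEGACY_IDENTITY: &str = "x-client-identity";

/// Build upstream headers: drop every client-supplied identity header
/// (matched case-insensitively), then set the forwarded identity from the
/// verified peer certificate only.
pub fn forward_headers(inbound: &HashMap<String, String>, verified_peer: &str) -> HashMap<String, String> {
    let mut out: HashMap<String, String> = inbound
        .iter()
        .filter(|(name, _)| {
            let lower = name.to_ascii_lowercase();
            lower != FORWARDED_IDENTITY && lower != LEGACY_IDENTITY
        })
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    out.insert(FORWARDED_IDENTITY.to_owned(), verified_peer.to_owned());
    out
}
```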
Completed In This Batch
- Add a standard CI quality gate workflow to Breakwater.
- Add a tagged release workflow to Breakwater.
- Add a production Dockerfile strategy to Breakwater.
- Document Breakwater release and install behavior truthfully.
- Add repo governance metadata to Breakwater.
- Add GHCR production image publication to Breakwater’s active release contract.
- Add trust-bundle publication and manifest-backed identity inspection to Breakwater authority.
- Make Breakwater readiness reflect real authority state.
- Add a first real local certificate issuance and renewal surface to Breakwater authority.
Weir
Weir is clearly defined product-wise, but the implementation is still mostly a working scaffold. It is not yet suitable as the stable backend for a serious frontend integration pass.
- Replace all dev fallback clients with strict real-service clients in any environment intended for integration or staging.
- Replace the in-memory session store with a durable secure session backend.
- Replace sequential predictable session IDs with cryptographically random session identifiers.
- Replace static timestamps in session, onboarding, invite, and activity stores with real timestamps.
- Replace in-memory onboarding, activity, and invite stores with durable persistence or remove them if they are only projections from other authorities.
- Define the real source-of-truth boundary for each Weir endpoint so the service does not silently become its own shadow authority.
- Implement secure OAuth/OIDC session handling fully: state, PKCE, token storage, refresh, expiry, logout propagation, and CSRF protection.
- Add explicit internal authentication for Weir-to-Headwaters, Weir-to-Fabric, and Weir-to-Ledger calls instead of bare unauthenticated HTTP clients.
- Move the “real” upstream clients off plain default reqwest usage and onto the same platform mTLS/trust contract as the rest of the stack.
- Define what Weir persists locally versus what it only aggregates transiently.
- Stop storing OAuth flow records, access tokens, refresh tokens, and PKCE material in predictable in-memory sequential IDs; move them into an expiring secure store with random opaque identifiers.
- Add end-to-end tests against real Headwaters, Fabric, and Ledger instead of only smoke tests against scaffold behavior.
- Audit session cookie attributes and confirm secure, httpOnly, sameSite, rotation, and revoke-on-logout semantics are complete.
- Remove test-only context-header style assumptions from any runtime-reachable paths.
- Fence or remove the current explicit `x-weir-*` context-header override path so it cannot be used outside tightly controlled test/dev behavior.
- Make configuration strict enough that integration/staging boots fail if any required downstream base URL is missing, instead of silently falling back to dev clients.
- Validate public-base-url and OAuth config coherently so Weir cannot come up with redirect/session settings that only accidentally work in local mode.
- Replace the tiny smoke-test posture with real integration coverage for login, consent, session issuance, org switching, billing, and onboarding flows.
- Add all missing repo hygiene and delivery layers: README, CI, tagged release workflow, binary packaging, GHCR image publication if intended, Dockerfile strategy if intended, and standard metadata files.
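The expiring secure store for OAuth flow records and tokens could look like the following. It uses random 256-bit opaque identifiers, checks an explicit expiry on every read, and supports explicit revocation for logout. Identifiers are read from `/dev/urandom`, so this is Unix-only; a real service would more likely use the `getrandom` or `rand` crate. The store is in memory for illustration, whereas the tasks above call for a durable backend.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// 256-bit opaque identifier from the OS CSPRNG, hex-encoded (Unix-only).
pub fn opaque_id() -> io::Result<String> {
    let mut bytes = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut bytes)?;
    Ok(bytes.iter().map(|b| format!("{b:02x}")).collect())
}

/// Values keyed by random ids, each with an absolute expiry instant.
pub struct SessionStore<T> {
    ttl: Duration,
    entries: HashMap<String, (T, Instant)>,
}

impl<T> SessionStore<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a value under a fresh random id and return that id.
    pub fn insert(&mut self, value: T, now: Instant) -> io::Result<String> {
        let id = opaque_id()?;
        self.entries.insert(id.clone(), (value, now + self.ttl));
        Ok(id)
    }

    /// Return the value only if it has not expired; expired entries are removed.
    pub fn get(&mut self, id: &str, now: Instant) -> Option<&T> {
        let expired = matches!(self.entries.get(id), Some((_, exp)) if now >= *exp);
        if expired {
            self.entries.remove(id);
            return None;
        }
        self.entries.get(id).map(|(v, _)| v)
    }

    /// Explicit revocation, e.g. on logout. Returns whether the id existed.
    pub fn revoke(&mut self, id: &str) -> bool {
        self.entries.remove(id).is_some()
    }
}
```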
Completed In This Batch
- Introduce an explicit Weir runtime profile model.
- Require real downstream base URLs outside dev/test.
- Require Headwaters OAuth credentials outside dev/test.
- Replace predictable Weir session IDs.
- Replace predictable Weir auth flow IDs.
- Replace predictable Weir OAuth state values.
- Replace static session-store timestamps.
- Replace static activity-event timestamps.
- Align Weir session cookies with HTTPS deployments.
- Update Weir tests for opaque identifier behavior.
- Add a production image build and publish path to Weir.
- Add missing repo metadata and operator docs to Weir.
Public Surface, Docs, and Dev Experience
- Update the public docs and marketing language so early access claims stay honest relative to the real backend stack.
- Publish one canonical architecture diagram that clearly distinguishes public HTTP, internal mTLS HTTP, and NATS.
- Add one operator-focused setup document for bringing up the backend stack for frontend work.
- Document which services own which frontends and which repos are pure backend versus site/docs.
- Add a concise “current maturity” table to the docs so future work does not drift because everybody assumes a service is further along than it is.
Recommended Build Order
| Order | Work Item | Why first |
|---|---|---|
| 1 | Fix Fabric test breakage and restore green baseline | Control plane cannot remain in a knowingly broken state while architecture work continues |
| 2 | Implement Breakwater authority and formal transition contract | Everything else depends on having one real machine PKI story |
| 3 | Add native inbound mTLS to Fabric | Fabric is the most critical trust boundary |
| 4 | Add native inbound mTLS to Ledger | Completed: Ledger now has native listener-level peer-cert identity extraction and stricter readiness/config checks. |
| 5 | Move Cascadia from claimed node identity header to real node cert flow | Licensing and anti-copy guarantees depend on this being real |
| 6 | Remove caller-side x-client-identity injection from Conduit and Ledger | Cleans up the transitional protocol and lowers spoof risk |
| 7 | Normalize CI, tagged releases, binary artifacts, and GHCR publishing across all repos | Release discipline should be finished before the platform is treated as nearly done |
| 8 | Stand up one canonical local stack | Needed before frontend work can happen against stable contracts |
| 9 | Replace Weir dev fallbacks and in-memory state | Needed before the Weir frontend is worth integrating |
| 10 | Begin Headwaters and Conduit frontend implementation against the real stack | Those two are closest once the local topology is stable |
| 11 | Begin Weir frontend implementation after backend stabilization | Weir is currently the least backend-ready of the planned frontends |
What Is Ready Enough Soonest
- Headwaters backend contracts
- Conduit backend contracts
- shared docs and architecture references
What Is Still Unsafe To Pretend Is Finished
- Breakwater as a real issuer
- Cascadia certificate lifecycle
- Cascadia still identifies itself to Fabric by header instead of a real node certificate
- Weir durable state and real integrations
Companion file: architecture.html