Architecture · HMM Trade docs

01 Three surfaces, one product

HMM Trade has three deployment surfaces that work together. We operate two of them; the third runs on the user's machine and is the primary surface for the free tier. Each surface has a clear boundary, so a failure in one doesn't cascade into the others.

Cloud control plane: Next.js app + Supabase Postgres + Stripe. Handles signup, billing, broker connections, and the bot lifecycle endpoints.
Per-bot Fly Machines: one lightweight VM per paid bot, running the same Python trader code as the local agent.
Local agent: Streamlit dashboard + multi-bot supervisor that runs on the user's laptop. It is the free tier's primary surface; paid users can run it side-by-side for richer visualization.

02 Cloud control plane (this app)

Next.js 14 + Supabase (Postgres + Auth) + Stripe. Provides:

Marketing pages, signup, billing portal
Auth-gated dashboard, bot list, hosted-bot lifecycle endpoints
Brokerage connection management (/brokers) with KMS-encrypted token storage
The 6-step bot wizard at /bots/new

03 Per-bot Fly Machines (paid tiers)

Every hosted bot runs as its own lightweight VM on Fly Machines. The image is registry.fly.io/regime-trader-bot:latest (Python 3.12, ~250 MB). Each machine gets 1 GB of RAM, and stock-only bots scale to zero between ticks.

The container runs these steps in order:

1. Boots and reads its BOT_ID + machine JWT.
2. Calls GET /api/v1/internal/bots/<id> to fetch profile_json + connection metadata.
3. Calls POST /api/v1/internal/broker_token, which decrypts the broker creds via KMS and returns a short-lived access token.
4. Materializes config/instances/<id>.yaml + .env on disk so the existing main.py paper startup path reads them as it would on a local install.
5. Spawns the trader subprocess. Output is teed to Fly logs + the cloud audit sink, so /bots/<id> mirrors what fly logs shows.
6. On SIGTERM (Fly maintenance, Stripe cancel, manual stop): forwards the signal to the trader subprocess, flushes the audit sink, and exits.
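
The file-materialization and signal-forwarding steps above can be sketched roughly as follows. This is a minimal illustration, not the actual entrypoint: the function names, the BROKER_ACCESS_TOKEN variable, and the flush_audit hook are all hypothetical, and the profile is written as JSON (which YAML 1.2 parsers are designed to accept) only to avoid a YAML dependency in the sketch.

```python
import json
import signal
import subprocess
from pathlib import Path
from typing import Callable


def materialize_instance(bot_id: str, profile: dict, access_token: str, root: Path):
    """Write config/instances/<id>.yaml and .env under `root`; return both paths."""
    config_path = root / "config" / "instances" / f"{bot_id}.yaml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the profile is dumped as JSON, which YAML 1.2 treats as valid YAML.
    config_path.write_text(json.dumps(profile, indent=2) + "\n")

    env_path = root / ".env"
    # BROKER_ACCESS_TOKEN is a hypothetical variable name used for illustration.
    env_path.write_text(f"BOT_ID={bot_id}\nBROKER_ACCESS_TOKEN={access_token}\n")
    env_path.chmod(0o600)  # keep the short-lived token readable by the owner only
    return config_path, env_path


def run_trader(argv: list, flush_audit: Callable[[], None]) -> int:
    """Spawn the trader, forward SIGTERM to it, flush audit, return its exit code."""
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        # Relay the signal so the trader can shut down cleanly.
        child.send_signal(signum)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, forward)
    try:
        return child.wait()
    finally:
        flush_audit()
```

A container entrypoint would call materialize_instance with the fetched profile and token, then pass the trader's command line to run_trader and exit with the returned code.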

04 Local agent (free tier)

The Python repo (regime-trader) ships a Streamlit dashboard + multi-bot supervisor that runs entirely on the user's machine. Free-tier users keep their broker keys local; the agent only talks to the cloud for auth + bot-profile sync.

Paid users can also install the local agent for richer visualization (walk-forward backtests, audit pipeline viewer, HMM live-detection charts) — both surfaces read the same cloud bot list, so changes in one show up in the other.

05 KMS envelope encryption

Broker credentials are stored in Postgres but encrypted with a per-payload AES-256 data key. The data key itself is encrypted under a Cloud KMS key — neither the database nor our application code can decrypt without a live KMS round-trip. Compromising the DB alone doesn't leak tokens.

Encrypt path (broker connect):

1. Generate a random AES-256 data key in process memory.
2. Call KMS.Encrypt on the data key → ciphertext data key.
3. AES-GCM encrypt the credential JSON with the plaintext data key → ciphertext + IV + auth tag.
4. Persist (ciphertext, ciphertext data key, IV, auth tag, KMS key id) on broker_connections. Zero the plaintext data key.

Decrypt path (bot launch):

1. Read the ciphertext + ciphertext data key from broker_connections.
2. Call KMS.Decrypt on the ciphertext data key → plaintext data key.
3. AES-GCM decrypt the credential JSON. Pass to the bot over TLS. Plaintext lives in API process memory only for the duration of one HTTPS request.
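
The two paths above can be illustrated with a small round-trip sketch. It assumes the third-party `cryptography` package for AES-GCM and replaces Cloud KMS with an in-memory stand-in (FakeKms), so nothing here reflects the real KMS client or the real broker_connections schema; the class, function, and field names are hypothetical.

```python
import json
import os
from dataclasses import dataclass

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class FakeKms:
    """In-memory stand-in for Cloud KMS (illustration only)."""

    def __init__(self) -> None:
        self._kek = AESGCM(AESGCM.generate_key(bit_length=256))

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + self._kek.encrypt(nonce, plaintext, None)

    def decrypt(self, blob: bytes) -> bytes:
        return self._kek.decrypt(blob[:12], blob[12:], None)


@dataclass
class BrokerConnectionRow:
    """Hypothetical shape of the persisted tuple."""
    ciphertext: bytes
    wrapped_data_key: bytes
    iv: bytes
    auth_tag: bytes
    kms_key_id: str


def encrypt_credentials(creds: dict, kms: FakeKms, kms_key_id: str) -> BrokerConnectionRow:
    data_key = AESGCM.generate_key(bit_length=256)  # per-payload data key
    wrapped = kms.encrypt(data_key)                 # ciphertext data key
    iv = os.urandom(12)
    sealed = AESGCM(data_key).encrypt(iv, json.dumps(creds).encode(), None)
    # `cryptography` appends the 16-byte GCM tag to the ciphertext; split it out.
    row = BrokerConnectionRow(sealed[:-16], wrapped, iv, sealed[-16:], kms_key_id)
    # Python bytes are immutable, so this only drops the reference; it is not a true zeroing.
    del data_key
    return row


def decrypt_credentials(row: BrokerConnectionRow, kms: FakeKms) -> dict:
    data_key = kms.decrypt(row.wrapped_data_key)    # requires a KMS round-trip
    plaintext = AESGCM(data_key).decrypt(row.iv, row.ciphertext + row.auth_tag, None)
    return json.loads(plaintext)
```

In this sketch, a row copied out of the database is useless without the KMS instance that wrapped its data key: decrypting with a different FakeKms raises an authentication error, which mirrors the claim that compromising the DB alone doesn't leak tokens.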

06 Failure modes by tier

Free + cloud down: the agent keeps trading on its cached JWT for 24h, then asks the user to reconnect.
Free + user laptop off: the bot stops. The free tier doesn't promise 24/7 operation.
Paid + control plane down: hosted bots keep ticking (Fly Machines don't depend on our control plane to tick). The user can't change settings or see the dashboard.
Paid + Fly outage: this is our incident. We post to the status page and refund per the ToS.
Paid + KMS down: bots that are already running keep ticking (the token is already decrypted in memory). New bot launches fail until KMS recovers, so the blast radius is bounded.
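
As a small illustration of the free-tier "cloud down" rule, the decision could look like the sketch below. It assumes the 24h window is measured from the agent's last successful auth, which the text above does not specify; the function name and return values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Grace period from the failure-mode list above.
OFFLINE_GRACE = timedelta(hours=24)


def agent_action(last_auth_at: datetime, now: datetime, cloud_reachable: bool) -> str:
    """Decide what the local agent does on a tick (hypothetical helper)."""
    if cloud_reachable:
        return "trade"
    # Assumption: the cached JWT is honored for 24h after the last successful auth.
    if now - last_auth_at <= OFFLINE_GRACE:
        return "trade_on_cached_jwt"
    return "ask_user_to_reconnect"


if __name__ == "__main__":
    last_auth = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(agent_action(last_auth, last_auth + timedelta(hours=3), cloud_reachable=False))
    print(agent_action(last_auth, last_auth + timedelta(hours=30), cloud_reachable=False))
```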