Self-trained vs. admin-published models: the trade-off

Every hosted bot used to train its own HMM. We changed that. Here’s why centralizing model training is a clear win for everyone — and where the trade-off shows up.

Alok Desai · 6 min read

Until last week, every hosted bot in HMM Trade trained its own HMM on its own machine, on its own schedule. It worked, but it was wasteful in three different ways and we shipped a fix. This post is the trade-off analysis: why we changed it, and where the original approach was actually better.

The original setup

When a paying user spun up a hosted bot, the container booted, downloaded historical bars, and ran the HMM training pipeline before doing any real trading. The result — a per-bot pickle file on the Fly volume — was used for inference until the next scheduled retrain.

That setup gave each bot a personalized model fit to its configured universe. If you traded a custom 5-symbol portfolio, the HMM had only seen those 5 symbols' bars. Maximum personalization.

The three problems

1. Compute waste

A typical HMM retrain on the universe we ship (10 cryptos, 1300 hourly bars) takes ~5 CPU-minutes for the BIC sweep across 5 candidate K values. At 10 bots × 6 retrains/day, that's 5 CPU-hours/day for what is effectively the same work — the universes mostly overlap, the time horizons mostly overlap, the structure mostly overlaps.

Centralizing it means we train once, and every bot pulls the same .pkl. 5 minutes per bot becomes 5 minutes total. At 100 bots, the savings cross the threshold from “cute” to “this changes our economics meaningfully.”
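The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 5 CPU-minutes and 6 retrains/day come from the numbers above, while the function name and the assumption that a central job keeps the same cadence and per-run cost are ours for illustration (in practice a central run with more initializations costs more per run).

```python
def daily_cpu_hours(jobs: int, retrains_per_day: int, minutes_per_retrain: float) -> float:
    """CPU-hours per day for `jobs` independent training jobs."""
    return jobs * retrains_per_day * minutes_per_retrain / 60.0


MINUTES_PER_RETRAIN = 5.0  # ~5 CPU-minutes per BIC sweep (figure from the post)
RETRAINS_PER_DAY = 6       # retrain cadence (figure from the post)

# Per-bot training: cost grows linearly with fleet size.
per_bot_10 = daily_cpu_hours(10, RETRAINS_PER_DAY, MINUTES_PER_RETRAIN)
per_bot_100 = daily_cpu_hours(100, RETRAINS_PER_DAY, MINUTES_PER_RETRAIN)

# Centralized training: one job regardless of fleet size.
# Assumption: same cadence and same per-run cost as a single bot.
centralized = daily_cpu_hours(1, RETRAINS_PER_DAY, MINUTES_PER_RETRAIN)

print(per_bot_10, per_bot_100, centralized)
```

The point is the shape of the curve rather than the exact figures: per-bot cost scales with the fleet, the centralized cost does not.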

2. Quality variance

HMM training is non-convex. You start from random initializations, run EM until convergence, and pick the best of N. With limited compute per bot, we couldn't afford many initializations (typically 4). Across 100 bots retraining independently, you'd see real variation: most models would converge to a sensible 7-state fit, a few would get stuck in a poor local optimum that mislabels a regime, and one or two would end up with a degenerate state that the bot then traded against.

In a centralized pipeline we run more initializations (8 in production), more iterations, stricter validation gates — none of which scale to per-bot compute. Quality goes up; variance goes down.
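To make the "best of N" idea concrete, here is a minimal sketch of random-restart EM. It is not our training pipeline: a 1-D Gaussian mixture stands in for the HMM to keep the example dependency-free, EM runs for a fixed number of iterations instead of checking convergence, and the names `em_fit` and `best_of_n` are hypothetical. The restart logic is the part that carries over.

```python
import math
import random


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(
            w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(p + 1e-300)  # guard against log(0)
    return total


def em_fit(data, k, seed, iters=50):
    """One EM run from a seeded random start. Returns (log_likelihood, sorted means)."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # random initialization: k data points as means
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            norm = sum(dens) + 1e-300
            resp.append([d / norm for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)  # floor keeps a component from collapsing
    return log_likelihood(data, weights, means, variances), sorted(means)


def best_of_n(data, k, n_inits):
    """Run EM from n_inits seeded starts and keep the highest-likelihood fit."""
    runs = [em_fit(data, k, seed) for seed in range(n_inits)]
    return max(runs, key=lambda run: run[0])


# Synthetic two-cluster data, purely for illustration.
_rng = random.Random(0)
data = [_rng.gauss(-2.0, 0.5) for _ in range(100)] + [_rng.gauss(3.0, 0.5) for _ in range(100)]

ll4, means4 = best_of_n(data, k=2, n_inits=4)
ll8, means8 = best_of_n(data, k=2, n_inits=8)
print(ll4, ll8)
```

Because the 8-restart run includes the same four seeds as the 4-restart run, its best likelihood can never be worse. More restarts only ever help fit quality, and the only cost is compute, which is exactly what a per-bot budget cannot spare.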

3. Slow iteration

Most importantly: when we discover a bug in the feature engineer or want to ship an improvement to the regime labeling logic, the per-bot architecture means every bot needs to retrain to get the improvement. That's either coordinated downtime or a slow rollout via natural retrain cadence — days, not hours.

With a centralized catalog, we publish once, every bot pulls the new artifact on its next hourly poll, and the improvement reaches the fleet within an hour.

Centralized model training turns model improvements from a fleet-wide deployment into a single API call.

What we shipped

The catalog is a Postgres table + a Supabase Storage bucket. Each row in model_catalog is a (family, version) pair pointing at a .pkl. Families are hmm-stocks-daily, hmm-crypto-daily, hmm-crypto-1h — coarse buckets keyed by (asset_class, timeframe). One row per family is marked is_default=true, enforced by a partial unique index.
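Here is a rough sketch of what such a table and its partial unique index might look like. To keep it runnable anywhere it uses Python's built-in sqlite3 as a stand-in for Postgres (both support partial unique indexes). Apart from family, version, and is_default, every column name is an assumption, as are the integer version numbers and the `publish` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE model_catalog (
        family      TEXT    NOT NULL,
        version     INTEGER NOT NULL,            -- assumption: integer versions
        storage_key TEXT    NOT NULL,            -- hypothetical: object path of the .pkl
        sha256      TEXT    NOT NULL,            -- hypothetical: checksum of the .pkl
        is_default  INTEGER NOT NULL DEFAULT 0,  -- would be BOOLEAN in Postgres
        PRIMARY KEY (family, version)
    );
    -- Partial unique index: at most one default row per family.
    CREATE UNIQUE INDEX one_default_per_family
        ON model_catalog (family) WHERE is_default = 1;
    """
)


def publish(family: str, version: int, make_default: bool = True) -> None:
    """Hypothetical publish step: insert a row, optionally moving the default flag."""
    with conn:  # one transaction: demote the old default, then insert the new row
        if make_default:
            conn.execute(
                "UPDATE model_catalog SET is_default = 0 "
                "WHERE family = ? AND is_default = 1",
                (family,),
            )
        conn.execute(
            "INSERT INTO model_catalog VALUES (?, ?, ?, ?, ?)",
            (family, version, f"{family}/v{version}.pkl", "0" * 64, int(make_default)),
        )


publish("hmm-crypto-1h", 1)
publish("hmm-crypto-1h", 2)
```

With the index in place, a buggy publisher that tries to mark a second row as default for the same family gets a constraint error from the database instead of silently leaving the fleet with two competing defaults.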

Bots subscribe to families based on their model_prefs column (Pro+ users can override; free users get defaults). A daemon thread inside each bot polls the catalog hourly, downloads the new .pkl when the version changes, verifies SHA-256, and atomically swaps it into the bot's model directory. The trader subprocess picks up the new pkl on its next retrain check.
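The verify-then-swap step is the part worth getting right, so here is a minimal sketch of it using only the standard library. The function name `install_model` and its signature are hypothetical, and the download itself is left out: the sketch assumes the bytes are already in memory.

```python
import hashlib
import os
import tempfile


def install_model(payload: bytes, expected_sha256: str, dest_path: str) -> bool:
    """Verify the checksum, then atomically replace dest_path.

    Returns False and leaves the current file untouched on a checksum mismatch.
    """
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        return False

    # Write to a temp file in the same directory so the final rename
    # stays on one filesystem, which is what makes os.replace atomic.
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

A reader of the model file sees either the old .pkl or the new one in full, never a half-written file, which matters because the trader subprocess can load the file at any moment.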

Admins train + publish via a CLI (python -m admin_trainer publish --family ...). Validation gates block bad fits before they get uploaded.

Where the trade-off shows up

Centralization sacrifices personalization. Specifically:

  • The bot trades a universe outside the family's training universe. If the catalog's hmm-crypto-1h was trained on top-10 cryptos and your bot trades only DOGE/USD, the regime labels are fit to a broader distribution than yours. Usually fine — regimes generalize across similarly-behaved symbols — but not always.
  • Power users want fine-tuned regime granularity for their specific niche. A user trading only futures rolls might want a model that's seen futures-specific patterns. We don't cover that yet.

Our compromise: Pro+ tier can set per-bot model preferences. If you don't like the family default, you can pin to a specific version (e.g. roll back if a new publish hurts your P&L). In a future phase, we'll add “personalize this admin model on your data” — fork the admin .pkl and run a few extra EM iterations on the user's bars to specialize it. That's a Live-tier feature for the users who really need it.
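The pin-or-default rule can be sketched as a small resolution function. Everything here is illustrative: the in-memory `CATALOG` dict stands in for the catalog table, the version numbers are made up, and `resolve_version` with its `can_override` flag is a hypothetical name for the tier check.

```python
from typing import Optional

# Hypothetical snapshot of the catalog: family -> {version: is_default}.
CATALOG = {
    "hmm-crypto-1h": {3: False, 4: True},
    "hmm-crypto-daily": {1: True},
}


def resolve_version(family: str, pinned: Optional[int], can_override: bool) -> int:
    """Pick the version a bot should pull for a family.

    A pin is honored only for tiers allowed to override and only if the
    pinned version exists; otherwise the family default wins.
    """
    versions = CATALOG[family]
    if can_override and pinned is not None and pinned in versions:
        return pinned
    return next(v for v, is_default in versions.items() if is_default)


print(resolve_version("hmm-crypto-1h", pinned=3, can_override=True))   # pinned rollback
print(resolve_version("hmm-crypto-1h", pinned=3, can_override=False))  # falls back to default
```

Falling back to the default when a pin is invalid is one possible design choice; an alternative is to fail loudly so the user notices a stale pin.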

Practical impact

For our existing fleet, the migration was invisible: bots kept trading without interruption while the puller caught up with the catalog over an hour. The improvements we've shipped since (better feature engineering, tighter validation gates, longer training windows) have all propagated to every bot the same day.

For users, the only visible change is: when you visit the bot's Regime tab, there's a banner showing which family and version it's pulling from. You can see the model age (last published ~2 days ago for our crypto-1h default) and you can see when a new version landed via the “Fleet news” widget on the dashboard.

The general lesson

Personalization at the cost of operational simplicity is a common tax in ML systems — every customer trains their own model because that's the obvious unit of personalization, and three years later you're paying for it in compute, consistency, and rollout speed. Centralization with opt-in personalization gives you the best of both: shared infrastructure for the 90% case, per-customer overrides for the 10% who genuinely need them.

In retrospect, we would have shipped this design a year earlier. If you're building anything with per-customer ML, this is the architecture worth defaulting to.
