Hidden Markov Models, explained without the equations (mostly)

An HMM is a state machine where you can’t see the state — only the breadcrumbs it leaves. That’s the whole idea. Here’s what it does, why it works for markets, and where it breaks.

Alok Desai · 12 min read

Hidden Markov Models have a reputation for being one of those techniques you have to be a math person to use. The math is real and useful, but the idea is much simpler than the equations let on, and you can build genuine intuition without ever computing a forward-backward pass by hand. This post is the explanation we wish we'd had when we started.

The state machine you can't see

Imagine you're tracking the weather in a city you've never visited. You can't look at the sky. But every day a roommate, who's in that city, sends you a photo of what they're wearing.

Sunny days they wear shorts. Rainy days, a coat. Cloudy days, a hoodie. You don't see the weather (the hidden state) but you see the observations — the outfit.

Two more things you know:

  • The weather doesn't flip randomly each day. Sunny days tend to be followed by sunny days; rainy days tend to be followed by rainy days. The next day's weather depends (probabilistically) on today's. That's the Markov property: the future depends on the present, not on the entire past.
  • The roommate isn't infinitely consistent. Sometimes they wear a hoodie when it's actually sunny because they were cold this morning. The mapping from weather → outfit is probabilistic, not deterministic.

You've just described an HMM. There's a hidden state (weather), there are observations (outfits), there's a transition probability between hidden states day-to-day, and there's an emission probability mapping each hidden state to what you observe. Given a sequence of outfits over weeks, you can do three things:

  1. Estimate the parameters. Given the observations, what are the most likely transition and emission probabilities? (This is “training,” done with the Baum-Welch algorithm, which is just expectation-maximization specialized for HMMs.)
  2. Find the most likely hidden state today. Given everything you've observed up to right now, what's the probability the weather is currently sunny / cloudy / rainy? (This is the filtered state estimate. The forward algorithm computes it.)
  3. Find the most likely sequence of hidden states. Looking back over the entire history, what was the best-fitting weather sequence that explains the observations? (Viterbi.)

For trading we mostly care about #2: filtered state. Given everything I've seen up to and including the bar that just closed, what regime am I in right now?

The forward algorithm is just “multiply probabilities, sum over the states you might have been in yesterday.” The equations look scary; the operation is bookkeeping.
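
To make the bookkeeping concrete, here is a minimal NumPy sketch of the forward filter on the weather example. Every number in it (the starting, transition, and emission probabilities) is made up for illustration, and `forward_filter` is a hypothetical helper name, not a library function.

```python
import numpy as np

# All probabilities below are invented for illustration.
STATES = ["sunny", "cloudy", "rainy"]
OUTFITS = ["shorts", "hoodie", "coat"]

start = np.array([0.5, 0.3, 0.2])      # P(weather on day 0)
trans = np.array([                     # trans[i, j] = P(tomorrow = j | today = i)
    [0.80, 0.15, 0.05],
    [0.30, 0.40, 0.30],
    [0.10, 0.30, 0.60],
])
emit = np.array([                      # emit[i, k] = P(outfit = k | weather = i)
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
])

def forward_filter(obs, start, trans, emit):
    """Return P(state_t | obs_0..t) for every t, one row per day."""
    probs = np.empty((len(obs), len(start)))
    belief = start * emit[:, obs[0]]
    probs[0] = belief / belief.sum()
    for t in range(1, len(obs)):
        predicted = probs[t - 1] @ trans       # sum over yesterday's states
        belief = predicted * emit[:, obs[t]]   # weight by today's outfit
        probs[t] = belief / belief.sum()       # renormalize to a distribution
    return probs

obs = [OUTFITS.index(o) for o in ["shorts", "shorts", "hoodie", "coat", "coat"]]
filtered = forward_filter(obs, start, trans, emit)
for outfit_idx, row in zip(obs, filtered):
    print(OUTFITS[outfit_idx], dict(zip(STATES, row.round(3))))
```

Each row uses only the outfits seen up to that day, which is what makes the estimate "filtered." Renormalizing at every step also keeps the numbers from underflowing on long sequences.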

Translating to markets

Markets aren't weather, but the analogy carries. The hidden state is the regime — calm, choppy, crisis, top-of-bull, whatever. The observations are the things we measure: returns, realized volatility, autocorrelation, gap behavior, etc.

  • The Markov property for markets: tomorrow's regime depends on today's regime, not on the full history before it. Crisis tends to follow crisis; calm tends to follow calm. Empirically, self-transition probabilities are 0.85 or higher for daily models. Markets do remember.
  • The emission distribution for each regime is a Gaussian (or mixture of Gaussians, in fancier setups) over the observation features. DEEP_BEAR has wide-spread, left-skewed returns. TOP_BULL has wide-spread, right-skewed returns. NEUTRAL has tight, centered returns.

You hand the HMM a long history of bars, run Baum-Welch, and out comes a fitted model: K hidden states, a transition matrix between them, and an emission distribution per state. Then for every new bar you observe, you run one step of the forward algorithm and get a probability distribution over which regime the market is currently in.

python
# Pseudo-Pythonic — close to what runs on every tick.
features = compute_features(bars)             # ~15 floats per bar
state_probs = hmm.forward(features)           # P(state | observations)
top = state_probs[-1].argmax()                # most-likely current regime
confidence = state_probs[-1][top]             # 0.92 = highly confident
label = hmm.state_labels[top]                 # "STRONG_BULL"
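
For a rough idea of what one step of that update could involve, here is a NumPy-only sketch. It assumes diagonal Gaussian emissions to keep the density code short (a simplification, not necessarily what a production model uses), and `forward_step`, `gaussian_logpdf_diag`, and all the numbers are hypothetical.

```python
import numpy as np

def gaussian_logpdf_diag(x, means, variances):
    """Log density of x under each state's diagonal Gaussian.

    Shapes: x is (F,), means and variances are (K, F). Returns (K,).
    """
    return -0.5 * np.sum(
        np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )

def forward_step(prev_probs, x, trans, means, variances):
    """Update P(state | data so far) with one new feature vector x."""
    predicted = prev_probs @ trans                          # prior for the new bar
    log_lik = gaussian_logpdf_diag(x, means, variances)     # evidence per state
    weights = predicted * np.exp(log_lik - log_lik.max())   # shift for stability
    return weights / weights.sum()

# Made-up two-state model: state 0 is calm, state 1 is turbulent.
# Features are (log return, realized volatility).
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
means = np.array([[0.001, 0.010], [-0.002, 0.030]])
variances = np.array([[1e-4, 1e-5], [9e-4, 1e-4]])
prev = np.array([0.9, 0.1])

calm = forward_step(prev, np.array([0.001, 0.010]), trans, means, variances)
stressed = forward_step(prev, np.array([-0.040, 0.035]), trans, means, variances)
print("after a quiet bar:", calm.round(3))
print("after a sharp drop:", stressed.round(3))
```

The update needs only the previous probability vector and the new bar, so the cost per bar is constant no matter how long the history is.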

Why pick HMMs vs. anything else?

There are a hundred ways to bucket markets into regimes: k-means, GMMs, hand-coded rules, neural nets, the chart pattern a guy on YouTube draws with a sharpie. HMMs have a few properties that matter for trading:

  1. Temporal structure is built in. The model knows transitions are persistent. K-means clustering on features alone would label two adjacent days as different regimes if their features happen to differ — useless. HMM smoothing prevents that.
  2. Filtered probabilities are causal. The forward algorithm uses only past and current data, with no peeking ahead. This is the difference between a backtest you can trade and a backtest that overstates performance because of look-ahead bias. Viterbi decodes the full sequence, so its label for an earlier bar can depend on later bars; we deliberately don't use it for live trading decisions.
  3. The output is calibrated probability, not a discrete label. Strategies can act on P(STRONG_BULL) = 0.92 differently than P(STRONG_BULL) = 0.55. Models that give you a hard label throw that information away.
  4. The model has few parameters relative to the data. A 7-state HMM on 15 features fits ~1000 parameters (assuming full covariance matrices). It's tractable to train on a few thousand bars without overfitting. Compare that to a deep net, which would need millions of bars.
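
The parameter count in the last point is easy to check by hand. A small sketch, assuming a Gaussian HMM with one mean vector and one covariance matrix per state; `gaussian_hmm_param_count` is a hypothetical helper written for this post.

```python
def gaussian_hmm_param_count(n_states, n_features, covariance_type="full"):
    """Count the free parameters of a Gaussian HMM."""
    start = n_states - 1                      # initial distribution sums to 1
    trans = n_states * (n_states - 1)         # each transition row sums to 1
    means = n_states * n_features
    if covariance_type == "full":
        # A symmetric F x F matrix has F * (F + 1) / 2 free entries.
        covs = n_states * n_features * (n_features + 1) // 2
    elif covariance_type == "diag":
        covs = n_states * n_features
    else:
        raise ValueError(f"unsupported covariance_type: {covariance_type}")
    return start + trans + means + covs

print(gaussian_hmm_param_count(7, 15, "full"))   # 993
print(gaussian_hmm_param_count(7, 15, "diag"))   # 258
```

Almost all of the count comes from the covariance matrices, so switching to diagonal covariances is the first lever to pull if the fit looks unstable on a short history.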

Where HMMs break

Equally important to know:

  • HMMs assume the world is stationary. Once fit, the model thinks the same regimes that occurred in training will keep occurring. Sometimes the world genuinely changes — new asset, new market microstructure — and the model is now wrong about everything. Fix: periodic retrain (we retrain hourly + on bar-count thresholds).
  • Gaussian emissions are a lie. Returns aren't Gaussian (fat tails, skew). HMMs are robust to this in practice, but the model's estimate of P(extreme event) is biased toward zero. Don't trust an HMM's tail-risk estimates; use it for the body of the distribution and something else for the tails.
  • Hidden states aren't named — you label them. The HMM finds K Gaussians in feature-space; you decide that the highest-mean-return one should be called “TOP_BULL.” Two different runs on different windows may swap labels. This isn't a bug; it's a reminder that the labels are interpretive scaffolding.
  • Regime-switching is sharp, but real markets aren't. Real markets transition gradually. HMMs pretend each bar is definitively in one state. The probabilistic output partially compensates (you can see the model is uncertain mid-transition) but the underlying graph structure is discrete. If you need continuous-state modeling, look at stochastic volatility models instead.
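
On the label-swapping point above: one common way to keep labels stable across retrains is to sort states by a fitted statistic, such as mean return, before attaching names. A minimal sketch, assuming the return feature sits in column 0 of the fitted means; `canonical_order` and `reorder_probs` are hypothetical helpers and the means are invented.

```python
import numpy as np

def canonical_order(means, return_col=0):
    """State indices sorted from lowest to highest mean of the return feature."""
    return np.argsort(means[:, return_col])

def reorder_probs(state_probs, order):
    """Reorder the columns of a (T, K) probability array into canonical order."""
    return state_probs[:, order]

# Two hypothetical fits that found the same three states in a different order.
# Column 0 is mean log return, column 1 is mean realized volatility.
means_a = np.array([[0.002, 0.010], [-0.003, 0.030], [0.000, 0.015]])
means_b = means_a[[1, 2, 0]]

order_a = canonical_order(means_a)
order_b = canonical_order(means_b)
print(means_a[order_a])
print(means_b[order_b])   # same rows once both fits are sorted

probs = np.array([[0.7, 0.2, 0.1]])       # one bar, states in fit A's raw order
print(reorder_probs(probs, order_a))      # columns now run bearish to bullish
```

If you reorder states this way, the transition matrix needs the same permutation on both axes, for example `transmat[np.ix_(order, order)]`.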

What changed our intuition

Two debugging exercises shaped how we think about HMMs:

1. Plot the regime ribbon over the price chart. Color each bar by its argmax regime. You'll immediately see whether the model is finding macroeconomically meaningful states (regimes line up with real bear markets, real recoveries) or just clustering on noise (regimes flip every few bars). If it's the latter, you have too many states or not enough features.

2. Plot the transition matrix as a heatmap. Self-loops should be the brightest cells (regimes persist). Off-diagonals tell you which transitions are likely. For US equities, the bullish-to-bearish path almost never goes directly TOP_BULL → DEEP_BEAR; it goes through NEUTRAL or WEAK_BEAR first. That structure should match intuition. If it doesn't, the fit is suspicious.
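
Both checks can also be put into numbers before you plot anything. The sketch below measures how long the argmax regime persists and summarizes a transition matrix. The regime sequence, the 3-state matrix, and the helper names are all invented for illustration.

```python
import numpy as np

def run_lengths(regimes):
    """Lengths of consecutive runs in a 1-D sequence of regime indices."""
    regimes = np.asarray(regimes)
    change_points = np.flatnonzero(np.diff(regimes) != 0) + 1
    boundaries = np.concatenate(([0], change_points, [len(regimes)]))
    return np.diff(boundaries)

def expected_dwell(transmat):
    """Expected bars spent in each state per visit: 1 / (1 - self-loop)."""
    return 1.0 / (1.0 - np.diag(transmat))

def self_loops_dominate(transmat):
    """True if every row's largest entry is its diagonal (self-loop) entry."""
    return bool(np.all(np.argmax(transmat, axis=1) == np.arange(len(transmat))))

# Hypothetical argmax regimes for ten bars and a made-up 3-state matrix.
regimes = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
labels = ["DEEP_BEAR", "NEUTRAL", "TOP_BULL"]
transmat = np.array([
    [0.90, 0.09, 0.01],
    [0.07, 0.85, 0.08],
    [0.01, 0.11, 0.88],
])

runs = run_lengths(regimes)
print("run lengths:", runs, "median:", np.median(runs))
print("self-loops dominate:", self_loops_dominate(transmat))
for name, dwell in zip(labels, expected_dwell(transmat)):
    print(f"{name}: ~{dwell:.1f} bars per visit")
print("P(TOP_BULL -> DEEP_BEAR):", transmat[2, 0])
```

A self-loop of 0.85 works out to roughly 6.7 bars per visit. A median run length of one or two bars on real data is the "clustering on noise" symptom described in the first exercise.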

The minimum viable HMM

If you want to play with this concept yourself, the absolute minimum setup is:

python
from hmmlearn.hmm import GaussianHMM
import numpy as np

# Features: shape (T, F) — T bars × F features
features = np.column_stack([
    log_returns,
    realized_vol_5d,
    realized_vol_20d,
])

hmm = GaussianHMM(n_components=5, covariance_type="full", n_iter=100)
hmm.fit(features)

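# Posterior state probabilities. score_samples returns (log-likelihood, posteriors);
# the posteriors are already probabilities, so no np.exp is needed. They are smoothed
# over the whole sequence, so only the last row is also a causal (filtered) estimate.
_, state_probs = hmm.score_samples(features)
current_probs = state_probs[-1]
current_regime = current_probs.argmax()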

That's 12 lines and you have a working HMM. From there it's feature engineering, regime labeling, downstream strategy composition, retrain cadence, and a thousand other knobs — but the core abstraction is small enough to fit in your head.

Why we leaned in

We didn't pick HMMs because they're trendy (they aren't) or because they're state-of-the-art for sequence modeling (transformers are). We picked them because the abstraction matches what we wanted to do: identify a small set of distinct market behaviors, transition between them probabilistically, and act differently in each one. The model expresses exactly that and nothing more. Anything more flexible would've overfit on our data sizes; anything less flexible (k-means, fixed thresholds) couldn't handle the temporal smoothing.

The math works because the model matches the problem. That's what good modeling feels like, and it's why we'd encourage anyone interested in algorithmic trading to spend a weekend with HMMs even if you decide they're not the right tool for you.
