Forecast info

How the BTC predictions are derived

This page documents the implementation currently running in the app: market data ingest, horizon-specific feature engineering, model training, validation, ensemble construction, forecast publication, and chart rendering.

Pipeline at a glance

The production forecast flow is not a single monolithic model. It is a multi-horizon pipeline where each horizon trains its own offset-specific predictors and then publishes the latest forecast rows into a shared catalog.

Historical OHLCV candles are loaded from the database when it is available; when the database is unavailable or not configured, they are fetched from Binance instead.
Each horizon builds its own feature matrix and target vector, where the target is the future BTC close at the requested offset.
A fixed pool of model families is trained for every horizon-offset pair, with validation RMSE recorded for each candidate.
Only stable candidates are allowed into the ensemble: finite RMSE, finite validation predictions, and non-flat output.
The best candidates are published into ml_forecasts, then served by /forecasts and polled by the frontend every 5 minutes.
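The target alignment and the stability gate described in the steps above can be sketched in a few lines. This is a minimal illustration, not the production code: the function names make_direct_target and is_stable and the flat_tol threshold are hypothetical, and "non-flat" is assumed to mean that the validation predictions are not all (nearly) the same value.

```python
import math


def make_direct_target(closes, offset):
    """Pair each bar index t with the close `offset` bars later.

    The last `offset` bars have no future close yet, so they are dropped.
    """
    return [(t, closes[t + offset]) for t in range(len(closes) - offset)]


def is_stable(rmse, val_preds, flat_tol=1e-9):
    """Stability gate: finite RMSE, finite predictions, non-flat output.

    flat_tol is an assumed threshold; the real cutoff is not documented here.
    """
    if not math.isfinite(rmse):
        return False
    if not val_preds or not all(math.isfinite(p) for p in val_preds):
        return False
    # A model that predicts one constant value carries no signal.
    return max(val_preds) - min(val_preds) > flat_tol
```

For example, with hourly closes and offset=24, the row at index t is paired with the close 24 bars later, which matches the "future BTC close at the requested offset" definition of the target.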

Horizon-specific training setup

Different dashboard windows are backed by different training windows, feature sets, and direct forecast offsets. They are intentionally trained independently rather than forcing one model to cover every time frame.

1-hour candles

Short horizon / 1D view

Training history: 90 days of BTC data

Sequence setup: 48-step sequence configuration

Direct ML offsets: 1h, 2h, 3h, 4h, 24h, and 48h targets

Features: lags from 1 to 48 hours, rolling means and volatility over 6, 12, 24, and 48 bars, RSI(14), MACD, Bollinger bands, 12h and 24h momentum, and intraday calendar features.
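As a rough illustration of the lag, rolling, and momentum features listed above, the sketch below builds them from a plain list of hourly closes. The function name build_features and the feature keys are hypothetical. Two details are assumptions rather than documented behavior: volatility is computed as the standard deviation of one-bar returns inside the window, and each row may use the close of its own bar. RSI, MACD, Bollinger bands, and calendar features are left out to keep the example short.

```python
import statistics


def build_features(closes, lags=range(1, 49), windows=(6, 12, 24, 48),
                   momentum=(12, 24)):
    """Return (bar_index, feature_dict) pairs for bars with full history."""
    # Skip the warm-up bars that lack enough history for every feature.
    warmup = max(max(lags), max(windows), max(momentum))
    rows = []
    for t in range(warmup, len(closes)):
        feats = {}
        for k in lags:
            feats[f"lag_{k}"] = closes[t - k]
        for w in windows:
            # Window ends at bar t; it never looks past the current bar.
            window = closes[t - w + 1:t + 1]
            returns = [closes[i] / closes[i - 1] - 1.0
                       for i in range(t - w + 1, t + 1)]
            feats[f"roll_mean_{w}"] = statistics.fmean(window)
            feats[f"roll_vol_{w}"] = statistics.pstdev(returns)
        for m in momentum:
            # Simple rate of change over the last m bars.
            feats[f"mom_{m}"] = closes[t] / closes[t - m] - 1.0
        rows.append((t, feats))
    return rows
```

The defaults mirror the short-horizon settings on this page (lags 1 to 48, windows 6/12/24/48, momentum 12h/24h). Passing different tuples would give the daily variants used by the other horizons.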

This horizon feeds the oneDay range. The UI prefers the sequence leader here by default because independently trained hourly offsets can make the raw ensemble path look too jagged.

Daily candles

Medium horizon / 2W view

Training history: 2 years of BTC data

Sequence setup: 30-step sequence configuration

Direct ML offsets: 1w and 2w targets

Features: lags from 1 to 30 days, rolling means and volatility over 7, 14, and 30 days, RSI(14), MACD, Bollinger bands, 7d and 14d momentum, plus day-of-week and month features.

The dashboard also shows a 1d helper waypoint in the 2-week view, but that point is derived in the rendering layer when there is no directly published ML row for 1 day.

Daily candles

Long horizon / 2M view

Training history: 3 years of BTC data

Sequence setup: 60-step sequence configuration

Direct ML offsets: 1w, 2w, 3w, 1m, and 2m targets

Features: lags from 1 to 90 days, rolling means and volatility over 14, 30, 60, and 90 days, RSI(21), MACD, Bollinger bands, 14d, 30d, and 60d momentum, plus month and quarter features.

The 2-month product view includes a 4w display point for readability, but that point is not a separately trained offset in the current training script.

Daily candles

Year horizon / 1Y view

Training history: 3 years of BTC data

Sequence setup: 90-step sequence configuration

Direct ML offsets: 1w, 2w, 3w, 1m, 2m, and 3m targets

The year view reuses the long-horizon feature builder and extends the sequence window so that longer-range temporal structure can be represented before 3-month targets are published.

This powers the year range shown in the dashboard, where future points are rendered on a weekly timeline after the forecast anchors are published.

Model families used in the system

Tree family

Tree-based models

Random Forest, XGBoost, and LightGBM are trained as tabular regressors on the engineered feature matrix. They train quickly enough to be validated with expanding-window folds.

Kernel family

Kernel model

Support Vector Regression with an RBF kernel runs in a pipeline behind a StandardScaler, so differences in feature scale do not dominate the fit.

Sequence family

Sequence models

LSTM, GRU, TCN, and a Transformer encoder are trained with PyTorch. Sequence-ready matrices are reshaped into (samples, sequence length, features) tensors when possible; otherwise the models fall back to a single-step view of the full engineered vector.
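The reshape-or-fallback behavior can be pictured with a small sketch. It is only an approximation of the idea: the function name to_sequence_view is hypothetical, "when possible" is assumed to mean that the row width divides evenly by the sequence length, and the flat row is assumed to be laid out one time step after another. Plain nested lists are used instead of tensors so the example has no dependencies.

```python
def to_sequence_view(rows, seq_len):
    """Reshape flat rows into [samples][seq_len][features] when possible.

    Assumption: a row is sequence-ready when its width is a positive
    multiple of seq_len, with time steps stored back to back.
    """
    if not rows:
        return []
    width = len(rows[0])
    if seq_len > 0 and width > 0 and width % seq_len == 0:
        n_feat = width // seq_len
        return [
            [list(row[s * n_feat:(s + 1) * n_feat]) for s in range(seq_len)]
            for row in rows
        ]
    # Fallback: one time step that holds the full engineered vector.
    return [[list(row)] for row in rows]
```

With a 6-column row and seq_len=3, each sample becomes 3 steps of 2 features. With seq_len=4 the width does not divide evenly, so the sample becomes a single step of 6 features.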

What the UI shows

Published dashboard views

The publication job writes three user-facing model tracks per range and horizon: ensemble, tree, and sequence. It also stores a baseline row using the latest close price, but that baseline is hidden from the frontend.

Validation and ensemble construction

Fast models (random_forest, xgboost, lightgbm, and svr) are evaluated with three expanding walk-forward folds to get a more stable error estimate for ensemble weighting.
Slower neural models (lstm, gru, tcn, and transformer) use a final 80/20 holdout split to keep publication time practical.
After ranking candidates by validation RMSE, the top stable models are trimmed to at most four ensemble members for the deployed blend.
Ensemble weights are not uniform. They start from inverse RMSE and are then adjusted by a diversity penalty, so a group of highly correlated predictors does not receive outsized emphasis in the blend.
The final artifact stores the ensemble itself, validation RMSE, member names, per-member RMSEs, and the selected ensemble member list for later publication.
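The validation splits and the weighting scheme above can be sketched as follows. The exact formulas are not documented on this page, so everything specific here is an assumption: the function names, the equal-block fold layout, the penalty strength, and the choice to divide each inverse-RMSE weight by one plus the member's mean absolute Pearson correlation with the other selected members.

```python
import math


def expanding_folds(n_samples, n_folds=3):
    """Return (train_end, val_end) pairs; each fold trains on [0, train_end)."""
    block = n_samples // (n_folds + 1)
    folds = []
    for i in range(1, n_folds + 1):
        # The last fold absorbs any remainder rows.
        val_end = n_samples if i == n_folds else block * (i + 1)
        folds.append((block * i, val_end))
    return folds


def holdout_cut(n_samples, train_frac=0.8):
    """Index that separates the training rows from the final holdout rows."""
    return int(n_samples * train_frac)


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(va * vb)
    return cov / denom if denom > 0 else 0.0


def ensemble_weights(rmses, val_preds, max_members=4, penalty=0.5):
    """Inverse-RMSE weights with an assumed correlation-based diversity penalty.

    rmses: name -> validation RMSE (assumed finite and positive).
    val_preds: name -> validation predictions on the same rows.
    """
    ranked = sorted(rmses, key=rmses.get)[:max_members]
    raw = {}
    for name in ranked:
        others = [o for o in ranked if o != name]
        avg_corr = 0.0
        if others:
            avg_corr = sum(abs(pearson(val_preds[name], val_preds[o]))
                           for o in others) / len(others)
        raw[name] = (1.0 / rmses[name]) / (1.0 + penalty * avg_corr)
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}
```

Under these assumptions, two members with identical validation predictions each lose weight relative to an equally accurate member that is less correlated with the rest, and with penalty=0 the scheme reduces to plain inverse-RMSE weighting.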

Interpretation and limits

These outputs are point forecasts of future BTC close prices. They are not confidence intervals, not trading guarantees, and not a substitute for position sizing, risk controls, or independent market judgment.

Validation RMSE is historical. It helps compare models under past conditions, but crypto market structure can change quickly because of macro events, exchange-specific flows, liquidity shocks, or regime changes that are not fully captured in historical candles.

Different horizons are trained independently, so users should not expect every forecast window to join into one perfectly smooth multi-month narrative. That is a deliberate tradeoff in favor of horizon-specific signal quality.