Forecast info
How the BTC predictions are derived
This page documents the live implementation currently wired into the app: market data ingest, horizon-specific feature engineering, model training, validation, ensemble construction, forecast publication, and chart rendering.
Pipeline at a glance
The production forecast flow is not a single monolithic model. It is a multi-horizon pipeline where each horizon trains its own offset-specific predictors and then publishes the latest forecast rows into a shared catalog.
Historical OHLCV candles are loaded from the database when available; if the database is unavailable or not configured, they are fetched from Binance instead.
Each horizon builds its own feature matrix and target vector, where the target is the future BTC close at the requested offset.
A fixed pool of model families is trained for every horizon and offset pair, with validation RMSE recorded for each candidate.
Only stable candidates are allowed into the ensemble: finite RMSE, finite validation predictions, and non-flat output.
The best candidates are published into ml_forecasts, then served by /forecasts and polled by the frontend every 5 minutes.
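As a rough illustration of the direct-target step in the list above, the sketch below pairs each candle with the close that occurs a fixed number of bars later. The function name build_direct_target and the toy prices are hypothetical; the production feature builder is not shown on this page.

```python
def build_direct_target(closes, offset):
    """Pair each bar index with the close `offset` bars ahead.

    Returns (usable_indices, targets). The last `offset` bars have no
    known future close yet, so they are excluded from training.
    """
    if offset < 1:
        raise ValueError("offset must be a positive number of bars")
    n_usable = max(len(closes) - offset, 0)
    indices = list(range(n_usable))
    targets = [closes[i + offset] for i in indices]
    return indices, targets


# Toy hourly closes (made-up numbers, not market data).
closes = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0]
idx, y_2h = build_direct_target(closes, offset=2)
```

Each offset gets its own target vector in this arrangement, which is what lets every horizon and offset pair be trained as a separate predictor.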
Horizon-specific training setup
Different dashboard windows are backed by different training windows, feature sets, and direct forecast offsets. They are intentionally trained independently rather than forcing one model to cover every time frame.
1-hour candles
Short horizon / 1D view
Training history
90 days of BTC data
Sequence setup
48-step sequence configuration
Direct ML offsets
1h, 2h, 3h, 4h, 24h, and 48h targets
Lag features from 1 to 48 hours, rolling means and volatility over 6, 12, 24, and 48 bars, RSI(14), MACD, Bollinger bands, 12h and 24h momentum, and intraday calendar features.
This feeds the oneDay range. The UI prefers the sequence leader here by default because independently trained hourly offsets can make the raw ensemble path look too jagged.
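A few of the short-horizon features listed above (lags, rolling statistics, RSI) can be sketched in plain Python as follows. The helper names are hypothetical, the RSI uses simple averages rather than Wilder smoothing, and rolling volatility is taken over close levels for brevity; the production builder may differ on each of these points.

```python
from statistics import mean, pstdev


def rolling(values, window, fn):
    """Apply fn over trailing windows; None until the window is full."""
    return [
        fn(values[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def lag(values, k):
    """Value k bars earlier; None where history is missing."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rsi(closes, period=14):
    """Simple-average RSI (an assumption; Wilder smoothing is also common)."""
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            out[i] = 100.0  # no down moves in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out


# Toy closes and deliberately short windows so the output is easy to inspect.
closes = [100.0, 102.0, 101.0, 103.0, 102.0, 104.0]
features = {
    "lag_1": lag(closes, 1),
    "mean_3": rolling(closes, 3, mean),
    "vol_3": rolling(closes, 3, pstdev),
    "rsi_2": rsi(closes, period=2),
}
```

Rows that still contain None would need to be dropped before training, which is one reason the training history is much longer than the largest lag.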
Daily candles
Medium horizon / 2W view
Training history
2 years of BTC data
Sequence setup
30-step sequence configuration
Direct ML offsets
1w and 2w targets
Lag features from 1 to 30 days, rolling means and volatility over 7, 14, and 30 days, RSI(14), MACD, Bollinger bands, 7d and 14d momentum, plus day-of-week and month features.
The dashboard also shows a 1d helper waypoint in the 2-week view, but that point is derived in the rendering layer when there is no directly published ML row for 1 day.
Daily candles
Long horizon / 2M view
Training history
3 years of BTC data
Sequence setup
60-step sequence configuration
Direct ML offsets
1w, 2w, 3w, 1m, and 2m targets
Lag features from 1 to 90 days, rolling means and volatility over 14, 30, 60, and 90 days, RSI(21), MACD, Bollinger bands, 14d, 30d, and 60d momentum, plus month and quarter features.
The 2-month product view includes a 4w display point for readability, but that point is not a separately trained offset in the current training script.
Daily candles
Year horizon / 1Y view
Training history
3 years of BTC data
Sequence setup
90-step sequence configuration
Direct ML offsets
1w, 2w, 3w, 1m, 2m, and 3m targets
The year view reuses the long-horizon feature builder and extends the sequence window so longer temporal structures can be represented before publishing 3-month targets.
This powers the year range shown in the dashboard, where future points are rendered on a weekly timeline after the forecast anchors are published.
Model families used in the system
Tree family
Tree-based models
Random Forest, XGBoost, and LightGBM are trained as tabular regressors on the engineered feature matrix. These are fast enough to run expanding-window validation folds.
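The expanding-window folds mentioned here can be sketched as below. This is a minimal illustration, not the production splitter: the function name expanding_folds and the min_train_frac parameter are assumptions, and scikit-learn's TimeSeriesSplit provides a comparable ready-made splitter.

```python
def expanding_folds(n_samples, n_folds=3, min_train_frac=0.5):
    """Yield (train_indices, val_indices) with a growing training window.

    The first `min_train_frac` of the data is always used for training;
    the remainder is cut into `n_folds` consecutive validation blocks.
    """
    first_val = int(n_samples * min_train_frac)
    block = (n_samples - first_val) // n_folds
    if first_val < 1 or block < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_val + k * block
        # The last fold absorbs any remainder rows.
        stop = n_samples if k == n_folds - 1 else start + block
        yield range(0, start), range(start, stop)


# Summarise each fold as (training size, validation indices).
folds = [(len(train), list(val)) for train, val in expanding_folds(12)]
```

Every validation block lies strictly after its training window, so no future candle leaks into the fit.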
Kernel family
Kernel model
Support Vector Regression is wrapped in a StandardScaler plus RBF-kernel pipeline so feature scale differences do not dominate the fit.
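A scaler-plus-SVR pipeline of this kind could look like the following in scikit-learn. The hyperparameters (C, epsilon) and the synthetic data are placeholders chosen for illustration, not values taken from the training script.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Standardise every feature, then fit an RBF-kernel SVR on the scaled matrix.
svr_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))

# Synthetic features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = X[:, 0] + X[:, 1] / 1000.0

svr_model.fit(X, y)
preds = svr_model.predict(X[:5])
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so validation rows never influence the scaling statistics.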
Sequence family
Sequence models
LSTM, GRU, TCN, and a Transformer encoder are trained with PyTorch. Sequence-ready matrices are reshaped into samples, sequence length, and features when possible; otherwise they fall back to a single-step view of the full engineered vector.
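One plausible reading of the reshape-or-fall-back rule is sketched below with NumPy. It assumes "when possible" means the flat feature width divides evenly by the sequence length; the function name to_sequence_input is hypothetical and the real check may be different.

```python
import numpy as np


def to_sequence_input(X, seq_len):
    """Return a 3-D array suitable for a sequence model.

    If the column count divides evenly by seq_len, reshape to
    (samples, seq_len, features). Otherwise fall back to a
    single-step view of shape (samples, 1, all_features).
    """
    X = np.asarray(X, dtype=np.float32)
    n_samples, width = X.shape
    if seq_len > 0 and width % seq_len == 0:
        return X.reshape(n_samples, seq_len, width // seq_len)
    return X.reshape(n_samples, 1, width)


# 48 steps of 2 features each can be unfolded; 50 columns cannot.
seq_ok = to_sequence_input(np.zeros((10, 96)), seq_len=48)
seq_fallback = to_sequence_input(np.zeros((10, 50)), seq_len=48)
```

The resulting arrays follow the batch-first layout (samples, sequence length, features) described above and can be converted to PyTorch tensors before training.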
What the UI shows
Published dashboard views
The publication job writes three user-facing model tracks per range and horizon: ensemble, tree, and sequence. It also stores a baseline row using the latest close price, but that baseline is hidden from the frontend.
Validation and ensemble construction
Fast models (random_forest, xgboost, lightgbm, and svr) are evaluated with three expanding walk-forward folds to get a more stable error estimate for ensemble weighting.
Slower neural models (lstm, gru, tcn, and transformer) use a final 80/20 holdout split to keep publication time practical.
After ranking candidates by validation RMSE, the top stable models are trimmed to at most four ensemble members for the deployed blend.
Ensemble weights are not uniform. They are based on inverse RMSE and then adjusted by a diversity penalty so highly correlated predictors do not all receive the same emphasis.
The final artifact stores the ensemble itself, validation RMSE, member names, per-member RMSEs, and the selected ensemble member list for later publication.
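The selection and weighting steps above can be sketched as follows. The stability check and the four-member cap follow the description on this page, but the exact form of the diversity penalty is not documented here; scaling each inverse-RMSE weight by one minus a fraction of the member's mean absolute correlation with the others is an assumption, as are the function names and the penalty value.

```python
import numpy as np


def is_stable(rmse, val_preds, flat_tol=1e-8):
    """Finite RMSE, finite validation predictions, and non-flat output."""
    preds = np.asarray(val_preds, dtype=float)
    return bool(
        np.isfinite(rmse)
        and np.all(np.isfinite(preds))
        and np.std(preds) > flat_tol
    )


def ensemble_weights(candidates, max_members=4, penalty=0.5):
    """candidates: {name: (rmse, validation_predictions)}.

    Returns {name: weight} over the best stable models; weights sum to 1.
    """
    stable = {
        name: (rmse, np.asarray(preds, dtype=float))
        for name, (rmse, preds) in candidates.items()
        if is_stable(rmse, preds)
    }
    ranked = sorted(stable, key=lambda name: stable[name][0])[:max_members]
    if not ranked:
        return {}
    raw = np.array([1.0 / max(stable[name][0], 1e-12) for name in ranked])
    if len(ranked) > 1:
        corr = np.corrcoef(np.vstack([stable[name][1] for name in ranked]))
        # Mean absolute correlation with the other members (diagonal removed).
        mean_corr = (np.abs(corr).sum(axis=1) - 1.0) / (len(ranked) - 1)
        # Hypothetical penalty: highly correlated members are down-weighted.
        raw = raw * (1.0 - penalty * mean_corr)
    weights = raw / raw.sum()
    return dict(zip(ranked, weights.tolist()))


# Toy candidates: "flat" and "broken" should be rejected by the stability check.
candidates = {
    "lightgbm": (1.0, [1.0, 2.0, 3.0, 4.0]),
    "xgboost": (2.0, [1.0, 2.0, 3.0, 5.0]),
    "gru": (1.5, [4.0, 1.0, 3.0, 2.0]),
    "flat": (0.5, [2.0, 2.0, 2.0, 2.0]),
    "broken": (float("nan"), [1.0, 2.0, 3.0, 4.0]),
}
weights = ensemble_weights(candidates)
```

In this toy case the flat model is rejected despite having the lowest RMSE, which shows why the non-flat check runs before ranking.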
Interpretation and limits
These outputs are point forecasts of future BTC close prices. They are not confidence intervals, not trading guarantees, and not a substitute for position sizing, risk controls, or independent market judgment.
Validation RMSE is historical. It helps compare models under past conditions, but crypto market structure can change quickly because of macro events, exchange-specific flows, liquidity shocks, or regime changes that are not fully captured in historical candles.
Different horizons are trained independently, so users should not expect every forecast window to join into one perfectly smooth multi-month narrative. That is a deliberate tradeoff in favor of horizon-specific signal quality.