Deep Learning · Time Series · LSTM
A stacked LSTM neural network trained on 698,400 minutes of Coinbase BTC/USD data to predict the closing price one minute into the future, achieving a mean absolute error of $38.44 on an asset with an average price of $6,933.
Performance
Evaluated on held-out test data from June-August 2018, a period the model never encountered during training or validation.
| Metric | Value | Note |
|---|---|---|
| Test MAE (USD) | $38.44 | mean absolute error |
| Mean Error | $31.16 | systematic bias |
| Error Rate | 0.55% | of $6,933 average price |
| Test MSE | 6.0e-06 | scaled mean squared error |
| Std Error | $36.45 | error consistency |
| Best Val Loss | 1.07e-06 | epoch 15 |
| Run | MAE (USD) | Notes |
|---|---|---|
| Run 1 | $608.41 | Distribution shift during the 2018 crash |
| Run 2 | $91.79 | 2-layer LSTM with fixed splits and stable price regime |
| Run 3 | $32.68 | 3-layer LSTM + AdamW + ReduceLROnPlateau |
| Run 4 (final) | $38.44 | Retrained after a lost session; result consistent with Run 3 |
Interactive
Runs the actual trained LSTM model in your browser using TensorFlow.js on held-out 2018 test windows. Each replay hides the next minute, asks the model to predict it, then reveals the real candle from the test set.
[Interactive replay widget: for each held-out test window it shows the current close (latest 1-minute candle), the predicted next-minute close, the prediction error versus the actual target, and the revealed actual next candle.]
Visualisation
The model tracks real BTC price movements with high fidelity across the June-August 2018 test window, including a sharp intra-period correction.
Process
Each preprocessing decision is documented and justified by exploratory analysis conducted in Notebook 1 before any transformation was applied.
01 DATA AUDIT
109,069 rows (5.19%) are entirely NaN; these are inactive trading windows, not corrupted data. A further 58,354 timestamps are absent from the CSV altogether, discovered only after reindexing to a complete 60-second DatetimeIndex.
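A minimal sketch of this audit, assuming the raw export loads into a DataFrame with a Timestamp index (the filename and column name are placeholders):

```python
import pandas as pd

# Load the raw 1-minute CSV (filename and column names are assumptions)
df = pd.read_csv('btcusd_1min.csv', parse_dates=['Timestamp'],
                 index_col='Timestamp')

# Rows present in the file but entirely NaN: inactive trading windows
all_nan = df.isna().all(axis=1)
print(f'{all_nan.sum():,} rows ({all_nan.mean():.2%}) are all-NaN')

# Timestamps missing from the file, visible only after reindexing
full_index = pd.date_range(df.index.min(), df.index.max(), freq='60s')
print(f'{len(full_index) - len(df):,} timestamps absent entirely')
```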
02 TRIMMING
The longest consecutive gap spans 38.4 hours, concentrated in 2014-2016. Forward-filling across such gaps produces 2,303 identical rows, a flat line the model would learn as a genuine price signal. Trimming to 2017-01-01 eliminates all such artifacts.
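Measuring the longest gap before choosing the trim date can be sketched as follows, reusing df and full_index from the audit above:

```python
# Mark minutes that are missing after reindexing, then measure the
# longest unbroken run of them
missing = df.reindex(full_index).isna().all(axis=1)
run_id = (missing != missing.shift()).cumsum()
longest = missing.groupby(run_id).sum().max()
print(f'Longest gap: {longest} minutes ({longest / 60:.1f} hours)')
```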
03 FEATURES
Open, High, Low and Weighted_Price are all correlated ≥ 0.999 with Close, making them redundant. Volume_(BTC) is retained after a log1p transform as the only independent feature (corr = 0.15). Time is encoded as sin/cos cyclic features to preserve periodicity.
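A sketch of the redundancy check and volume transform, assuming the column names quoted above:

```python
import numpy as np

# Every price column is a near-duplicate of Close; volume is not
print(df.corr()['Close'].sort_values(ascending=False))

# Compress heavy-tailed volume with log1p, then drop redundant columns
df['log_volume'] = np.log1p(df['Volume_(BTC)'])
df = df.drop(columns=['Open', 'High', 'Low',
                      'Weighted_Price', 'Volume_(BTC)'])
```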
04 SPLITTING
All three splits fall within the 2017-2018 bull market regime to ensure consistent price distribution. No shuffling is applied because shuffling a time series leaks future price information into training.
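The split itself is a pair of date cuts; in sketch form (the train/validation boundaries are assumptions, only the June-August 2018 test window is stated):

```python
# No shuffling: each split is a contiguous block of time
train_df = df.loc['2017-01-01':'2018-03-31']
val_df   = df.loc['2018-04-01':'2018-05-31']
test_df  = df.loc['2018-06-01':'2018-08-31']  # held-out test window
```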
05 SCALING
MinMaxScaler is fitted exclusively on training data. Fitting on the full dataset would leak validation and test price ranges into training, letting the model indirectly know future price levels before seeing them.
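Since training happens in scaled space, the USD metrics above require mapping predictions back through the Close column's fitted range, roughly like this (CLOSE_IDX as in the windowing excerpt under Implementation):

```python
# Undo MinMax scaling for the Close column only
pred_scaled = model.predict(test_ds).ravel()
close_min = scaler.data_min_[CLOSE_IDX]
close_max = scaler.data_max_[CLOSE_IDX]
pred_usd = pred_scaled * (close_max - close_min) + close_min
```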
06 PIPELINE
tf.data.Dataset streams sliding windows with shift=5, one window every 5 minutes. Pre-computing 698,000 windows of shape (1440, 6) would require ~27GB of RAM. Streaming keeps memory usage under 2GB throughout training.
Model
A three-layer stacked LSTM with Dropout regularisation, trained with AdamW and an adaptive learning rate schedule.
| Hyperparameter | Value | Rationale |
|---|---|---|
| Loss | MSE | Task requirement |
| Optimizer | AdamW | Adam + weight decay reduces overfitting |
| Learning rate | 0.0005 | Precise convergence without oscillation |
| Weight decay | 1e-4 | L2 regularisation via AdamW |
| LR schedule | ReduceLROnPlateau | Halves LR when val_loss plateaus |
| Batch size | 32 | More gradient updates per epoch |
| Early stopping | patience=7 | Restores best weights automatically |
| Window size | 1,440 steps | 24 hours of 1-minute data |
| Horizon | 1 step | Predict 1 minute ahead |
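In Keras terms the schedule and stopping rows translate to a callback list along these lines; factor=0.5, patience=7, and best-weight restoration follow the table, while the plateau patience, min_lr, and epoch budget are assumptions:

```python
import tensorflow as tf

callbacks = [
    # Halve the learning rate when val_loss stops improving
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6),
    # Stop after 7 stagnant epochs and restore the best weights
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=7, restore_best_weights=True),
]
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=50, callbacks=callbacks)
```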
Implementation
Selected excerpts from the three notebooks. Full source available on GitHub.
```python
# Streaming sliding windows - no RAM crash
import tensorflow as tf

def make_dataset(data, shuffle=False):
    total_length = WINDOW_SIZE + HORIZON
    ds = tf.data.Dataset.from_tensor_slices(data)
    # One window every 5 minutes over the scaled feature matrix
    ds = ds.window(total_length, shift=5, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(total_length, drop_remainder=True))
    # Input: the first WINDOW_SIZE steps; target: the Close value
    # HORIZON steps past the end of the window
    ds = ds.map(
        lambda w: (w[:WINDOW_SIZE], w[WINDOW_SIZE + HORIZON - 1, CLOSE_IDX]),
        num_parallel_calls=tf.data.AUTOTUNE
    )
    if shuffle:
        ds = ds.shuffle(buffer_size=2000, seed=42)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```
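Hypothetical usage, one streamed dataset per split; test_scaled is assumed to be produced the same way as val_scaled:

```python
train_ds = make_dataset(train_scaled, shuffle=True)
val_ds   = make_dataset(val_scaled)
test_ds  = make_dataset(test_scaled)
```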
```python
# Three-layer LSTM with Dropout regularisation
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

model = Sequential([
    Input(shape=(WINDOW_SIZE, train_scaled.shape[1])),
    LSTM(128, return_sequences=True),   # full sequence to next layer
    Dropout(0.2),
    LSTM(64, return_sequences=True),
    Dropout(0.2),
    LSTM(32, return_sequences=False),   # final hidden state only
    Dropout(0.1),
    Dense(1)                            # scaled next-minute Close
])
model.compile(
    loss='mse',
    optimizer=tf.keras.optimizers.AdamW(
        learning_rate=0.0005, weight_decay=1e-4
    ),
    metrics=['mae']
)
```
```python
# Reindex → forward-fill → trim → engineer features
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

full_index = pd.date_range(
    start=df.index.min(), end=df.index.max(), freq='60s'
)
df = df.reindex(full_index).ffill().dropna()
df = df[df.index >= '2017-01-01']

# Cyclic time encoding
hour = df.index.hour + df.index.minute / 60.0
df['hour_sin'] = np.sin(2 * np.pi * hour / 24)
df['hour_cos'] = np.cos(2 * np.pi * hour / 24)

# Fit scaler on train only - no leakage
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_df.values)
val_scaled = scaler.transform(val_df.values)
```