Deep Learning · Time Series · LSTM

BTC Price Forecasting

A stacked LSTM neural network trained on 698,400 minutes of Coinbase BTC/USD data to predict the closing price one minute ahead, achieving a mean absolute error of $38.44 on an asset averaging $6,933.

View on GitHub · Try Replay Demo
TEST MAE $38.44 · MEAN ERROR $31.16 · TEST MSE 6.0e-06 · WINDOW 1,440 STEPS · TRAINING ROWS 698,400 · ARCHITECTURE LSTM 128→64→32 · OPTIMIZER AdamW · ERROR RATE 0.55%

Performance

Final Results

Evaluated on held-out test data from June-August 2018, a period the model never encountered during training or validation.

Metric            Value      Note
Test MAE (USD)    $38.44     mean absolute error
Mean Error        $31.16     systematic bias
Error Rate        0.55%      of $6,933 avg price
Test MSE          6.0e-06    scaled mean squared error
Std Error         $36.45     error consistency
Best Val Loss     1.07e-06   reached at epoch 15

Run             MAE (USD)   Change made
Run 1           $608.41     Distribution shift during the 2018 crash
Run 2           $91.79      2-layer LSTM with fixed splits and stable price regime
Run 3           $32.68      3-layer LSTM + AdamW + ReduceLROnPlateau
Run 4 (final)   $38.44      Retrain after session loss with consistent result
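The headline MAE is quoted in USD even though the network trains on MinMax-scaled targets, so scaled predictions have to be mapped back to dollars before scoring. A minimal sketch of that inverse transform, assuming the scaler fitted in the preprocessing step (see Key Code below), the default feature_range=(0, 1), and hypothetical y_test / CLOSE_IDX names:

import numpy as np

def to_usd(scaled_close, scaler, close_idx):
    # Invert MinMax scaling for one column: x = x_scaled * (max - min) + min
    return scaled_close * scaler.data_range_[close_idx] + scaler.data_min_[close_idx]

preds_usd   = to_usd(model.predict(test_ds).ravel(), scaler, CLOSE_IDX)
targets_usd = to_usd(y_test, scaler, CLOSE_IDX)     # y_test: scaled test targets (assumed)
mae_usd = np.mean(np.abs(preds_usd - targets_usd))  # reported above as $38.44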

Interactive

Historical Replay

Runs the actual trained LSTM model in your browser using TensorFlow.js on held-out 2018 test windows. Each replay hides the next minute, asks the model to predict it, then reveals the real candle from the test set.

BTC/USD · Held-Out Test Replay

[Replay widget: loads the trained model and the 2018 test artifact in-browser. For each held-out window it shows the current close (latest 1-min candle), the predicted next-minute close, the actual next candle once revealed, and the prediction error. Model: LSTM 128→64→32 · Window: 1,440 steps (24h).]

Visualisation

Training Results

The model tracks real BTC price movements with high fidelity across the June-August 2018 test window, including a sharp intra-period correction.

[Chart: BTC/USD close price, test-set sample · actual vs. predicted]

[Chart: training & validation loss, MSE on a log scale · train vs. validation]

Process

Methodology

Each preprocessing decision is documented and justified by exploratory analysis conducted in Notebook 1 before any transformation was applied.

01 DATA AUDIT

Missing Data Analysis

109,069 rows (5.19%) have every value NaN; these are inactive trading windows, not corrupted data. A further 58,354 timestamps are absent from the CSV entirely, discovered only after reindexing to a complete 60-second DatetimeIndex.

109,069 NaN rows · 58,354 missing timestamps · forward-fill
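A sketch of the audit, assuming the raw minute-level CSV is loaded with a DatetimeIndex (the file name and timestamp parsing are illustrative):

import pandas as pd

df = pd.read_csv('coinbase_btcusd_1min.csv',        # file name assumed
                 index_col='Timestamp', parse_dates=True)

all_nan = df.isna().all(axis=1).sum()               # -> 109,069 rows (5.19%)

full_index = pd.date_range(df.index.min(), df.index.max(), freq='60s')
absent = len(full_index) - len(df.index)            # -> 58,354 missing timestamps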

02 TRIMMING

Remove Early Sparse Data

The longest consecutive gap spans 38.4 hours, and the gaps are concentrated in 2014-2016. Forward-filling across a gap that long produces 2,303 identical rows, a flat line the model would learn as a genuine price signal. Trimming the dataset to start at 2017-01-01 eliminates all such artifacts.

38.4h max gap · trim to 2017 · artifact removal
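The gap length falls out of a one-line diff on the raw index; 38.4 hours is 2,304 minutes, which is where the 2,303 forward-filled duplicates come from. A sketch:

# Longest gap between consecutive raw timestamps
gaps = df.index.to_series().diff()
max_gap = gaps.max()                                    # ~38.4 hours
dup_rows = int(max_gap / pd.Timedelta(minutes=1)) - 1   # ~2,303 forward-filled rows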

03 FEATURES

Feature Engineering

Open, High, Low and Weighted_Price all correlate ≥ 0.999 with Close, making them redundant. Volume_(BTC) is retained after a log1p transform as the only independent feature (corr ≈ 0.15). Time of day is encoded as sin/cos cyclic features to preserve periodicity.

6 features · log1p volume · cyclic encoding
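A sketch of the redundancy check and the volume transform, assuming the dataset's original column names:

import numpy as np

# Correlation of each candidate feature with the prediction target
corr = df[['Open', 'High', 'Low', 'Weighted_Price', 'Volume_(BTC)']].corrwith(df['Close'])
# Open/High/Low/Weighted_Price >= 0.999 -> redundant, dropped
# Volume_(BTC) ~= 0.15                  -> kept as the one independent signal

df = df.drop(columns=['Open', 'High', 'Low', 'Weighted_Price'])
df['Volume_(BTC)'] = np.log1p(df['Volume_(BTC)'])   # compress the heavy right tail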

04 SPLITTING

Chronological Split

All three splits fall within the 2017-2018 bull-market regime to ensure a consistent price distribution. No shuffling is applied, because shuffling a time series leaks future price information into training.

train 2017-01 → 2018-05 · val May-Jun 2018 · test Jun-Aug 2018
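In pandas this amounts to label-based slicing on the DatetimeIndex; the exact day boundaries below are assumptions consistent with the ranges above:

# Chronological split - no shuffling, no leakage (boundary days assumed)
train_df = df.loc['2017-01-01':'2018-04-30']
val_df   = df.loc['2018-05-01':'2018-06-14']
test_df  = df.loc['2018-06-15':'2018-08-31']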

05 SCALING

Leakage-Free Normalisation

MinMaxScaler is fitted exclusively on the training data. Fitting on the full dataset would leak validation and test price ranges into training, letting the model indirectly know future price levels before seeing them.

MinMaxScaler · fit on train only · no leakage

06 PIPELINE

tf.data Streaming

tf.data.Dataset streams sliding windows with shift=5, i.e. one window every 5 minutes. Pre-computing all ~698,000 windows of shape (1440, 6) would require roughly 27 GB of RAM; streaming keeps memory usage under 2 GB throughout training.

tf.data · shift=5 · memory efficient
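A back-of-envelope check on the memory claim (float32 assumed; the exact total depends on dtype and intermediate copies):

n_windows, steps, feats = 698_000, 1_440, 6
raw_gb = n_windows * steps * feats * 4 / 1e9   # 4 bytes per float32 value
print(f'{raw_gb:.1f} GB')                      # ~24 GB of raw tensors alone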

Model

Architecture

A three-layer stacked LSTM with Dropout regularisation, trained with AdamW and an adaptive learning rate schedule.

Input     (1440, 6)
LSTM      128 units · return_sequences=True
Dropout   0.2
LSTM      64 units · return_sequences=True
Dropout   0.2
LSTM      32 units · return_sequences=False
Dropout   0.1
Dense     1 unit · predicted Close (USD)


Hyperparameter    Value               Rationale
Loss              MSE                 Task requirement
Optimizer         AdamW               Adam + weight decay reduces overfitting
Learning rate     0.0005              Precise convergence without oscillation
Weight decay      1e-4                L2 regularisation via AdamW
LR schedule       ReduceLROnPlateau   Halves LR when val_loss plateaus
Batch size        32                  More gradient updates per epoch
Early stopping    patience=7          Restores best weights automatically
Window size       1,440 steps         24 hours of 1-minute data
Horizon           1 step              Predict 1 minute ahead
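The LR schedule and early stopping rows map onto two standard Keras callbacks. A sketch of the wiring; only the halving factor and the early-stopping patience come from the table, while the plateau patience, min_lr and epoch budget are assumptions:

import tensorflow as tf

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5,   # halve the LR on a val_loss plateau
        patience=3, min_lr=1e-6           # patience / floor assumed
    ),
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=7,
        restore_best_weights=True         # restores the best weights automatically
    ),
]
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=50, callbacks=callbacks)    # epoch budget assumed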

Implementation

Key Code

Selected excerpts from the three notebooks. Full source available on GitHub.

tf.data Pipeline

# Streaming sliding windows - no RAM crash
# (WINDOW_SIZE, HORIZON, CLOSE_IDX and BATCH_SIZE are defined earlier in the notebook)
import tensorflow as tf
def make_dataset(data, shuffle=False):
    total_length = WINDOW_SIZE + HORIZON
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.window(total_length, shift=5, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(total_length, drop_remainder=True))
    ds = ds.map(
        lambda w: (w[:WINDOW_SIZE], w[WINDOW_SIZE + HORIZON - 1, CLOSE_IDX]),
        num_parallel_calls=tf.data.AUTOTUNE
    )
    if shuffle:
        ds = ds.shuffle(buffer_size=2000, seed=42)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
Model

# Three-layer LSTM with Dropout regularisation
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
model = Sequential([
    Input(shape=(WINDOW_SIZE, train_scaled.shape[1])),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64, return_sequences=True),
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.1),
    Dense(1)
])
model.compile(
    loss='mse',
    optimizer=tf.keras.optimizers.AdamW(
        learning_rate=0.0005, weight_decay=1e-4
    ),
    metrics=['mae']
)
Preprocessing

# Reindex → forward-fill → trim → engineer features
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
full_index = pd.date_range(
    start=df.index.min(), end=df.index.max(), freq='60s'
)
df = df.reindex(full_index).ffill().dropna()
df = df[df.index >= '2017-01-01']

# Cyclic time encoding
hour = df.index.hour + df.index.minute / 60.0
df['hour_sin'] = np.sin(2 * np.pi * hour / 24)
df['hour_cos'] = np.cos(2 * np.pi * hour / 24)

# Fit scaler on train only - no leakage
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_df.values)
val_scaled   = scaler.transform(val_df.values)
test_scaled  = scaler.transform(test_df.values)