Help needed: Transformer for candlestick prediction not learning

Hello everyone,

I’m working on a project where I try to predict candlestick features (open, close, high, low) using a Transformer architecture in Keras.
I am not an expert in Transformers, so I’m looking for advice and best practices from the community to improve the performance of my model.

Problem

The model trains without runtime errors and the MSE looks reasonable, but the directional accuracy (whether the candle closes higher or lower than it opened) stays at ~50%, i.e. random. Despite trying different hyperparameters and loss weightings, the model fails to learn stable directional predictions.

Data

The data is quite large, and I train on an A100 80 GB GPU (cluster). Shapes:

inputs_train.shape  = (2603666, 40, 11)
inputs_val.shape    = (139233,   40, 11)
inputs_test.shape   = (41786,    40, 11)

outputs_train.shape = (2603666, 1, 11)
outputs_val.shape   = (139233,   1, 11)
outputs_test.shape  = (41786,    1, 11)

outputs_mask_train.shape = (2603666, 1)
outputs_mask_val.shape   = (139233,   1)
outputs_mask_test.shape  = (41786,    1)

So: sequences of varying length (max 40 in this test) with 11 features (open, high, low, close, RSI, MACD, Bollinger Bands) → predicting the next candle (11 features), with masking for padding.
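With targets of shape (batch, 1, 11) and a mask of shape (batch, 1), the loss should average only over real (unmasked) candles. A minimal NumPy sketch of that masking logic (the actual Keras code may differ):

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    """MSE over the feature axis, averaged only over unmasked targets."""
    se = ((y_true - y_pred) ** 2).mean(axis=-1)   # (batch, 1)
    return float((se * mask).sum() / max(mask.sum(), 1.0))

# Toy tensors matching the shapes above: (batch, 1, 11) targets, (batch, 1) mask
y_true = np.zeros((4, 1, 11))
y_pred = np.ones((4, 1, 11))
mask = np.array([[1.0], [1.0], [0.0], [0.0]])
print(masked_mse(y_true, y_pred, mask))  # 1.0 — padded rows are ignored
```

If the mask were ignored, padded rows would pull the average toward whatever the model emits for padding, which can hide real errors.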

Model (default hyperparameters)

num_layers = 8
d_model = 512
num_heads = 8
dff = 1024
dropout_rate = 0.1

batch_size = 32
epochs = 50
es_patience = 10

warmup_rate = 0.01
num_steps = 81364
total_steps = 4068200
warmup_steps = 40682
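For reference, `warmup_steps = 40682` is ~1% of `total_steps`. If the schedule follows the original Transformer paper (an assumption — the post does not show the schedule code), it would look like:

```python
def transformer_lr(step, d_model=512, warmup_steps=40682):
    # "Attention Is All You Need" schedule: linear warmup to a peak at
    # warmup_steps, then inverse-square-root decay. Assumed, not confirmed
    # to match the repo's implementation.
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak learning rate here is `d_model**-0.5 * warmup_steps**-0.5` (~2e-4 with these values); if the actual schedule differs, plotting it over `total_steps` is a quick sanity check.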

penalty_direction_weight = 1.0
penalty_open_weight      = 0.5
penalty_close_weight     = 0.5
penalty_size_weight      = 0.5
penalty_body_weight      = 0.5

I use a custom loss combining MSE with penalties for directional mistakes and for open/close/size/body differences.
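Without the actual loss code I can only sketch what such a combination might look like; the feature indices `OPEN`/`CLOSE` and the penalty form below are assumptions, not the author's implementation:

```python
import numpy as np

OPEN, CLOSE = 0, 3  # hypothetical column indices for open and close

def direction_penalty(y_true, y_pred, weight=1.0):
    # Fraction of candles whose predicted direction (sign of close - open)
    # disagrees with the true direction, scaled by the penalty weight.
    true_dir = np.sign(y_true[..., CLOSE] - y_true[..., OPEN])
    pred_dir = np.sign(y_pred[..., CLOSE] - y_pred[..., OPEN])
    return weight * float((true_dir != pred_dir).mean())

def combined_loss(y_true, y_pred, direction_weight=1.0):
    mse = float(((y_true - y_pred) ** 2).mean())
    return mse + direction_penalty(y_true, y_pred, direction_weight)
```

One caveat worth checking in the real code: a hard sign-based penalty has zero gradient almost everywhere, so a gradient-trained Keras model cannot improve direction through it; a smooth surrogate (e.g. a tanh of the predicted body multiplied by the true direction) is the usual workaround.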

I have tried different hyperparameters and code fixes, but nothing has worked so far.

Goal

  1. Get the Transformer architecture to learn correctly.
  2. Find the best features and hyperparameters.
  3. Compare a global model vs. market/time-frame-specialised models.
  4. etc.

Request

I’d like to find helpers/contributors who want to:

  • Dive into the model code with me
  • Experiment with different architectures/loss functions
  • Share ideas from research or similar projects
  • Open-source the work

The repo: GitHub - Venon282/candlesticks_predictions (goal: predict the next n candlesticks from the previous m, using Transformers).

I can share more code details, and even provide the dataset, to anyone who wants to participate.

If you’re curious about applying Transformers to trading/time-series data and want to collaborate, please reply here or contact me directly.

Thanks a lot, and looking forward to working with some of you!

Hi @Venon, to improve directional accuracy, replace your absolute price targets with log returns to ensure stationarity, and reduce the model complexity to avoid overfitting. Thanks!
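To illustrate the log-returns suggestion, a minimal sketch of the target transform (assuming a 1-D series of close prices):

```python
import numpy as np

def to_log_returns(prices):
    # r_t = log(p_t / p_{t-1}); roughly stationary, unlike raw price levels
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

returns = to_log_returns([100.0, 101.0, 100.5])
# returns[0] > 0 (price rose), returns[1] < 0 (price fell)
```

Predicted returns can be mapped back to prices with `p_t = p_{t-1} * exp(r_t)`, so no information is lost by training on returns.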