Help needed: Transformer for candlestick prediction not learning

Hello everyone,

I’m working on a project where I try to predict candlestick features (open, close, high, low) using a Transformer architecture in Keras.
I am not an expert in Transformers, so I’m looking for advice and best practices from the community to improve the performance of my model.

Problem

The model trains without runtime errors and the MSE looks reasonable, but the directional accuracy (whether the candle closes higher or lower than it opened) stays at ~50%, i.e. random. Despite trying different hyperparameters and loss weightings, the model fails to learn stable directional predictions.

Data

The data is quite large, and I train on an A100 80 GB GPU (cluster). Shapes:

inputs_train.shape  = (2603666, 40, 11)
inputs_val.shape    = (139233,   40, 11)
inputs_test.shape   = (41786,    40, 11)

outputs_train.shape = (2603666, 1, 11)
outputs_val.shape   = (139233,   1, 11)
outputs_test.shape  = (41786,    1, 11)

outputs_mask_train.shape = (2603666, 1)
outputs_mask_val.shape   = (139233,   1)
outputs_mask_test.shape  = (41786,    1)

So: sequences of varying length (max 40 in this test) with 11 features (open, high, low, close, RSI, MACD, Bollinger Bands) → predicting the next candle (11 features), with masking for padding.
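With targets of shape (batch, 1, 11) and a mask of shape (batch, 1), the loss should average only over real (unmasked) candles. A minimal NumPy sketch of that masking logic (the actual Keras code may differ):

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    """MSE over the feature axis, averaged only over unmasked targets."""
    se = ((y_true - y_pred) ** 2).mean(axis=-1)   # (batch, 1)
    return float((se * mask).sum() / max(mask.sum(), 1.0))

# Toy tensors matching the shapes above: (batch, 1, 11) targets, (batch, 1) mask
y_true = np.zeros((4, 1, 11))
y_pred = np.ones((4, 1, 11))
mask = np.array([[1.0], [1.0], [0.0], [0.0]])
print(masked_mse(y_true, y_pred, mask))  # 1.0 — padded rows are ignored
```

If the mask were ignored, padded rows would pull the average toward whatever the model emits for padding, which can hide real errors.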

Model (default hyperparameters)

num_layers = 8
d_model = 512
num_heads = 8
dff = 1024
dropout_rate = 0.1

batch_size = 32
epochs = 50
es_patience = 10

warmup_rate = 0.01
num_steps = 81364
total_steps = 4068200
warmup_steps = 40682
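For reference, `warmup_steps = 40682` is ~1% of `total_steps`. If the schedule follows the original Transformer paper (an assumption — the post does not show the schedule code), it would look like:

```python
def transformer_lr(step, d_model=512, warmup_steps=40682):
    # "Attention Is All You Need" schedule: linear warmup to a peak at
    # warmup_steps, then inverse-square-root decay. Assumed, not confirmed
    # to match the repo's implementation.
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak learning rate here is `d_model**-0.5 * warmup_steps**-0.5` (~2e-4 with these values); if the actual schedule differs, plotting it over `total_steps` is a quick sanity check.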

penalty_direction_weight = 1.0
penalty_open_weight      = 0.5
penalty_close_weight     = 0.5
penalty_size_weight      = 0.5
penalty_body_weight      = 0.5

I use a custom loss combining MSE with penalties for directional mistakes and for open/close/size/body differences.
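Without the actual loss code I can only sketch what such a combination might look like; the feature indices `OPEN`/`CLOSE` and the penalty form below are assumptions, not the author's implementation:

```python
import numpy as np

OPEN, CLOSE = 0, 3  # hypothetical column indices for open and close

def direction_penalty(y_true, y_pred, weight=1.0):
    # Fraction of candles whose predicted direction (sign of close - open)
    # disagrees with the true direction, scaled by the penalty weight.
    true_dir = np.sign(y_true[..., CLOSE] - y_true[..., OPEN])
    pred_dir = np.sign(y_pred[..., CLOSE] - y_pred[..., OPEN])
    return weight * float((true_dir != pred_dir).mean())

def combined_loss(y_true, y_pred, direction_weight=1.0):
    mse = float(((y_true - y_pred) ** 2).mean())
    return mse + direction_penalty(y_true, y_pred, direction_weight)
```

One caveat worth checking in the real code: a hard sign-based penalty has zero gradient almost everywhere, so a gradient-trained Keras model cannot improve direction through it; a smooth surrogate (e.g. a tanh of the predicted body multiplied by the true direction) is the usual workaround.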

I have tried different hyperparameters and code fixes, but nothing has worked so far.

Goal

  1. Get the Transformer architecture to learn correctly.
  2. Find the best features and hyperparameters.
  3. Compare a global model vs. market/time-frame-specialised models.
  4. etc.

Request

I’d like to find helpers/contributors who want to:

  • Dive into the model code with me
  • Experiment with different architectures/loss functions
  • Share ideas from research or similar projects
  • Open-source the work

The repo: GitHub - Venon282/candlesticks_predictions (goal: predict the next n candlesticks from the previous m, using Transformers).

I can share more code details, and even provide the dataset, to anyone who wants to participate.

If you’re curious about applying Transformers to trading/time-series data and want to collaborate, please reply here or contact me directly.

Thanks a lot, and looking forward to working with some of you!

Hi @Venon, to improve directional accuracy, replace your absolute price targets with log returns to ensure stationarity, and reduce the model complexity to avoid overfitting. Thanks!
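To illustrate the log-returns suggestion, a minimal sketch of the target transform (assuming a 1-D series of close prices):

```python
import numpy as np

def to_log_returns(prices):
    # r_t = log(p_t / p_{t-1}); roughly stationary, unlike raw price levels
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

returns = to_log_returns([100.0, 101.0, 100.5])
# returns[0] > 0 (price rose), returns[1] < 0 (price fell)
```

Predicted returns can be mapped back to prices with `p_t = p_{t-1} * exp(r_t)`, so no information is lost by training on returns.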