Hello everyone,
I’m working on a project where I try to predict candlestick features (open, close, high, low) using a Transformer architecture in Keras.
I am not an expert in Transformers, so I’m looking for advice and best practices from the community to improve the performance of my model.
Problem
The model trains without runtime errors and the MSE looks reasonable, but directional accuracy (whether the candle closes higher or lower than it opened) stays at ~50%, i.e. random. Despite trying different hyperparameters and loss weightings, the model fails to learn stable directional predictions.
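For anyone digging in, here is a minimal NumPy sketch of what I mean by directional accuracy. The open/close column indices are illustrative assumptions; adjust them to your actual feature order:

```python
import numpy as np

def directional_accuracy(y_true, y_pred, open_idx=0, close_idx=3):
    """Fraction of candles where the predicted direction (close vs. open)
    matches the true direction. Column indices are assumptions."""
    true_dir = np.sign(y_true[:, close_idx] - y_true[:, open_idx])
    pred_dir = np.sign(y_pred[:, close_idx] - y_pred[:, open_idx])
    return float(np.mean(true_dir == pred_dir))

# two candles: first goes up, second goes down
y_true = np.array([[1.0, 0.0, 0.0, 1.2],
                   [1.0, 0.0, 0.0, 0.8]])
# predictions: up, up -> one of two directions correct
y_pred = np.array([[1.0, 0.0, 0.0, 1.1],
                   [1.0, 0.0, 0.0, 1.3]])
print(directional_accuracy(y_true, y_pred))  # 0.5
```

A coin flip lands at exactly this value in expectation, which is why 50% here means the model has learned no directional signal at all.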
Data
The data is quite large, and I train on an A100 80GB GPU (on a cluster). Shapes:
inputs_train.shape = (2603666, 40, 11)
inputs_val.shape = (139233, 40, 11)
inputs_test.shape = (41786, 40, 11)
outputs_train.shape = (2603666, 1, 11)
outputs_val.shape = (139233, 1, 11)
outputs_test.shape = (41786, 1, 11)
outputs_mask_train.shape = (2603666, 1)
outputs_mask_val.shape = (139233, 1)
outputs_mask_test.shape = (41786, 1)
So: sequences of varying length (max 40 in this test) with 11 features (open, high, low, close, RSI, MACD, Bollinger Bands) → predicting the next candle (11 features), with masking for padding.
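Since the `outputs_mask_*` arrays are `(N, 1)`, I assume the loss is averaged only over valid (non-padded) targets. A NumPy sketch of a masked MSE along those lines, so the intent is unambiguous:

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    """MSE over valid targets only.
    y_true, y_pred: (batch, 1, n_features); mask: (batch, 1),
    where 1 = real target and 0 = padding (an assumption)."""
    sq_err = (y_true - y_pred) ** 2        # (batch, 1, n_features)
    per_sample = sq_err.mean(axis=-1)      # (batch, 1)
    denom = np.maximum(mask.sum(), 1.0)    # guard against an all-padding batch
    return float((per_sample * mask).sum() / denom)

y_true = np.array([[[1.0, 2.0]], [[3.0, 4.0]]])
y_pred = np.array([[[1.0, 2.0]], [[1.0, 2.0]]])
mask = np.array([[1.0], [0.0]])  # second sample is padding
print(masked_mse(y_true, y_pred, mask))  # 0.0 -- the bad prediction is masked out
```

One subtle point worth checking in any Keras implementation: dividing by the batch size instead of `mask.sum()` silently shrinks the loss on heavily padded batches.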
Model (default hyperparameters)
num_layers = 8
d_model = 512
num_heads = 8
dff = 1024
dropout_rate = 0.1
batch_size = 32
epochs = 50
es_patience = 10
warmup_rate = 0.01
num_steps = 81364
total_steps = 4068200
warmup_steps = 40682
penalty_direction_weight = 1.0
penalty_open_weight = 0.5
penalty_close_weight = 0.5
penalty_size_weight = 0.5
penalty_body_weight = 0.5
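For context on the step counts above: `num_steps` ≈ 2603666 / 32 ≈ 81364 steps per epoch, `total_steps` = 50 × 81364 = 4068200, and `warmup_steps` = 1% of that (the `warmup_rate`). I use a warmup schedule; the exact shape is up for discussion, but one standard choice for Transformers is the Noam schedule from "Attention Is All You Need", sketched here with my `d_model` and `warmup_steps`:

```python
def transformer_lr(step, d_model=512, warmup_steps=40682):
    """Noam schedule: lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
    Linear warmup, then inverse-sqrt decay; the peak is at step == warmup_steps."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

In Keras this would typically be wrapped in a `tf.keras.optimizers.schedules.LearningRateSchedule` subclass and passed directly to the optimizer.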
I use a custom loss combining MSE with penalties for directional mistakes and for open/close/size/body differences.
I have tried different hyperparameters and code corrections, but nothing has worked so far.
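To make the loss design concrete, here is a NumPy sketch of an MSE-plus-direction-penalty combination. The column indices and the exact penalty form are illustrative assumptions, not my actual code:

```python
import numpy as np

def combined_loss(y_true, y_pred, open_idx=0, close_idx=3,
                  direction_weight=1.0):
    """MSE plus a directional penalty that is positive only when the
    predicted candle direction (close - open) disagrees with the true one.
    Column indices and the penalty form are assumptions."""
    mse = np.mean((y_true - y_pred) ** 2)
    true_dir = y_true[..., close_idx] - y_true[..., open_idx]
    pred_dir = y_pred[..., close_idx] - y_pred[..., open_idx]
    # hinge-style surrogate: nonzero only when the signs disagree,
    # so gradients still flow (unlike a hard 0/1 mismatch count)
    penalty = np.mean(np.maximum(0.0, -true_dir * pred_dir))
    return float(mse + direction_weight * penalty)
```

One thing worth double-checking in any such loss: a hard sign-mismatch count has zero gradient almost everywhere, so the optimizer effectively only sees the MSE term; a smooth surrogate like the hinge above (or a softplus) keeps the direction term trainable.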
Goal
- Get the Transformer to learn directional structure, not just minimise MSE.
- Find the best features and hyperparameters.
- Decide between one global model and market/time-frame-specialised models.
- etc.
Request
I’d like to find helpers/contributors who want to:
- Dive into the model code with me
- Experiment with different architectures/loss functions
- Share ideas from research or similar projects
- Open-source the work
I can share more code details, and even provide the dataset, to anyone who wants to participate.
If you’re curious about applying Transformers to trading/time-series data and want to collaborate, please reply here or contact me directly.
Thanks a lot, and looking forward to working with some of you!