Transformer for asynchronous multi-stream image time-series with online prediction?

I have two streams of images, each stream corresponding to a different “channel” (e.g. different sensor modality). The streams are not synchronized — at any given moment, a new image arrives from one stream or the other, each with a real-valued timestamp. I want to classify the sequence online, i.e. produce an updated prediction after every new incoming image.

Key constraints:

  • Spatial features within each image matter (not just a scalar summary)
  • Timestamps are irregular and not aligned across streams
  • Prediction must improve causally as more observations arrive

The natural design seems to be: a per-image ViT encoder → a causal transformer over the merged token stream, with real-valued timestamp embeddings (e.g. Time2Vec) in place of integer positional indices, plus a stream/channel-ID embedding so the model knows which sensor each token came from.
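To make the design concrete, here is a minimal PyTorch sketch of what I have in mind. It is illustrative only: the per-image encoder is a tiny patch-embedding + mean-pool stand-in for a real ViT, `Time2Vec` follows the usual one-linear-term-plus-sinusoids formulation, and all names (`AsyncStreamClassifier`, the hyperparameters) are made up for this example.

```python
import torch
import torch.nn as nn


class Time2Vec(nn.Module):
    """Time2Vec: one linear term plus (d_model - 1) sinusoidal terms
    over a real-valued timestamp, replacing integer positional indices."""

    def __init__(self, d_model):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_model))
        self.b = nn.Parameter(torch.zeros(d_model))

    def forward(self, t):                     # t: (B, T) real-valued timestamps
        x = t.unsqueeze(-1) * self.w + self.b  # (B, T, d_model)
        return torch.cat([x[..., :1], torch.sin(x[..., 1:])], dim=-1)


class AsyncStreamClassifier(nn.Module):
    """Hypothetical sketch: per-image encoder -> causal transformer over the
    merged (asynchronous) token stream, one prediction per arrival."""

    def __init__(self, patch=8, d_model=64, n_streams=2, n_classes=3):
        super().__init__()
        # Stand-in per-image encoder: patch embedding + mean pool.
        # A full ViT (returning its CLS token) would slot in here.
        self.patch_embed = nn.Conv2d(1, d_model, kernel_size=patch, stride=patch)
        self.time2vec = Time2Vec(d_model)
        self.stream_embed = nn.Embedding(n_streams, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=128, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, images, timestamps, stream_ids):
        # images: (B, T, 1, H, W); timestamps: (B, T); stream_ids: (B, T) int
        B, T = images.shape[:2]
        tok = self.patch_embed(images.flatten(0, 1)).flatten(2).mean(-1)
        tok = tok.view(B, T, -1)
        # Timestamp and stream-ID embeddings are added to the image tokens.
        tok = tok + self.time2vec(timestamps) + self.stream_embed(stream_ids)
        # Causal mask: each token attends only to itself and earlier arrivals.
        causal = torch.triu(
            torch.full((T, T), float('-inf'), device=tok.device), diagonal=1)
        h = self.backbone(tok, mask=causal)
        # (B, T, n_classes): an updated class prediction after every image.
        return self.head(h)
```

Because the attention mask is causal, the prediction at step *t* depends only on images 0..*t*, so at inference time you can emit `head(h)[:, -1]` after each new arrival (or cache keys/values for efficiency).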

Is there an existing architecture or paper that handles this exact setup? Or is this a known gap?