Certain loss/metric functions like UMBRAE and MASE make use of a benchmark, typically the “naïve forecast”, which is a 1-period lag of the target.
However, in my dataset I’m using hourly data to train/predict monthly returns. So my naïve forecast isn’t 1 row behind; it’s N rows behind, where N can change over time, especially with monthly timeframes (some months are shorter or longer than others).
I already have a feature called bars_in_X, where X is one of D, W, M, Y, one for each timeframe (though for the sake of argument, I’m only using M). It’s an integer giving the number of rows back to the 1-period-ago row with respect to that timeframe. So for bars_in_D, that would typically be 24 (as there are 24 hours in 1 day), i.e., the naïve forecast for the hourly value NOW happened 24 bars ago.
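To make the per-row lookup concrete, here is a minimal sketch of what I mean by a dynamic naïve forecast. It assumes the target and the bars_in_M feature are available as equal-length NumPy arrays; the function name dynamic_naive_forecast and the toy values are hypothetical, purely for illustration.

```python
import numpy as np

def dynamic_naive_forecast(y, bars_back):
    """Return y[i - bars_back[i]] for each row i, NaN where no such row exists."""
    y = np.asarray(y, dtype=float)
    bars_back = np.asarray(bars_back, dtype=int)
    src = np.arange(len(y)) - bars_back      # row index of the 1-period-ago value
    valid = (src >= 0) & (bars_back > 0)     # skip rows that would look before the start
    naive = np.full(len(y), np.nan)
    naive[valid] = y[src[valid]]
    return naive

# Toy data: the lag changes from 2 to 3 partway through, as month lengths do.
y = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])
bars_in_M = np.array([2, 2, 2, 2, 3, 3, 3])
naive = dynamic_naive_forecast(y, bars_in_M)
```

Rows near the start of the series have no 1-period-ago value, so they are left as NaN and would need to be masked out of any metric.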
As a halfway measure, I find the mean of each of those features in the dataset and, before creating the model, I make custom loss functions that are supplied this value (see how here). This produces a usable but technically incorrect result, because it’s a static backreference as opposed to the dynamic bars_in_X value. It would also be insufficient when I eventually want to find the naïve forecast for ALL timeframes (not just one).
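The gap between the static and dynamic backreference can be shown with a small NumPy sketch. This is not my actual loss function: the helper names lagged and scaled_mae, the toy data, and the simplified MASE-style ratio (model MAE divided by benchmark MAE over the same rows) are all assumptions for illustration.

```python
import numpy as np

def lagged(y, bars_back):
    """Per-row lookup y[i - bars_back[i]]; NaN where the source row does not exist."""
    src = np.arange(len(y)) - bars_back
    out = np.full(len(y), np.nan)
    valid = src >= 0
    out[valid] = y[src[valid]]
    return out

def scaled_mae(y_true, y_pred, naive):
    """MASE-style ratio: model MAE over benchmark MAE, ignoring rows with no benchmark."""
    mask = ~np.isnan(naive)
    model_mae = np.mean(np.abs(y_true[mask] - y_pred[mask]))
    naive_mae = np.mean(np.abs(y_true[mask] - naive[mask]))
    return model_mae / naive_mae

y_true = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])
y_pred = y_true + 1.0                        # toy predictions, off by 1 everywhere
bars_in_M = np.array([2, 2, 2, 2, 3, 3, 3])  # lag that changes over time

# Static approach: one lag for every row, taken from the feature's mean.
static_lag = int(round(bars_in_M.mean()))
static_score = scaled_mae(y_true, y_pred, lagged(y_true, np.full(len(y_true), static_lag)))

# Dynamic approach: each row uses its own bars_in_M value.
dynamic_score = scaled_mae(y_true, y_pred, lagged(y_true, bars_in_M))
```

On this toy series the two scores differ even though the predictions are identical, which is the inaccuracy I’m trying to get rid of.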
Does anyone have a suggested method of handling this kind of situation?