I’ve created several models with different architectures (Dense layers, CNN, combinations) for binary classification, and many of them train and perform well on train, validate, test datasets. When it comes to classifying a window of a signal, they all fail miserably, and classify pretty much every window as positive. I also tried an autoencoder anomaly detector, but false positives and false negatives were very high (50%).
I realize this is a very general description, but can provide more detail, and would greatly appreciate any advice.