I have pre-processed echocardiogram videos into arrays of shape (128,112,112,1) each and am trying to classify left ventricular function using supervised learning with a 3D convolutional network. A simple model with two Conv3D layers, with kernel sizes (1,3,3) and (3,1,1) respectively to separate spatial and temporal features, plus MaxPooling3D, overfits. The overfitting persists even after adding L2 regularization and dropout layers.
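To make the setup concrete, here is a minimal sketch in plain Python that traces the tensor shapes and convolution parameter counts for the two factorized layers described above. Several things are assumptions for illustration only and not necessarily the values in the actual model: the axis order (frames, height, width, channels), the filter count of 16, stride 1 with "valid" padding, a single (2,2,2) max-pool placed after both convolutions, and the helper names themselves.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # assumed order: (frames, height, width, channels)
Kernel = Tuple[int, int, int]      # (time, height, width)


def conv3d_shape(shape: Shape, kernel: Kernel, filters: int) -> Shape:
    """Output shape of a stride-1, 'valid'-padded 3D convolution."""
    t, h, w, _ = shape
    kt, kh, kw = kernel
    return (t - kt + 1, h - kh + 1, w - kw + 1, filters)


def conv3d_params(in_channels: int, kernel: Kernel, filters: int) -> int:
    """Trainable parameters of a 3D convolution: weights plus one bias per filter."""
    kt, kh, kw = kernel
    return kt * kh * kw * in_channels * filters + filters


def maxpool3d_shape(shape: Shape, pool: Kernel) -> Shape:
    """Output shape of non-overlapping max pooling (stride equal to pool size)."""
    t, h, w, c = shape
    pt, ph, pw = pool
    return (t // pt, h // ph, w // pw, c)


FILTERS = 16      # assumption: not stated in the post
POOL = (2, 2, 2)  # assumption: not stated in the post

clip: Shape = (128, 112, 112, 1)  # one pre-processed echocardiogram clip

# Spatial convolution: the (1,3,3) kernel looks at each frame independently.
spatial = conv3d_shape(clip, (1, 3, 3), FILTERS)
spatial_params = conv3d_params(clip[3], (1, 3, 3), FILTERS)

# Temporal convolution: the (3,1,1) kernel mixes three neighbouring frames per pixel.
temporal = conv3d_shape(spatial, (3, 1, 1), FILTERS)
temporal_params = conv3d_params(spatial[3], (3, 1, 1), FILTERS)

pooled = maxpool3d_shape(temporal, POOL)

# Number of features a Flatten layer would hand to a dense classifier head.
flattened = pooled[0] * pooled[1] * pooled[2] * pooled[3]

print("after spatial conv :", spatial, "params:", spatial_params)
print("after temporal conv:", temporal, "params:", temporal_params)
print("after max pooling  :", pooled)
print("flattened features :", flattened)
```

Under these assumptions the two convolutions hold very few parameters, while the flattened feature vector after a single pooling step is very large, so any dense layer attached directly to it would dominate the parameter count. Whether that applies here depends on the real filter counts, pooling sizes, and classifier head.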
Is anyone willing to share and collaborate?