Hi there!
I need to develop a NN for the classification of signal points, and my question is how to prepare the training data for it, given the following data and structures.
I have thousands of signals stored on a hard drive in CSV format that I can use for training.
Each CSV is read into a separate dataframe (one input instance per signal). Points can be classified only within a dataframe: a single point cannot be classified in isolation, because only the full set of data carries the information about possible point classes (e.g. the points of class 1 follow certain probability distributions along the x-axis).
I have thousands of such separate dataframes, and all of their points are labeled.
Each file has about 10 000 rows, and one row holds the data for one point of a signal y(x):
i - index of the current point,
x - coordinate value for the current point i,
y - value of the signal at x, i.e. y(x),
delta_y - error value for the current point,
point_class - for simplicity a binary variable, 0/1 (the label to train on and classify): 0 for all points from class 0, 1 for the other class.
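To make the structure concrete, here is a sketch of how one such file could be loaded and split into features and labels. The synthetic dataframe below (with a hypothetical file name in the comment) only stands in for a real signal file; column names follow the description above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one signal file (real files have ~10 000 rows).
n = 100
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "i": np.arange(n),
    "x": np.linspace(0.0, 1.0, n),
    "y": rng.standard_normal(n),
    "delta_y": np.abs(rng.standard_normal(n)) * 0.1,
    "point_class": rng.integers(0, 2, n),
})
# In practice: df = pd.read_csv("signal_0001.csv")  # hypothetical file name

features = df[["x", "y", "delta_y"]].to_numpy()  # shape (n, 3)
labels = df["point_class"].to_numpy()            # shape (n,)
```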
So the input is 30 000 values (three arrays of 10 000 elements), and the network must classify each of the 10 000 points of the input signal, assigning a class to every point - which gives 10 000 outputs.
The task for the neural network is to take the arrays x, y, delta_y as input and classify each point of the input signal (the point_class column).
In this case the output of the NN must have the same size as one input signal: 10 000 outputs, one per input point. Each output can be interpreted as the probability (0…1) that the point belongs to class 1.
So for training I have the inputs x[], y[], delta_y[] (10 000 elements each) - we feed y(x), delta_y(x) and x to the input of the neural network - and it must calculate the output p[]: [0, 0, 0, 1, ..., 1, 0, 0] (10 000 elements), so it can be plotted like p(x).
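One possible framing (an assumption on my part, not the only option) is to apply the same small per-point network to every (x, y, delta_y) triple and emit one sigmoid probability per point, so the parameter count does not grow with the 10 000 points; capturing context across the whole signal would then need something like a Conv1D or recurrent layer on top. A minimal NumPy sketch of just the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_feats, hidden = 100, 3, 16  # real signals: n_points = 10 000

signal = rng.standard_normal((n_points, n_feats))  # stacked x, y, delta_y

# Toy shared-weight MLP: the same weights are applied to every point.
W1 = rng.standard_normal((n_feats, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, 1)) * 0.1
b2 = np.zeros(1)

h = np.tanh(signal @ W1 + b1)             # (n_points, hidden)
p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # (n_points, 1), sigmoid

probs = p.ravel()  # one probability per point, plottable as p(x)
```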
- How do I combine all the files into one dataframe, or is it possible to build the training dataset directly from the separate CSVs, using each file's features as inputs and its labels as outputs, one pair per training case (input1.csv → output1.csv)?
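My current idea for assembling the training tensors from the per-signal frames looks roughly like this (a sketch with two synthetic frames standing in for the thousands of CSV files; the glob pattern in the comment is hypothetical):

```python
import numpy as np
import pandas as pd

def make_frame(n=100, seed=0):
    # Synthetic stand-in for one labeled signal file.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "x": np.linspace(0.0, 1.0, n),
        "y": rng.standard_normal(n),
        "delta_y": rng.random(n) * 0.1,
        "point_class": rng.integers(0, 2, n),
    })

frames = [make_frame(seed=s) for s in (0, 1)]
# In practice something like:
# frames = [pd.read_csv(p) for p in sorted(glob.glob("signals/*.csv"))]

# One training case per signal: features (n_points, 3), labels (n_points,)
X = np.stack([f[["x", "y", "delta_y"]].to_numpy() for f in frames])
Y = np.stack([f["point_class"].to_numpy() for f in frames])
# X.shape == (n_signals, n_points, 3), Y.shape == (n_signals, n_points)
```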
Maybe someone can provide examples similar to this task?
- Which layer types should I use for the outputs in this case?
- Is it a good idea to have such a large network (30 000 inputs, 10 000 outputs)?
If any additional information is needed, I can provide anything that helps with understanding and processing.