Hi.
I have a proof-of-concept (POC) project with TensorFlow and a neural network.
Before describing the technical parameters, a few words about the data.
It is a table of items.
Each item has two properties.
A single array can contain from 5 up to 40 items.
The goal of the neural network is to choose one item from the others.
The table in CSV looks like this:
ID, Item_1,Item_1_Property_1,Item_1_Property_2, … Item_40,Item_40_Property_1,Item_40_Property_2, Number_Of_Item_Which_Need_To_Choose
The items themselves are stored in a database, and the CSV contains only each item's ID (its index in the DB table).
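For illustration, here is a minimal sketch of how one CSV row with the layout above could be split into its parts. It assumes each row has already been parsed into numbers, that unused slots (rows with fewer than 40 items) are padded with 0, and that the last column holds the target. `split_row`, `MAX_ITEMS` and `FIELDS_PER_ITEM` are hypothetical names. Note that 1 + 40 × 3 = 121, which matches `input = 121` in the model below if the ID column is also fed to the network.

```python
import numpy as np

MAX_ITEMS = 40       # up to 40 item slots per row, as in the CSV header
FIELDS_PER_ITEM = 3  # Item_k, Item_k_Property_1, Item_k_Property_2

def split_row(row):
    """Split one parsed CSV row into (row_id, items, target).

    Assumption: missing items are padded with 0, and the last column
    is the number of the item to choose.
    """
    row = np.asarray(row, dtype=np.float64)
    expected = 1 + MAX_ITEMS * FIELDS_PER_ITEM + 1  # ID + item fields + target
    if row.size != expected:
        raise ValueError(f"expected {expected} values, got {row.size}")
    row_id = row[0]
    # One line per item slot: [item_id, property_1, property_2]
    items = row[1:-1].reshape(MAX_ITEMS, FIELDS_PER_ITEM)
    target = int(row[-1])
    return row_id, items, target
```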
I read that feeding large integer IDs directly into a network is bad practice; instead, the table should have binary columns (one-hot encoding, e.g. with pandas).
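A small sketch of what one-hot encoding with pandas does to the item-ID columns, using hypothetical IDs: `pd.get_dummies` creates one binary column per distinct ID in each encoded column. The width of the table therefore grows with the number of distinct IDs times the number of item slots.

```python
import pandas as pd

# Hypothetical data: two rows, each with two item-ID columns.
df = pd.DataFrame({"Item_1": [101, 205], "Item_2": [205, 307]})

# One 0/1 column per (column, distinct ID) pair.
one_hot = pd.get_dummies(df, columns=["Item_1", "Item_2"], dtype=int)
print(one_hot.columns.tolist())
# ['Item_1_101', 'Item_1_205', 'Item_2_205', 'Item_2_307']
```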
However, during the POC I found that exporting from the DB to a CSV with binary columns took too much time.
Even if I created such a CSV, it would be impossible to load that amount of data into RAM when pandas reads it.
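Here is a rough back-of-the-envelope estimate of why a dense one-hot table can outgrow RAM. The row count, the number of distinct IDs and the bytes per cell are hypothetical placeholders, not figures from this project, and the property columns are ignored.

```python
def one_hot_bytes(rows, slots, distinct_ids, bytes_per_value=1):
    """Rough size of a dense one-hot table: one column per (slot, ID) pair."""
    return rows * slots * distinct_ids * bytes_per_value

# Hypothetical sizes, NOT taken from the original data:
# 1 million rows, 40 item slots, 10,000 distinct item IDs, 1 byte per cell.
size = one_hot_bytes(1_000_000, 40, 10_000)
print(f"{size / 1e9:.0f} GB")  # 400 GB
```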
So I decided to use the DB IDs with the following layer configuration:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

output = 40  # Number of output neurons
input = 121  # Number of input neurons

model = Sequential()
model.add(Dense(input, input_dim=input, activation='relu'))
model.add(Dense(input, activation='relu'))
model.add(Dense(input, activation='sigmoid'))
model.add(Dense(input, activation='sigmoid'))
model.add(Dense(output, activation='sigmoid'))
model.summary()
```
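The 40 output neurons correspond to the 40 item slots, so a prediction has to be reduced to a single choice. Below is a minimal NumPy sketch. It assumes the real items occupy the first `n_items` slots and that the remaining slots are padding that must never be chosen. `choose_item` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def choose_item(scores, n_items):
    """Return the index of the highest-scoring slot among the first n_items.

    scores: the model's 40 output values for one row.
    n_items: how many items the row really contains (5..40); the other
    slots are assumed to be padding and are excluded via -inf.
    """
    scores = np.asarray(scores, dtype=np.float64)
    masked = np.full_like(scores, -np.inf)
    masked[:n_items] = scores[:n_items]
    return int(np.argmax(masked))
```

For example, for a row with 5 items, a high score in slot 30 is ignored and the best of slots 0 to 4 is returned.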
This model trained on a Google Colab GPU for about 7 hours.
After training, it predicted the data with 20-30% accuracy, and we decided to train the network on real data.
When I started training on the large real dataset, the network showed:
- after 1 day: loss 2.6, accuracy 0.15
- after 2 days: loss 2.1, accuracy 0.26
- after 3 days: loss 1.7, accuracy 0.33
- after 4 days: loss 1.6, accuracy 0.41
During the 5th day, accuracy did not exceed 0.4152.
I realized that it would never reach an accuracy of at least 0.9, as it did on the test data.
So I would like to ask for your advice: what should I do?
My thoughts so far:
- I should not use the DB IDs; one-hot encoding would be better.
- This is big data that requires large servers with a lot of RAM.
- Use several servers instead of a single one, e.g. Spark + TensorFlow.
- Anything else?