Neural network with TensorFlow stopped learning after the data grew

maxim · November 24, 2021, 1:50pm

Hi.
I have a POC project with TensorFlow and a neural network.
Before describing the technical parameters, a few words about the data.
It is a table of items.
Each item has two properties.
One array can contain from 5 up to 40 items.
The goal of the neural network is to choose one item out of the others.
The CSV table looks like this:
ID, Item_1,Item_1_Property_1,Item_1_Property_2, … Item_40,Item_40_Property_1,Item_40_Property_2, Number_Of_Item_Which_Need_To_Choose
The items are stored in a database, and the CSV contains only each item's index (ID) from the DB table.
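To make the row layout concrete, here is a minimal sketch (not the author's export code) of how a variable-length array of 5 to 40 items could be flattened into one fixed-width row. It assumes each item arrives as a tuple `(item_id, property_1, property_2)` and that unused slots are padded with 0; `row_to_features` and the padding value are hypothetical. This yields 120 values; the post's input width of 121 presumably includes one more column, such as the leading ID, but the post does not say.

```python
import numpy as np

MAX_ITEMS = 40         # an array holds up to 40 items (from the post)
FEATURES_PER_ITEM = 3  # item ID + Property_1 + Property_2
PAD_VALUE = 0.0        # assumption: 0 is never a real item ID

def row_to_features(items):
    """Flatten (item_id, property_1, property_2) tuples into one fixed-width row.

    Slots after the last real item are filled with PAD_VALUE (hypothetical helper).
    """
    if len(items) > MAX_ITEMS:
        raise ValueError(f"expected at most {MAX_ITEMS} items, got {len(items)}")
    row = np.full(MAX_ITEMS * FEATURES_PER_ITEM, PAD_VALUE, dtype=np.float32)
    flat = np.asarray(items, dtype=np.float32).reshape(-1)
    row[: flat.size] = flat
    return row

# A row with 5 items: the first 15 values are filled, the remaining 105 are padding.
features = row_to_features([(1017, 0.5, 3.0)] * 5)
```

Padding with a fixed value keeps every row the same width, which a `Dense` input layer requires.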
I read that it is bad practice to feed large numbers such as raw IDs into a network; instead, the table should have binary columns (one-hot encoding with pandas).
But during the POC I found that exporting the data from the DB into a CSV with binary columns took far too long.
Even if I created that CSV, it would be impossible to load that amount of data into RAM when pandas reads it.
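A back-of-the-envelope calculation shows why a fully one-hot-encoded table is so large. The post does not give the row count or the number of distinct items, so the sizes below (1,000,000 rows, 10,000 distinct items, a batch of 256) are purely hypothetical, and the estimate assumes at least one byte per cell while ignoring the properties and any pandas overhead.

```python
NUM_SLOTS = 40  # item slots per row (from the post)

def one_hot_bytes(num_rows, vocab_size, bytes_per_value=1):
    """Bytes needed to hold one-hot item columns for num_rows rows in memory."""
    return num_rows * NUM_SLOTS * vocab_size * bytes_per_value

# Hypothetical sizes, not taken from the post:
full = one_hot_bytes(num_rows=1_000_000, vocab_size=10_000)  # whole dataset
batch = one_hot_bytes(num_rows=256, vocab_size=10_000)       # one training batch
print(full / 1e9, "GB vs", batch / 1e6, "MB")  # 400.0 GB vs 102.4 MB
```

The size grows linearly with both the row count and the number of distinct items, so encoding only the current batch keeps memory bounded by the batch size rather than the dataset size.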
So I decided to use the DB IDs directly, with the following layer configuration:

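from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

n_outputs = 40  # Number of output neurons (one per item slot)
n_inputs = 121  # Number of input neurons
model = Sequential()
model.add(Input(shape=(n_inputs,)))
model.add(Dense(n_inputs, activation='relu'))
model.add(Dense(n_inputs, activation='relu'))
model.add(Dense(n_inputs, activation='sigmoid'))
model.add(Dense(n_inputs, activation='sigmoid'))
model.add(Dense(n_outputs, activation='sigmoid'))
model.summary()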
This model trained on a Google Colab GPU for about 7 hours.
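In the model above, the output layer uses `sigmoid`, so each of the 40 outputs is an independent yes/no score. Because the task is to pick exactly one item, a common alternative (not something suggested in this thread) is a `softmax` output with sparse categorical cross-entropy. The post does not show the `compile()` call, so the optimizer and loss here are assumptions, as is the 1-based target column handled by the hypothetical `to_class_index` helper.

```python
import numpy as np
import tensorflow as tf

NUM_INPUTS = 121  # input width from the post
NUM_CLASSES = 40  # one class per item slot

# Sketch: a softmax head makes the 40 outputs a single probability distribution.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_INPUTS,)),
    tf.keras.layers.Dense(NUM_INPUTS, activation="relu"),
    tf.keras.layers.Dense(NUM_INPUTS, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels 0..39, no one-hot targets
    metrics=["accuracy"],
)

def to_class_index(chosen_item_number):
    """Assumption: the CSV target column is 1-based (1..40); Keras expects 0..39."""
    return np.asarray(chosen_item_number, dtype=np.int64) - 1
```

Training would then look like `model.fit(features, to_class_index(targets), ...)`. A further refinement, not shown, is preventing the model from choosing empty slots in rows with fewer than 40 items.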

After this training, the model predicted with 20-30% accuracy, so we decided to feed real data to the network.
When I started training on the big real dataset, the network showed:

after day 1: loss 2.6, accuracy 0.15
after day 2: loss 2.1, accuracy 0.26
after day 3: loss 1.7, accuracy 0.33
after day 4: loss 1.6, accuracy 0.41

During the 5th day, accuracy did not go above 0.4152.
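To put these numbers in context, a few baselines can be computed. The post does not show the loss function, so this assumes the reported loss is categorical cross-entropy (natural log) over the 40 outputs; with sigmoid outputs and binary cross-entropy the loss values would mean something different.

```python
import math

NUM_SLOTS = 40  # maximum number of items per row (from the post)

def chance_accuracy(num_items):
    """Accuracy of guessing uniformly among the items present in a row."""
    return 1.0 / num_items

def uniform_loss(num_classes=NUM_SLOTS):
    """Cross-entropy of a model that assigns 1/num_classes to every class."""
    return math.log(num_classes)

def typical_true_prob(loss):
    """Geometric-mean probability on the correct item for a mean cross-entropy `loss`."""
    return math.exp(-loss)

print(chance_accuracy(40), chance_accuracy(5))  # 0.025 0.2
print(round(uniform_loss(), 3))                 # 3.689
print(round(typical_true_prob(1.6), 3))         # 0.202
```

Under that assumption, an accuracy of 0.41 is well above random guessing (between 0.025 and 0.2, depending on how many items a row holds), and a loss of 1.6 corresponds to a geometric-mean probability of about 0.2 on the correct item.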
I realized that I would never reach even 0.9 accuracy, as I had on the test data.
So I would like to ask your advice: what should I do?

My concerns are:

I should not use IDs from the DB; one-hot encoding would be better.
This is big data, which requires big servers with a lot of RAM.
I should not use a single server; instead, use Spark + TensorFlow.
Anything else?
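On the first two concerns, raw IDs and full one-hot columns are not the only options. A technique not mentioned in this thread is an embedding layer (in Keras, `tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)`), which takes the integer ID directly and looks up a learned vector. The NumPy sketch below, with hypothetical sizes, checks that this lookup gives the same result as one-hot encoding followed by a bias-free `Dense` layer; the one-hot matrix is built here only to make that comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 8                # hypothetical sizes
W = rng.normal(size=(vocab_size, embed_dim))   # would be trainable weights in a real model

ids = np.array([3, 999, 42])                   # a few item IDs as stored in the DB
one_hot = np.eye(vocab_size)[ids]              # shape (3, 1000): what one-hot columns would hold
dense_out = one_hot @ W                        # bias-free Dense layer applied to the one-hot input
lookup_out = W[ids]                            # embedding lookup: select rows of W directly

assert np.allclose(dense_out, lookup_out)      # same result, without storing one-hot columns
```

The lookup needs only the integer IDs in memory, so the per-row cost stays at 40 integers instead of 40 × vocab_size binary values.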

Ekaterina_Dranitsyna · November 25, 2021, 8:26am

To avoid loading all the data into memory at once, you can try to use tf.keras.utils.Sequence and create a data generator. It will load the data into RAM in batches as training progresses.
Here is an example of how it is done with images (Image segmentation with a U-Net-like architecture), but you can rewrite it to query the database and transform the original sparse data into one-hot encoded (OHE) vectors for the 40 items, just for the batch that is currently needed.
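A minimal sketch of that suggestion, assuming SQLite (`sqlite3` stands in for whatever database the project actually uses) and a hypothetical long-format schema. The table and column names, contiguous 0-based `sample_id` values, 0-based slots and item IDs, and 0-based targets are all assumptions. Only the item IDs are one-hot encoded; the two properties per item are left out to keep it short.

```python
import math
import sqlite3

import numpy as np
import tensorflow as tf


class DbItemSequence(tf.keras.utils.Sequence):
    """Sketch of a Keras Sequence that reads one batch from a database per call.

    Hypothetical schema (not from the original post):
      samples(sample_id INTEGER PRIMARY KEY, target INTEGER)  -- target is 0-based
      sample_items(sample_id INTEGER, slot INTEGER, item_id INTEGER)
    sample_id values are assumed to be contiguous, starting at 0.
    """

    def __init__(self, db_path, num_samples, batch_size, vocab_size, max_items=40, **kwargs):
        super().__init__(**kwargs)
        self.db_path = db_path
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.vocab_size = vocab_size
        self.max_items = max_items

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(self.num_samples / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.num_samples)
        n = stop - start
        # One-hot buffer for this batch only: (batch, slots, vocab). Empty slots stay all-zero.
        x = np.zeros((n, self.max_items, self.vocab_size), dtype=np.float32)
        y = np.zeros((n,), dtype=np.int64)
        # A fresh connection per batch avoids sharing one connection across worker threads.
        conn = sqlite3.connect(self.db_path)
        try:
            for sample_id, slot, item_id in conn.execute(
                "SELECT sample_id, slot, item_id FROM sample_items "
                "WHERE sample_id >= ? AND sample_id < ?", (start, stop)):
                x[sample_id - start, slot, item_id] = 1.0
            for sample_id, target in conn.execute(
                "SELECT sample_id, target FROM samples "
                "WHERE sample_id >= ? AND sample_id < ?", (start, stop)):
                y[sample_id - start] = target
        finally:
            conn.close()
        # Flatten slots x vocab so a Dense input layer can consume each row.
        return x.reshape(n, -1), y
```

An instance can be passed directly to `model.fit(...)` in place of in-memory arrays. The one-hot tensor exists only for the current batch, but the flattened input is 40 × `vocab_size` wide, so for a large item catalogue the first `Dense` layer's weight matrix grows accordingly.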
To increase the accuracy you can experiment with more complicated architectures. Here are examples for structured data: Structured Data

maxim · November 25, 2021, 10:25am

Thank you! Very helpful links!
