I am trying to train a simple agent to play MountainCar from OpenAI Gym, but training runs out of memory (OOM) because RAM usage keeps growing. How much RAM should I expect to need? Even 64GB was not enough.
Hi @mock789 ,
The amount of RAM needed to train a simple DQN depends on a number of factors, including the size of the state space, the size of the action space, and the number of parameters in the model. However, as a general rule of thumb, you will need at least 4GB of RAM to train a simple DQN. If you are using a large state space or a large action space, you may need more RAM.
Training a simple DQN for the MountainCar environment should generally not require an excessive amount of RAM. With 64GB of RAM, you should have more than enough memory available for training.
If you are encountering out-of-memory (OOM) issues even with 64GB of RAM, it is possible that there is a memory leak or inefficient memory usage in your code. Make sure to deallocate unnecessary variables, release memory after use, and avoid any memory leaks.
You can try the following approaches to reduce memory usage during training (a brief sketch illustrating a couple of them follows the list):
Use a smaller replay buffer or limit its maximum size.
Decrease the batch size used for training.
Optimize the neural network architecture to reduce the number of parameters.
Use memory-efficient data types, such as float32 instead of float64.
Consider using techniques like frame skipping or state downsampling to reduce the dimensionality of observations.
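For illustration, here is a minimal, hypothetical sketch of a few of these ideas (a capped replay buffer, a small batch size, and float32 observations); the names are placeholders, not taken from your code:

from collections import deque
import numpy as np

REPLAY_BUFFER_SIZE = 20000   # cap the buffer so old transitions get dropped
BATCH_SIZE = 32              # small batches keep the per-update memory footprint low

replay_buffer = deque(maxlen=REPLAY_BUFFER_SIZE)

def store_transition(state, action, reward, next_state, done):
    # store observations as float32 instead of float64 to halve their memory cost
    replay_buffer.append((np.asarray(state, dtype=np.float32), action, reward,
                          np.asarray(next_state, dtype=np.float32), done))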
I hope this helps!
Thanks.
Hi Laxma,
thanks very much for your response! That's strange, because I have already tried all of these things. I am using the script below, and using the memory_profiler library I have already figured out that this line of code alone
action=np.argmax(self.trainNetwork.predict(state)[0])
is responsible for a lot of RAM usage. The script looks like super basic and even if i reduced it, so only the above line of code will be called frequently, it is already running out of memory.
Do you maybe have any script for an openai gym example which i could use, to see if i will run into the same problem.
Btw. this is the script which i run at the time:
import gym
from keras import models
from keras import layers
from keras.optimizers import Adam
from collections import deque
import random
import numpy as np
class MountainCarTrain:
    def __init__(self, env):
        self.env = env
        self.gamma = 0.99
        self.epsilon = 1
        self.epsilon_decay = 0.05
        self.epsilon_min = 0.01
        self.learningRate = 0.001
        self.replayBuffer = deque(maxlen=20000)
        self.trainNetwork = self.createNetwork()
        self.episodeNum = 400
        self.iterationNum = 201  # max is 200
        self.numPickFromBuffer = 32
        self.targetNetwork = self.createNetwork()
        self.targetNetwork.set_weights(self.trainNetwork.get_weights())
    def createNetwork(self):
        model = models.Sequential()
        state_shape = self.env.observation_space.shape
        model.add(layers.Dense(24, activation='relu', input_shape=state_shape))
        model.add(layers.Dense(48, activation='relu'))
        model.add(layers.Dense(self.env.action_space.n, activation='linear'))
        # model.compile(optimizer=optimizers.RMSprop(learning_rate=self.learningRate), loss=losses.mean_squared_error)
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learningRate))
        return model
    def getBestAction(self, state):
        self.epsilon = max(self.epsilon_min, self.epsilon)
        if np.random.rand(1) < self.epsilon:
            action = np.random.randint(0, 3)
        else:
            action = np.argmax(self.trainNetwork.predict(state)[0])
        return action
    def trainFromBuffer(self):
        if len(self.replayBuffer) < self.numPickFromBuffer:
            return
        samples = random.sample(self.replayBuffer, self.numPickFromBuffer)
        states = []
        newStates = []
        for sample in samples:
            state, action, reward, new_state, done = sample
            states.append(state)
            newStates.append(new_state)
        newArray = np.array(states)
        states = newArray.reshape(self.numPickFromBuffer, 2)
        newArray2 = np.array(newStates)
        newStates = newArray2.reshape(self.numPickFromBuffer, 2)
        targets = self.trainNetwork.predict(states)
        new_state_targets = self.targetNetwork.predict(newStates)
        i = 0
        for sample in samples:
            state, action, reward, new_state, done = sample
            target = targets[i]
            if done:
                target[action] = reward
            else:
                Q_future = max(new_state_targets[i])
                target[action] = reward + Q_future * self.gamma
            i += 1
        self.trainNetwork.fit(states, targets, epochs=1, verbose=0)
    def orginalTry(self, currentState, eps):
        rewardSum = 0
        max_position = -99
        for i in range(self.iterationNum):
            bestAction = self.getBestAction(currentState)
            # show the animation every 50 episodes
            if eps % 50 == 0:
                env.render()
            new_state, reward, done, truncated, info = env.step(bestAction)
            new_state = new_state.reshape(1, 2)
            # Keep track of max position
            if new_state[0][0] > max_position:
                max_position = new_state[0][0]
            # Adjust reward for task completion
            if new_state[0][0] >= 0.5:
                reward += 10
            self.replayBuffer.append([currentState, bestAction, reward, new_state, done])
            # Or you can use self.trainFromBuffer_Boost(), a matrix-wise version for boosting
            self.trainFromBuffer()
            rewardSum += reward
            currentState = new_state
            if done:
                break
        if i >= 199:
            print("Failed to finish task in episode {}".format(eps))
        else:
            print("Success in episode {}, used {} iterations!".format(eps, i))
            self.trainNetwork.save('./trainNetworkInEPS{}.h5'.format(eps))
        # Sync the target network with the train network
        self.targetNetwork.set_weights(self.trainNetwork.get_weights())
        print("now epsilon is {}, the reward is {} maxPosition is {}".format(max(self.epsilon_min, self.epsilon), rewardSum, max_position))
        self.epsilon -= self.epsilon_decay
    def start(self):
        for eps in range(self.episodeNum):
            currentState = env.reset()[0].reshape(1, 2)
            self.orginalTry(currentState, eps)
env = gym.make('MountainCar-v0')
dqn = MountainCarTrain(env=env)
dqn.start()
Hi @mock789 ,
Can you please try incorporating this modification into your code and observe whether it helps mitigate the memory issues you are facing: by directly assigning the result of self.trainNetwork.predict(state) to pred and then accessing the maximum index, we can avoid unnecessary memory allocations.
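For instance, a minimal sketch of what that change could look like in the getBestAction method of the script above:

def getBestAction(self, state):
    self.epsilon = max(self.epsilon_min, self.epsilon)
    if np.random.rand(1) < self.epsilon:
        action = np.random.randint(0, 3)
    else:
        # compute the prediction once, keep it in a local variable, then take the argmax
        pred = self.trainNetwork.predict(state)
        action = np.argmax(pred[0])
    return action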
I hope this helps!
Thanks
Hi Laxma,
thank you very much for your reply! I am letting the script run at the moment, but I already have the impression that it will not fully fix the problem. I am using the memory profiler, and I can see that this line of code
pred = self.trainNetwork.predict(state)
increases RAM usage by about 0.2 MiB every time it is called. It is really weird, and in the Task Manager I can still see the RAM used by Vmmem growing steadily.
I really have no clue why TensorFlow is not giving this RAM back to the system ://
Line # Mem usage Increment Occurrences Line Contents
44 891.2 MiB 891.2 MiB 1 @profile
45 def getBestAction(self,state):
46 891.2 MiB 0.0 MiB 1 self.epsilon = max(self.epsilon_min, self.epsilon)
47 891.2 MiB 0.0 MiB 1 if np.random.rand(1) < self.epsilon:
48 action = np.random.randint(0, 3)
49 else:
50 891.4 MiB 0.2 MiB 1 pred = self.trainNetwork.predict(state)
51 891.4 MiB 0.0 MiB 1 action = np.argmax(pred[0])
52 891.4 MiB 0.0 MiB 1 gc.collect()
53 891.4 MiB 0.0 MiB 1 return action
Hi @mock789,
Please give this modification a try and see if it helps reduce the memory growth: call tf.keras.backend.clear_session() instead of gc.collect(). This will release the memory associated with the graph and start a clean session for the next prediction.
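A minimal sketch of how this could look in getBestAction (assuming TensorFlow is imported as tf at the top of the script):

import tensorflow as tf

def getBestAction(self, state):
    self.epsilon = max(self.epsilon_min, self.epsilon)
    if np.random.rand(1) < self.epsilon:
        action = np.random.randint(0, 3)
    else:
        pred = self.trainNetwork.predict(state)
        action = np.argmax(pred[0])
    # clear Keras' global graph state so memory from the prediction can be released
    tf.keras.backend.clear_session()
    return action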
I hope this last try can resolve your issue.
Thanks.
Thank you Laxma,
this really seems to mitigate the problem.
Line # Mem usage Increment Occurrences Line Contents
44 456.6 MiB 456.6 MiB 1 @profile
45 def getBestAction(self,state):
46 456.6 MiB 0.0 MiB 1 self.epsilon = max(self.epsilon_min, self.epsilon)
47 456.6 MiB 0.0 MiB 1 if np.random.rand(1) < self.epsilon:
48 action = np.random.randint(0, 3)
49 else:
50 458.7 MiB 2.1 MiB 1 pred = self.trainNetwork.predict(state)
51 458.7 MiB 0.0 MiB 1 action = np.argmax(pred[0])
52 456.8 MiB -2.0 MiB 1 tf.keras.backend.clear_session()
53 456.8 MiB 0.0 MiB 1 return action
Strangely, it does not always work, as you can see here, where tf.keras.backend.clear_session() does not clean up the increment caused by pred = self.trainNetwork.predict(state):
Line # Mem usage Increment Occurrences Line Contents
44 456.8 MiB 456.8 MiB 1 @profile
45 def getBestAction(self,state):
46 456.8 MiB 0.0 MiB 1 self.epsilon = max(self.epsilon_min, self.epsilon)
47 456.8 MiB 0.0 MiB 1 if np.random.rand(1) < self.epsilon:
48 action = np.random.randint(0, 3)
49 else:
50 457.2 MiB 0.4 MiB 1 pred = self.trainNetwork.predict(state)
51 457.2 MiB 0.0 MiB 1 action = np.argmax(pred[0])
52 457.2 MiB 0.0 MiB 1 tf.keras.backend.clear_session()
53 457.2 MiB 0.0 MiB 1 return action
So RAM usage is still increasing, but much more slowly.
Anyway, thank you a lot! For the first time I could run the reduced script over 400 episodes without getting OOM.
Now I will try the complete script and see how it goes!
Have a nice day
Hi Laxma,
just wanted to give an update: even with the complete script I can now run more than 500 episodes without running into OOM. I really owe you a beer/tea/Coca-Cola.
Best
Hi @mock789,
That’s great news! I’m glad to hear that the complete script is working for you. I’m happy to help in any way that I can.
Happy coding!
Thanks.
Hello @mock789 ,
in the first script above, I noticed self.trainNetwork = self.createNetwork().
When calling predict, is basically a new Sequential model being returned every time?
action=np.argmax(self.trainNetwork.predict(state)[0])
If you like, you can also customize your train step (very handy for RL); a small sketch follows below.
Feel free to have a look at the TensorFlow DQN library as well (e.g. to compare/benchmark the RAM usage).
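For example, here is a minimal sketch of a custom train_step with tf.keras (a generic illustration, not code from the script above; the targets are assumed to be precomputed Q-value targets built from the replay buffer):

import tensorflow as tf

class DQN(tf.keras.Model):
    def train_step(self, data):
        states, targets = data  # targets: Q-value targets from the replay buffer
        with tf.GradientTape() as tape:
            q_values = self(states, training=True)
            loss = self.compiled_loss(targets, q_values)
        # compute and apply gradients manually instead of relying on the default step
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}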
Lucky Rewards,
Dennis