Using the TensorFlow Profiler in a reinforcement learning context

Hello everyone,
I’m fairly new to using the TensorFlow APIs directly instead of the Keras ones, and I have a problem using the TensorFlow Profiler in a reinforcement learning project. The project structure doesn’t allow me to simply start the profiler, put all the model code below that line, and stop the profiler after the model code. I didn’t find any example that I could use.

Some facts about the code structure:

The agent interacts with a simulation that runs as a separate service and is accessed through an API. Because of this, there is no traditional RL training loop. The program is used through a CLI. When the app is started, there are classes for coordinating the simulation startup, for the interaction between the RL algorithm and the simulation, and for the RL algorithm itself. The RL algorithm class contains one or more agents. Each agent contains one or two neural networks.
After startup, the simulation and the algorithm class communicate through API calls. So instead of the “choose action, perform action, observe env” code in the RL loop, the flow is “start simulation, send first state, sim waits, algorithm triggers the agent(s) to choose an action, send the action(s) to the sim, sim performs a sim step, sim sends the new state to the algorithm instance, algorithm triggers the agent(s) to choose an action”, and so on. Some of these agents also don’t do batch training. Instead they do temporal difference updates, which in the most extreme case means that they train after every single step, without a training loop.
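
To make the structure a bit more concrete, here is a heavily simplified sketch of the event-driven flow described above. All class and method names (`Agent`, `Algorithm`, `on_state`, `td_update`) are hypothetical placeholders and not the real project code. The point is only that training is triggered by incoming states, not by an outer loop.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent: one action and one TD-style update per incoming state."""
    updates: int = 0
    last_state: object = None

    def choose_action(self, state):
        # Placeholder policy; a real agent would query its neural network here.
        return 0

    def td_update(self, prev_state, new_state):
        # Placeholder for a single temporal difference training step.
        self.updates += 1


@dataclass
class Algorithm:
    """Hypothetical algorithm class that owns the agents and is driven by API calls."""
    agents: list = field(default_factory=list)

    def on_state(self, state):
        # Called whenever the simulation sends a new state; there is no outer loop.
        actions = []
        for agent in self.agents:
            if agent.last_state is not None:
                agent.td_update(agent.last_state, state)
            agent.last_state = state
            actions.append(agent.choose_action(state))
        return actions  # these would be sent back to the simulation
```

In this sketch every call to `on_state` contains at most one very short training step per agent, which is the part I would like to profile.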

Because of performance issues while training the neural nets, I wanted to use the TensorFlow Profiler.
Can I use the profiler in this context?

Where do I have to start the server?

I tried to start the Profiler v2 server in the algorithm class. This class contains the agent instance(s), which hold the NN models. Inside the agent instance I tried to collect traces and profiles and send them to the previously started server. This led to errors.
I also tried to start the server right before the training step, but a single training step is so short (in terms of time) that there is not enough time to capture a trace or profile before the server is shut down again.
Is there a rule that the server must be started in the same file/function as the code it should profile?
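
Regarding the very short training steps: one idea would be to keep a single profiling session open across many steps instead of opening and closing it around each one. Below is a minimal sketch of that idea, not a confirmed solution. `tf.profiler.experimental.start(logdir)` and `tf.profiler.experimental.stop()` are existing TensorFlow calls, but the class `StepWindowProfiler`, its hook names, the step counts, and the log directory are assumptions made for illustration. The sketch also assumes that the profiling session applies to the whole process, so that the start and stop calls would not have to sit in the same file as the model code. The start/stop functions are passed in, so the example runs without TensorFlow.

```python
class StepWindowProfiler:
    """Keeps one profiling session open across a window of training steps.

    start_fn and stop_fn are injected so the class itself is library-agnostic.
    With TensorFlow they could be, for example:
        start_fn = lambda: tf.profiler.experimental.start("logs/profile")
        stop_fn = tf.profiler.experimental.stop
    ("logs/profile" is a hypothetical log directory.)
    """

    def __init__(self, start_fn, stop_fn, warmup_steps=10, profile_steps=100):
        self.start_fn = start_fn
        self.stop_fn = stop_fn
        self.first = warmup_steps
        self.last = warmup_steps + profile_steps
        self.step = 0
        self.active = False

    def before_step(self):
        # Open the session once, right before the first profiled step.
        if self.step == self.first and not self.active:
            self.start_fn()
            self.active = True

    def after_step(self):
        self.step += 1
        # Close the session once the whole window has been recorded.
        if self.active and self.step >= self.last:
            self.stop_fn()
            self.active = False


# Demo with stand-in functions that only record when they are called.
events = []
profiler = StepWindowProfiler(
    start_fn=lambda: events.append("start"),
    stop_fn=lambda: events.append("stop"),
    warmup_steps=2,
    profile_steps=3,
)
for _ in range(10):
    profiler.before_step()
    # The agent's single training step would run here.
    profiler.after_step()
```

The two hooks could be called from wherever the agent performs its update, so the session would span many short steps and only be started and stopped once.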

I would be very happy if someone could help me or point me to some resources to better understand the profiler. I already tried to dig into the profiler code on GitHub, but I’m not good at reading and understanding C++.
If my explanation of the software structure is not understandable, just tell me and I’ll try to draw a diagram or something.

Thanks a lot!