GPU access from TensorFlow through the C API on a Linux high-performance computer

I have a program, actually written in Pascal, that accesses TensorFlow through the C API. It is linked against libtensorflow.so and works as it should on a computer without a GPU. I tried the same on a high-performance computer with a GPU, and there it behaves differently (this is a small partition with one GPU, but I also tried a large GPU partition and the result is the same). TensorFlow notices the GPU but complains that some libraries are missing, and I cannot find out which ones.
I have read a lot on this forum and elsewhere, but most of the answers and examples use Python, which I do not use.
So, here are the details.
The Linux distribution:

|Field|Value|
|---|---|
|LSB Version:|:core-4.1-amd64:core-4.1-noarch|
|Distributor ID:|RedHatEnterprise|
|Description:|Red Hat Enterprise Linux release 8.6 (Ootpa)|
|Release:|8.6|
|Codename:|Ootpa|

Output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

Output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                 Off| 00000000:A3:00.0 Off |                  Off |
| 33%   26C    P8                8W / 260W|    671MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

The first message I get is, I guess, only informational: the CPU supports AVX2 and FMA instructions that this TensorFlow binary does not use in all operations, so it points out that a rebuild could make the CPU code faster. As I understand it, this is harmless, and since I want to use the GPU it should not be a problem. The message is shown only once, when the program starts:

2024-12-06 13:14:57.084465: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR

Then, every time a graph is created (or a session, I did not check which), I get the following warning. This is the one that bothers me and the one I am asking for help with:

W0000 00:00:1733487298.477811  174015 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices..

I also checked internally from within the program with TF_SessionListDevices and indeed it can only see the CPU:
Device name, Device type, Device memory
/job:localhost/replica:0/task:0/device:CPU:0 CPU 268435456

Needless to say, I do not have root access, so I need a solution that works without it.

Any help welcome. Thanks in advance,

Zsolt

It seems I have found the cause of this error. With the Linux strace command I checked which libraries TensorFlow was looking for. The missing one in my case was libcudnn.so.9, which I downloaded manually and put into a folder where it could be found, and the error was gone immediately.

Naturally, should somebody find this post someday, other files might be missing in their case, but strace is a great tool for finding out which.
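For anyone who prefers to check from code rather than with strace, the same question ("which library can the loader not open?") can be asked through dlopen directly. This is only a minimal sketch of a complementary check, not what TensorFlow does internally. The function names can_dlopen and count_missing and the gpu_libs list are made up for illustration. Only libcudnn.so.9 comes from my case above; the other library names are my assumption for a CUDA 12 setup and may differ for other builds.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if the dynamic loader can open `name`, 0 otherwise.
   On failure the loader's own explanation is printed to stderr. */
int can_dlopen(const char *name) {
    void *handle = dlopen(name, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "MISSING %s: %s\n", name, msg ? msg : "unknown error");
        return 0;
    }
    dlclose(handle);
    return 1;
}

/* Probes every name in a NULL-terminated list and returns how many failed. */
int count_missing(const char *const *names) {
    int missing = 0;
    for (size_t i = 0; names[i] != NULL; ++i) {
        if (!can_dlopen(names[i])) {
            ++missing;
        }
    }
    return missing;
}

/* Hypothetical candidate list. Only libcudnn.so.9 is taken from the post;
   the remaining names are assumptions for a CUDA 12 installation. */
const char *const gpu_libs[] = {
    "libcuda.so.1",
    "libcudart.so.12",
    "libcublas.so.12",
    "libcudnn.so.9",
    NULL
};
```

To use it, add a main that calls count_missing(gpu_libs), link with -ldl (needed on older glibc versions such as the one on RHEL 8), and run it in the same environment, in particular with the same LD_LIBRARY_PATH, as the real program. Since no root access is needed to set LD_LIBRARY_PATH, a library that shows up as missing can be placed in a user directory and the check repeated.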

What I still do not understand, though, is why the library that is actually missing is not printed when this message is issued (gpu_device.cc, line 2340). That could save a hell of a lot of time.

I still cannot use the GPU, though, because I now get an "all devices are busy" error. I have read a lot about it in connection with TF 2.18 and am still investigating, but that is another story.