I’m working on a Node.js API that uses TensorFlow.js. I have it working on the CPU, but I want it to use my RTX 5080.
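For reference, src/index.js boils down to loading the GPU binding and running a small op, trimmed down here (the real file does more):
// src/index.js, trimmed down to the TensorFlow.js part
const tf = require('@tensorflow/tfjs-node-gpu');
// Requiring the binding loads the native libtensorflow library, which is what
// prints the CUDA messages shown in the logs below.
tf.tensor([1, 2, 3]).square().print();
console.log('backend in use:', tf.getBackend());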
Here is my Dockerfile:
# Use TensorFlow GPU base image (includes CUDA/cuDNN)
FROM tensorflow/tensorflow:latest-gpu AS base
# Install TensorFlow with CUDA support
RUN pip install tensorflow[and-cuda]
WORKDIR /app
USER root
# Install Node.js 20
RUN apt-get update && \
    apt-get install -y curl && \
    curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
    apt-get install -y nodejs && \
    apt-get clean
# Development stage
FROM base AS dev
COPY package*.json ./
ENV NODE_ENV=development
RUN npm install
COPY . ./
CMD ["npm", "run", "dev"]
# Production build stage
FROM base AS build
COPY package*.json ./
ENV NODE_ENV=production
RUN npm install --omit=dev
COPY . ./
CMD ["node", "src/index.js"]
And here is my docker-compose.
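The service builds the dev stage from this Dockerfile and requests the GPU with the standard Compose device reservation, roughly like this (the service name is a placeholder, and this is just one of the variants I have tried):
services:
  api:                      # placeholder name
    build:
      context: .
      target: dev           # the dev stage above
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]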
I get this error after docker-compose build --no-cache and docker-compose up:
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node src/index.js`
2025-04-24 21:14:37.243278: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-24 21:14:37.245373: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.245400: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2025-04-24 21:14:37.265148: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-04-24 21:14:37.364324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-04-24 21:14:37.364398: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.364446: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.364478: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.364508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.382903: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2025-04-24 21:14:37.383027: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I have found many examples of this online, but none of the solutions seem to work for me.
Running this directly with docker run works:
docker run -it --rm --runtime=nvidia --gpus all tensorflow/tensorflow:latest-gpu /bin/bash
pip install tensorflow[and-cuda]
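Then, to confirm the GPU is visible from that shell, something along these lines:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# prints something like: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]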
It can see my GPU and all that jazz, just not with the image I built (which uses the TensorFlow base).
Any insight? I can elaborate more on what I have tried, but I will need to go back in my history.
UPDATE:
Just some more info: I can follow this guide for Docker to run Ollama and it works, so I think my system is set up correctly; it just comes down to getting TensorFlow to work. The CUDA n-body sample also runs fine:
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 12.0 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 12.0 is undefined. Default to use Ampere
GPU Device 0: "Ampere" with compute capability 12.0
> Compute 12.0 CUDA device: [NVIDIA GeForce RTX 5080]
86016 bodies, total time for 10 iterations: 45.023 ms
= 1643.318 billion interactions per second
= 32866.351 single-precision GFLOP/s at 20 flops per interaction