Building TensorFlow from source for the RTX 5000 GPU series

NVIDIA's new Blackwell GPUs need CUDA 12.8 and have compute capability 12.5 with the sm_120 architecture. I've installed the latest TensorFlow with GPU support, but it doesn't run on the Blackwell architecture.

So I tried to build from source with CUDA 12.8.0, cuDNN 9.7.0, compute capabilities [7.5, 8.9, 12.5], clang-17 and Bazel. With clang-17 the build fails with: "clang: error: unsupported CUDA gpu architecture: sm_125".

Looking inside .bazelrc in the cloned TF code base I don’t see support for sm_120:
"""
# Select supported compute capabilities (supported graphics cards).
# This is the same as the official TensorFlow builds.
# See CUDA GPUs - Compute Capability | NVIDIA Developer
# compute_XY enables PTX embedding in addition to SASS. PTX
# is forward compatible beyond the current compute capability major
# release while SASS is only forward compatible inside the current
# major release. Example: sm_80 kernels can run on sm_89 GPUs but
# not on sm_90 GPUs. compute_80 kernels though can also run on sm_90 GPUs.
build:cuda_clang --repo_env=HERMETIC_CUDA_COMPUTE_CAPABILITIES="sm_60,sm_70,sm_80,sm_89,compute_90"
"""

Is it possible to use bazel and clang-17 to build tensorflow to run with GPU support on the Blackwell sm_120 architecture?

If it is possible, any advice on what I can change? I did try using clang-18, but that gave other errors.

Disclaimer: This is my first time trying to build tensorflow from source.
Environment: a Python 3.12.3 venv on a VMware Ubuntu 24.04 virtual machine.

Have you tried adding sm_125 into that .bazelrc config? Untested, but something like the line below, just splicing sm_125 into the existing list:
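"""
# Untested suggestion: the same build:cuda_clang line as above with sm_125 added.
build:cuda_clang --repo_env=HERMETIC_CUDA_COMPUTE_CAPABILITIES="sm_60,sm_70,sm_80,sm_89,sm_125,compute_90"
"""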

I did not consider this because I assumed that line is included in .bazelrc for a reason, i.e. to prevent things from breaking if the compute capability is anything other than the ones specified.

Nonetheless, I tried your suggestion and it doesn’t work.

I then checked further, and there are more checks and balances inside configure.py that make sure the compute capability only falls within the 'approved' range. Line 1003: m = re.match('[0-9]+.[0-9]+', compute_capability).

I've now changed this to m = re.match('[0-12]+.[0-9]+', compute_capability) so the configure step will run, but surely doing these overrides is bound to break something somewhere? Surely these checks are there for a reason? I can't imagine the build is going to work just by overriding the compute capability checks.

Still receiving errors after those changes.
Compiling xla/service/gpu/make_batch_pointers.cu.cc failed: (Exit 1): clang failed: error executing command (from target @local_xla//xla/service/gpu:make_batch_pointers_kernel) /usr/lib/llvm-17/bin/clang -MD -MF bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/make_batch_pointers_kernel/make_batch_pointers.cu.pic.d … (remaining 172 arguments skipped)
clang: error: unsupported CUDA gpu architecture: sm_120

I’m struggling through this right now too. Make any progress?

Are you sure about that compute capability? The GeForce RTX 5000s are listed at compute capability 10.0 (sm_100): CUDA GPUs - Compute Capability | NVIDIA Developer

But even using sm_100, I’m stuck on the compilation errors.

For what it's worth, I'm using clang-19. Building TF2.18.0 for sm_89 with clang-19 requires two additional command-line options passed to Bazel:

--copt=-Wno-error=c23-extensions
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
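Roughly, the full invocation ends up looking like the sketch below; the pip-package wheel target and the cuda_clang config are assumptions taken from the standard TF source-build docs, so adjust for your own checkout:

"""
# Rough sketch only: Bazel build with the two extra options appended.
# The wheel target and --config=cuda_clang follow the standard TF build docs
# and may differ in your version of the tree.
bazel build //tensorflow/tools/pip_package:wheel \
    --config=cuda_clang \
    --copt=-Wno-error=c23-extensions \
    --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
"""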

That works great against CUDA 12.6.0. No such luck on 12.8.0 yet.

Also, according to their release notes, CUDA 12.8.1 has Blackwell support, not 12.8.0, but even tf-nightly doesn't have 12.8.1 in the hermetic version list yet, so I think it's going to be an uphill struggle for a while.
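For anyone wondering what "hermetic version list" means: the CUDA/cuDNN versions Bazel downloads are requested through repo environment variables, so once 12.8.1 does land you should in principle be able to ask for it with something like the lines below (hypothetical until it appears in the redistribution list; today only versions already known to the repo rules, e.g. 12.8.0, will resolve):

"""
# Hypothetical .bazelrc lines once 12.8.1 is in the hermetic redistribution list.
build:cuda_clang --repo_env=HERMETIC_CUDA_VERSION="12.8.1"
build:cuda_clang --repo_env=HERMETIC_CUDNN_VERSION="9.7.0"
"""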

Hi there affectsai. Nope, I never got it working. And I'm glad to see NVIDIA has now updated their GPU Compute Capability website! When I was trying the build, the RTX 5000 series cards' info wasn't listed yet. I haven't saved the exact website I got the sm_120 or 12.5 compute capability from, but clearly those references were wrong. I do recall seeing a table on one site that showed the 5080 with a compute capability of 12.5, and other sites that referred to it having the sm_120 architecture. That in itself was confusing to me at the time: why 12.5 yet sm_120? With hindsight, clearly they didn't know what they were talking about. CUDA GPUs - Compute Capability on NVIDIA's site is definitely the right place to get the compute capability, so I'll change to sm_100 and try again, thank you!

But if you say CUDA 12.8.0 doesn't have Blackwell support, only CUDA 12.8.1, then I think I'm going to wait a while longer…

Did you see Software Migration Guide for NVIDIA Blackwell RTX GPUs: A Guide to CUDA 12.8, PyTorch, TensorRT, and Llama.cpp - AI & Data Science - NVIDIA Developer Forums, or TensorFlow Release 25.01 - NVIDIA Docs? In NVIDIA's 25.01 Docker container they were using CUDA 12.8.0 at the time. In the latest container version, 25.02, it looks like they're using CUDA 12.8.0.38, so still not 12.8.1.

I think for now it's easier to just use one of the Docker containers, and once things become clearer I'll try to build from source again. Like you say, I also think it's going to be an uphill struggle right now, and it's definitely above my current knowledge level to get it to work.

I think I was misreading the CUDA version labels. 12.8 Update 1 has Blackwell support, but this appears to be different from 12.8.1, which is also available. I am currently using CUDA 12.8.0 in the NVIDIA-optimized TensorFlow container v25.02, and I see sm_100 listed in the output of nvcc -code-ls.
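In case the short flag isn't obvious, these are the long-form spellings of the nvcc query options I mean:

"""
# Long-form equivalents of nvcc -arch-ls / -code-ls: list the virtual
# (compute_XY) and real (sm_XY) architectures this nvcc can target.
nvcc --list-gpu-arch
nvcc --list-gpu-code
"""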

That container has TF2.17.1 though, and is officially deprecated according to its release notes, so we won’t ever see one with TF2.18.

TF's r2.19 branch has the beginnings of support for hermetic CUDA 12.8.0 (it's listed in the repository JSON, anyway), but it appears you need to compile with NVCC instead of Clang for CUDA 12.8.0. Clang, according to its docs, only supports up to CUDA 12.1, so even if you get past the mess of compiler options that turn unsupported CUDA levels from a warning into an error, you're not going to get Blackwell compute capabilities out of it. Just a forward-compatible back-level build.

Getting r2.19 to build with NVCC using hermetic CUDA 12.8.0 / cuDNN 9.7.0 and gcc-13 was easy enough (not intuitive at all, but easy enough…), but alas, the XLA code is not compatible with the level of cuBLAS packaged with them; there are API changes in cuBLAS that break the build. I can get a successful TF2.19 build against CUDA 12.8.0, but you have to disable XLA to do it, and at that point I'm really not sure what the point is.
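For reference, the kind of invocation I mean is roughly the following. Treat it as a sketch rather than a verified recipe: the cuda_nvcc config and the wheel target are assumptions based on the r2.19 .bazelrc and build docs, and the exact way to switch off XLA depends on the checkout.

"""
# Sketch only: NVCC-based build against hermetic CUDA 12.8.0 / cuDNN 9.7.0.
# Host compiler (gcc-13) is chosen during the ./configure step.
bazel build //tensorflow/tools/pip_package:wheel \
    --config=cuda_nvcc \
    --repo_env=HERMETIC_CUDA_VERSION="12.8.0" \
    --repo_env=HERMETIC_CUDNN_VERSION="9.7.0" \
    --repo_env=HERMETIC_CUDA_COMPUTE_CAPABILITIES="compute_90,sm_100"
# ...plus whatever option your checkout uses to disable XLA support.
"""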

Long story short - seeing hermetic CUDA 12.8.0 and hermetic CUDNN 9.7.0 in the r2.19 repository gives me hope that we’ll see it soon, but they’ve already cut the 2.19 release candidate, so I think we’re going to have to wait for 2.20.

I've been doing most of my work in Keras3 lately, and I hear PyTorch has had CUDA 12.8.0 in its nightly builds since late January (according to Dr Google's AI summary, so take that with a grain of salt). Maybe I'll look into trying that out in the meantime; Keras3 should make that a pretty easy transition.

That's what I have done. I've been using Keras3 with PyTorch within a WSL environment since the beginning of Feb (when I was unable to get the build from source working). I've also noticed that they have now released a nightly PyTorch build with RTX 50xx support for Windows, but my WSL environment is already set up, so I'm not going to switch to that soon.
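In case it saves anyone a search, the setup is basically the nightly cu128 wheel plus pointing Keras 3 at the torch backend; the index URL was correct when I set this up, but check pytorch.org for the current one:

"""
# PyTorch nightly built against CUDA 12.8 (cu128 nightly index; verify on pytorch.org)
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

# Tell Keras 3 to use the PyTorch backend (must be set before importing keras)
export KERAS_BACKEND=torch
"""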

Thank you for pointing out that the container only has TF2.17.1 installed. I overlooked that…

So it looks like I’ll stick to the nightly PyTorch for now then.

One thing that isn't working in Keras with PyTorch is stateful LSTM networks. I picked that up "Modified by moderator" and worked on a solution that has now been approved. If you need that, just follow the Colab link and copy the new LSTM class he created into your project until the change is released: Fix PyTorch stateful RNN/LSTM gradient computation error resolves #20875 by praveenhosdrug123 · Pull Request #20916 · keras-team/keras · GitHub