2024-12-01 11:27:55.010584: F tensorflow/stream_executor/cuda/cuda_driver.cc:147] Failed setting context: CUDA_ERROR_NOT_PERMITTED: operation not permitted

Zhou_Wu · December 2, 2024, 9:05am

Excuse me, why does this error occur?

D:\Anaconda3\envs\tf_st\python.exe D:\Study\tfst\QuantizationAwareDeepOptics-main\main.py --task=hyperspectral --doe_material=SK1300 --quantization_level=2 --scene_depth_m=1 --tag=000A4LevelHyperspctralTraining --sensor_distance_mm=50 
2024-12-01 11:27:51.554149: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

 
 GPU(s) to be used: 

[LogicalDevice(name='/device:GPU:0', device_type='GPU')]
Task:  tasks.hyperspectral
Start TASK tasks.hyperspectral
Extra Arguments:  {'doe_material': 'SK1300', 'quantization_level': 2, 'quantize_at_test_only': False, 'alpha_blending': False, 'adaptive_quantization': False, 'checkpoint': None, 'continue_training': False, 'tag': '000A4LevelHyperspctralTraining', 'sensor_distance_mm': 50, 'scene_depth_m': 1}
2024-12-01 11:27:51.985576: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 0
2024-12-01 11:27:51.985689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5449 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-12-01 11:27:52 [INFO] Wavelength list used： [4.0e-07 4.1e-07 4.2e-07 4.3e-07 4.4e-07 4.5e-07 4.6e-07 4.7e-07 4.8e-07
 4.9e-07 5.0e-07 5.1e-07 5.2e-07 5.3e-07 5.4e-07 5.5e-07 5.6e-07 5.7e-07
 5.8e-07 5.9e-07 6.0e-07 6.1e-07 6.2e-07 6.3e-07 6.4e-07 6.5e-07 6.6e-07
 6.7e-07 6.8e-07 6.9e-07 7.0e-07]
Network Args:
activation= elu
final_activation= sigmoid
kernel_initializer= he_uniform
final_kernel_initializer= glorot_uniform
2024-12-01 11:27:52 [INFO] 

==============>DOE Args<===============
  > General:
 {'wave_length_list': array([4.0e-07, 4.1e-07, 4.2e-07, 4.3e-07, 4.4e-07, 4.5e-07, 4.6e-07,
       4.7e-07, 4.8e-07, 4.9e-07, 5.0e-07, 5.1e-07, 5.2e-07, 5.3e-07,
       5.4e-07, 5.5e-07, 5.6e-07, 5.7e-07, 5.8e-07, 5.9e-07, 6.0e-07,
       6.1e-07, 6.2e-07, 6.3e-07, 6.4e-07, 6.5e-07, 6.6e-07, 6.7e-07,
       6.8e-07, 6.9e-07, 7.0e-07]), 'wavelength_to_refractive_index_func': <function refractive_index_glass_ohara_sk1300 at 0x00000297000A6670>, 'height_map_initializer': None, 'height_tolerance': 2e-08} 
  > Extra:
 {'quantization_level_cnt': 2, 'quantize_at_test_only': False, 'adaptive_quantization': False, 'alpha_blending': False, 'step_per_epoch': 1672, 'alpha_blending_start_epoch': 5, 'alpha_blending_end_epoch': 40} ==============<DOE Args>===============


2024-12-01 11:27:52 [INFO] [DOE] Quantization base height: 1.536909583045e-06
2024-12-01 11:27:52 [INFO] [QDO] Blending Start Step: 8360
2024-12-01 11:27:52 [INFO] [QDO] Blending End Step: 66880
2024-12-01 11:27:52 [INFO] [QDO] Blending Start Epoch: 5
2024-12-01 11:27:52 [INFO] [QDO] Blending End Epoch: 40
2024-12-01 11:27:52 [INFO] [Sensor] SRF Type= rgb
2024-12-01 11:27:52 [WARNING] [!] The PSF resize mode is enabled.
2024-12-01 11:27:52 [WARNING] [!] The Image resizing is disabled because the `wave_resolution` and `sensor_resolution` is identical.
2024-12-01 11:27:52 [INFO] DOE physical size = 4.10e-03 m.
 Wave resolution = 512.
2024-12-01 11:27:52 [INFO] Using group tag:  000A4LevelHyperspctralTraining
2024-12-01 11:27:52 [INFO] [DIR] training log directory name =  000A4LevelHyperspctralTraining/20241201-112752-SK1300-20nmNoise-2Lv-STE-Sd50-Sc1m
2024-12-01 11:27:52 [INFO] Preparing training datasets from D:/yanyi/datas/ICVL/train...
2024-12-01 11:27:52 [INFO] Dataset loaded from:  D:/yanyi/datas/ICVL/train
2024-12-01 11:27:52 [INFO] Preparing validation datasets from D:/yanyi/datas/ICVL/validation...
2024-12-01 11:27:52 [INFO] Dataset loaded from:  D:/yanyi/datas/ICVL/validation
2024-12-01 11:27:53.118917: I tensorflow/core/profiler/lib/profiler_session.cc:101] Profiler session initializing.
2024-12-01 11:27:53.118969: I tensorflow/core/profiler/lib/profiler_session.cc:116] Profiler session started.
2024-12-01 11:27:53.119025: I tensorflow/core/profiler/backends/gpu/cupti_tracer.cc:1664] Profiler found 1 GPUs
2024-12-01 11:27:53.199258: I tensorflow/core/profiler/lib/profiler_session.cc:128] Profiler session tear down.
2024-12-01 11:27:53.200246: I tensorflow/core/profiler/backends/gpu/cupti_tracer.cc:1798] CUPTI activity buffer flushed
WARNING:tensorflow:From D:\Anaconda3\envs\tf_st\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py:1332: start (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.start` instead.
2024-12-01 11:27:53.219975: I tensorflow/core/profiler/lib/profiler_session.cc:101] Profiler session initializing.
2024-12-01 11:27:53.220040: I tensorflow/core/profiler/lib/profiler_session.cc:116] Profiler session started.
WARNING:tensorflow:From D:\Anaconda3\envs\tf_st\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py:1383: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
2024-12-01 11:27:53.277306: I tensorflow/core/profiler/lib/profiler_session.cc:67] Profiler session collecting data.
2024-12-01 11:27:53.278280: I tensorflow/core/profiler/backends/gpu/cupti_tracer.cc:1798] CUPTI activity buffer flushed
2024-12-01 11:27:53.298034: I tensorflow/core/profiler/backends/gpu/cupti_collector.cc:521]  GpuTracer has collected 1 callback api events and 1 activity events. 
2024-12-01 11:27:53.298264: I tensorflow/core/profiler/lib/profiler_session.cc:128] Profiler session tear down.
WARNING:tensorflow:From D:\Anaconda3\envs\tf_st\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py:1383: save (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
`tf.python.eager.profiler` has deprecated, use `tf.profiler` instead.
WARNING:tensorflow:From D:\Anaconda3\envs\tf_st\lib\site-packages\tensorflow\python\eager\profiler.py:150: maybe_create_event_file (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
`tf.python.eager.profiler` has deprecated, use `tf.profiler` instead.
2024-12-01 11:27:53 [INFO] Creating dir: ./checkpoint/000A4LevelHyperspctralTraining/20241201-112752-SK1300-20nmNoise-2Lv-STE-Sd50-Sc1m
2024-12-01 11:27:53 [INFO] [QDO] `quantization-aware` mode is enabled.
2024-12-01 11:27:53 [INFO] [QDO] using quantization-aware approach: <STE>.
2024-12-01 11:27:53 [INFO] [QDO] Quantization levels： 2
2024-12-01 11:27:54 [INFO] Simulated fabrication noise on height map: 2.00e-08
2024-12-01 11:27:55.010584: F tensorflow/stream_executor/cuda/cuda_driver.cc:147] Failed setting context: CUDA_ERROR_NOT_PERMITTED: operation not permitted

Kiran_Sai_Ramineni · December 2, 2024, 9:59am

Hi @Zhou_Wu, could you please tell us how this error occurred, i.e. which command or code triggers it? Please also share the details of the environment you are using. Thank you.

Zhou_Wu · December 8, 2024, 2:07pm

Hi @Kiran_Sai_Ramineni, thank you for taking the time to reply despite your busy schedule.
I encountered this error when calling the model's .fit method.
Here are the details of my environment. If you have time, could you please take a look? I would be truly grateful.

(tf_st) PS D:\Study\tfst> pip list
Package                      Version
---------------------------- -----------
absl-py                      2.1.0
astunparse                   1.6.3
cachetools                   5.4.0
certifi                      2024.7.4
charset-normalizer           3.3.2
contourpy                    1.2.1
cycler                       0.12.1
einops                       0.8.0
flatbuffers                  24.3.25
fonttools                    4.53.1
gast                         0.4.0
google-auth                  2.32.0
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.65.1
h5py                         3.11.0
hdf5storage                  0.1.19
idna                         3.7
importlib_metadata           8.2.0
importlib_resources          6.4.0
joblib                       1.4.2
keras                        2.10.0
Keras-Preprocessing          1.1.2
kiwisolver                   1.4.5
libclang                     18.1.1
Markdown                     3.6
MarkupSafe                   2.1.5
matplotlib                   3.9.1
ml-dtypes                    0.2.0
numpy                        1.24.0
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    24.1
pandas                       2.2.3
pillow                       10.4.0
pip                          24.0
protobuf                     3.19.6
pyasn1                       0.6.0
pyasn1_modules               0.4.0
pyparsing                    3.1.2
python-dateutil              2.9.0.post0
pytz                         2024.2
PyYAML                       6.0.1
requests                     2.32.3
requests-oauthlib            2.0.0
rsa                          4.9
scikit-learn                 1.5.2
scipy                        1.13.1
setuptools                   69.5.1
six                          1.16.0
tensorboard                  2.10.1
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.14.0
tensorflow-estimator         2.10.0
tensorflow-gpu               2.10.0
tensorflow-intel             2.14.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor                    2.4.0
threadpoolctl                3.5.0
typing_extensions            4.12.2
tzdata                       2024.2
urllib3                      2.2.2
Werkzeug                     3.0.3
wheel                        0.43.0
wrapt                        1.14.1
zipp                         3.19.2
CUDA 11.2, cuDNN 8.1
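
The `pip list` output above shows `tensorflow` 2.14.0, `tensorflow-intel` 2.14.0 and `tensorflow-gpu` 2.10.0 installed side by side. These distributions unpack into the same `tensorflow` package directory, so it may be unclear which release is actually imported. The following is a minimal sketch for spotting such a mix; the helper names `minor_version` and `find_tf_conflicts` are hypothetical, and the version numbers are copied from the listing above.

```python
from collections import defaultdict

# Distributions that all install into the same top-level `tensorflow` directory.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu", "tensorflow-intel", "tensorflow-cpu")


def minor_version(version):
    """Return 'MAJOR.MINOR' from a version string such as '2.14.0'."""
    return ".".join(version.split(".")[:2])


def find_tf_conflicts(installed):
    """Group installed TensorFlow distributions by MAJOR.MINOR release line.

    `installed` maps distribution name -> version string.
    Returns an empty dict when at most one release line is present.
    """
    by_minor = defaultdict(list)
    for name in TF_DISTRIBUTIONS:
        if name in installed:
            by_minor[minor_version(installed[name])].append(name)
    return dict(by_minor) if len(by_minor) > 1 else {}


# Versions copied from the `pip list` output in this thread.
reported = {
    "tensorflow": "2.14.0",
    "tensorflow-gpu": "2.10.0",
    "tensorflow-intel": "2.14.0",
    "keras": "2.10.0",
    "tensorflow-estimator": "2.10.0",
}

conflicts = find_tf_conflicts(reported)
for minor, names in conflicts.items():
    print(f"TensorFlow {minor}: {', '.join(names)}")
```

To check a live environment instead of the hard-coded mapping, the dictionary could be built from `importlib.metadata.distributions()`; name normalisation (hyphen versus underscore) may need handling there. If a mix is reported, a fresh environment with a single TensorFlow distribution is probably the simplest way to rule it out.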

Kiran_Sai_Ramineni · December 11, 2024, 6:01am

Hi @Zhou_Wu, according to the tested build configurations, TF 2.14 supports cuDNN 8.7 and CUDA 11.8. Could you please try upgrading your CUDA and cuDNN versions? If the issue persists, please let us know which OS you are using. Thank you.
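
The version pairing mentioned above can be written down as a small lookup, which makes it easier to compare an installed toolkit against what a given TensorFlow release was tested with. This is only a sketch: the table `TESTED_CONFIGS` and the function `check_toolkit` are hypothetical names, the 2.14 and 2.5 rows are the pairings quoted in this thread, and the 2.10 row is taken from TensorFlow's published tested build configurations.

```python
# Tested GPU build configurations, keyed by TensorFlow MAJOR.MINOR.
# 2.14 and 2.5: pairings quoted in this thread.
# 2.10: from TensorFlow's "tested build configurations" table.
TESTED_CONFIGS = {
    "2.14": {"cuda": "11.8", "cudnn": "8.7"},
    "2.10": {"cuda": "11.2", "cudnn": "8.1"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def check_toolkit(tf_version, cuda, cudnn):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    minor = ".".join(tf_version.split(".")[:2])
    expected = TESTED_CONFIGS.get(minor)
    if expected is None:
        return [f"no entry for TensorFlow {minor} in this table"]
    problems = []
    if cuda != expected["cuda"]:
        problems.append(f"CUDA {cuda} installed, {expected['cuda']} tested")
    if cudnn != expected["cudnn"]:
        problems.append(f"cuDNN {cudnn} installed, {expected['cudnn']} tested")
    return problems


# The combination reported in this thread: TF 2.14 with CUDA 11.2 / cuDNN 8.1.
print(check_toolkit("2.14.0", "11.2", "8.1"))
# The later attempt: TF 2.5 with CUDA 11.2 / cuDNN 8.1.
print(check_toolkit("2.5.0", "11.2", "8.1"))
```

Note that the tested configuration for 2.14 applies to Linux builds; TensorFlow's install guide states that 2.10 was the last release with GPU support on native Windows, so on Windows upgrading CUDA and cuDNN alone may not be sufficient.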

Zhou_Wu · December 13, 2024, 8:48am

Hi @Kiran_Sai_Ramineni,
Thank you very much for your reply. I could not respond sooner because I was trying out different possibilities.
However, my problem remains unresolved. The source code requires TF 2.5, so I installed the matching CUDA 11.2 and cuDNN 8.1, but a new error occurred. Do you have any suggestions?

First, I ran a simpler test script and got the following result.
Test code:

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.losses import SparseCategoricalCrossentropy


print(f"Using TensorFlow version {tf.__version__}")


(x_train, y_train), (x_test, y_test) = mnist.load_data()


x_train, x_test = x_train / 255.0, x_test / 255.0


model = Sequential([
  Flatten(input_shape=(28, 28)),
  Dense(128, activation='relu'),
  Dense(10)  # raw logits; the loss below uses from_logits=True
])


model.compile(optimizer='adam',
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])


model.fit(x_train, y_train, epochs=5)


test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f'\nTest accuracy: {test_acc:.4f}')

Test result:

(eed) D:\Study\test>python test.py
2024-12-13 16:40:34.852051: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
Using TensorFlow version 2.5.0
2024-12-13 16:40:36.912685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2024-12-13 16:40:36.963395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 4060 Laptop GPU computeCapability: 8.9
coreClock: 1.89GHz coreCount: 24 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.45GiB/s
2024-12-13 16:40:36.963785: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2024-12-13 16:40:36.968959: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2024-12-13 16:40:36.969072: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2024-12-13 16:40:36.973892: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2024-12-13 16:40:36.975777: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2024-12-13 16:40:36.980077: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2024-12-13 16:40:36.984377: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2024-12-13 16:40:36.985119: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2024-12-13 16:40:36.985549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2024-12-13 16:40:36.985966: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-13 16:40:36.988191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 4060 Laptop GPU computeCapability: 8.9
coreClock: 1.89GHz coreCount: 24 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.45GiB/s
2024-12-13 16:40:36.988291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2024-12-13 16:40:37.599963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-12-13 16:40:37.600201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2024-12-13 16:40:37.600484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2024-12-13 16:40:37.600710: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:210] Using CUDA malloc Async allocator for GPU.
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    model = Sequential([
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\training\tracking\base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\keras\engine\sequential.py", line 114, in __init__
    super(functional.Functional, self).__init__(  # pylint: disable=bad-super-call
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\training\tracking\base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\keras\engine\training.py", line 318, in __init__
    self._init_batch_counters()
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\training\tracking\base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\keras\engine\training.py", line 326, in _init_batch_counters
    self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\variables.py", line 244, in _variable_v2_call
    return previous_getter(
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2662, in default_variable_creator_v2
    return resource_variable_ops.ResourceVariable(
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1584, in __init__
    self._init_from_args(
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1727, in _init_from_args
    initial_value = ops.convert_to_tensor(initial_value,
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\profiler\trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\constant_op.py", line 264, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\constant_op.py", line 276, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\constant_op.py", line 301, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\constant_op.py", line 97, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\eager\context.py", line 525, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: No allocator statistics

My original source code then reported the same error, as follows:

(eed) D:\Study\QuantizationAwareDeepOptics-main4.0>python main.py
2024-12-13 16:50:51.154468: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2024-12-13 16:50:52.890525: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2024-12-13 16:50:52.937098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 4060 Laptop GPU computeCapability: 8.9
coreClock: 1.89GHz coreCount: 24 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.45GiB/s
2024-12-13 16:50:52.937362: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2024-12-13 16:50:52.942704: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2024-12-13 16:50:52.944154: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2024-12-13 16:50:52.949198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2024-12-13 16:50:52.950499: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2024-12-13 16:50:52.957792: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2024-12-13 16:50:52.961846: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2024-12-13 16:50:52.963164: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2024-12-13 16:50:52.963314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2024-12-13 16:50:52.964640: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-13 16:50:52.967700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 4060 Laptop GPU computeCapability: 8.9
coreClock: 1.89GHz coreCount: 24 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.45GiB/s
2024-12-13 16:50:52.968301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2024-12-13 16:50:53.471358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-12-13 16:50:53.471543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2024-12-13 16:50:53.471624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2024-12-13 16:50:53.471859: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:210] Using CUDA malloc Async allocator for GPU.
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    logical_devices = tf.config.list_logical_devices("GPU")
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\framework\config.py", line 452, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\eager\context.py", line 1395, in list_logical_devices
    self.ensure_initialized()
  File "D:\Anaconda3\envs\eed\lib\site-packages\tensorflow\python\eager\context.py", line 525, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: No allocator statistics

Details of the environment:

absl-py                 0.15.0
astunparse              1.6.3
cachetools              5.5.0
certifi                 2024.8.30
charset-normalizer      3.4.0
flatbuffers             1.12
gast                    0.4.0
google-auth             2.37.0
google-auth-oauthlib    0.4.6
google-pasta            0.2.0
grpcio                  1.34.1
h5py                    3.1.0
idna                    3.10
importlib_metadata      8.5.0
keras-nightly           2.5.0.dev2021032900
Keras-Preprocessing     1.1.2
Markdown                3.7
MarkupSafe              2.1.5
numpy                   1.19.5
oauthlib                3.2.2
opt-einsum              3.3.0
pip                     24.2
protobuf                3.20.3
pyasn1                  0.6.1
pyasn1_modules          0.4.1
requests                2.32.3
requests-oauthlib       2.0.0
rsa                     4.9
setuptools              75.1.0
six                     1.15.0
tensorboard             2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
tensorflow-estimator    2.5.0
tensorflow-gpu          2.5.0
termcolor               1.1.0
typing-extensions       3.7.4.3
urllib3                 2.2.3
Werkzeug                3.0.6
wheel                   0.44.0
wrapt                   1.12.1
zipp                    3.20.2
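
Both failing runs in the post above, as well as the original log, print `Using CUDA malloc Async allocator for GPU` shortly before the error. TensorFlow selects that allocator when the environment variable `TF_GPU_ALLOCATOR` is set to `cuda_malloc_async`, so one thing that may be worth ruling out is that this variable is set in the shell or system settings. The sketch below is an assumption-laden check, not a confirmed fix; the function name `disable_async_allocator` is hypothetical.

```python
import os

ASYNC_VALUE = "cuda_malloc_async"


def disable_async_allocator(env=os.environ):
    """Remove TF_GPU_ALLOCATOR from `env` if it selects cuda_malloc_async.

    Returns True when the variable was removed, False otherwise.
    For the real environment, call this before `import tensorflow`.
    """
    if env.get("TF_GPU_ALLOCATOR", "").strip().lower() == ASYNC_VALUE:
        del env["TF_GPU_ALLOCATOR"]
        return True
    return False


# Demonstration on a plain dict so the real environment is left untouched.
sample_env = {"TF_GPU_ALLOCATOR": "cuda_malloc_async", "CUDA_VISIBLE_DEVICES": "0"}
print(disable_async_allocator(sample_env))  # variable was present and is removed
print(sample_env)
```

In a real script, `disable_async_allocator()` would be called with no arguments at the very top, before TensorFlow is imported. If the project itself sets the variable later in its own code, that assignment would need to be changed instead.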

Kiran_Sai_Ramineni · December 18, 2024, 7:31am

Hi @Zhou_Wu, if possible, could you please try upgrading TensorFlow to the latest stable version, as TF 2.5 is no longer actively supported? Thank you.
