Error when running model in tf2

luoyang102605 · July 5, 2023, 9:46am

Hello, I'm using TF2 to train a model and get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Function invoked by the following node is not compilable: {{node __inference_train_step_3739}} = __inference_train_step_3739[_XlaMustCompile=true, config_proto="\n\007\n\0…02\001\000", executor_type=""](dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, …).
Uncompilable nodes:
deep_fm/boolean_mask/Where: unsupported op: No registered 'Where' OpKernel for XLA_GPU_JIT devices compatible with node {{node deep_fm/boolean_mask/Where}}
Stacktrace:
Node: __inference_train_step_3739, function:
Node: deep_fm/boolean_mask/Where, function: __inference_train_step_3739

Adam/Adam/update/Unique: unsupported op: No registered 'Unique' OpKernel for XLA_GPU_JIT devices compatible with node {{node Adam/Adam/update/Unique}}
Stacktrace:
Node: __inference_train_step_3739, function:
Node: Adam/Adam/update/Unique, function: __inference_train_step_3739

Adam/Adam/update_1/Unique: unsupported op: No registered 'Unique' OpKernel for XLA_GPU_JIT devices compatible with node {{node Adam/Adam/update_1/Unique}}
Stacktrace:
Node: __inference_train_step_3739, function:
Node: Adam/Adam/update_1/Unique, function: __inference_train_step_3739
[Op:__inference_train_step_3739]

All dynamic libraries were opened successfully. How can I solve this?
I'm using the following versions:
tensorflow 2.4.0
tensorflow-estimator 2.4.0
tensorflow-io 0.32.0
tensorflow-io-gcs-filesystem 0.32.0

chunduriv · July 5, 2023, 11:01am

@luoyang102605,

Welcome to the TensorFlow Forum.

Could you please train the model with the latest version, TensorFlow 2.12? If the issue persists, please share your code so we can debug it further.

Thank you!
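
For context on the log above: the `Where` node sits under the `boolean_mask` scope, so it comes from `tf.boolean_mask`, whose output length depends on the data. One possible way to avoid that op inside a compiled function is the three-argument form of `tf.where`, which is an element-wise select and keeps the input shape. The sketch below is only an illustration under assumptions: `masked_sum_static`, `values`, and `ids` are hypothetical names, not taken from the model in this thread, and it assumes the masked values only feed a reduction afterwards. It also assumes TensorFlow 2.5 or later, where the argument is called `jit_compile` (in 2.4 it was `experimental_compile`). The `Unique` nodes under the `Adam` scope are not addressed here.

```python
import tensorflow as tf


@tf.function(jit_compile=True)
def masked_sum_static(values, ids):
    # Hypothetical helper: zero out entries whose id is not positive
    # instead of dropping them, so the tensor shape never changes.
    keep = ids > 0
    kept = tf.where(keep, values, tf.zeros_like(values))
    return tf.reduce_sum(kept)


values = tf.constant([1.0, 2.0, 3.0, 4.0])
ids = tf.constant([5, 0, 7, 0])
result = masked_sum_static(values, ids)
print(result.numpy())  # 4.0
```

Whether zeroing instead of dropping is acceptable depends on what the rest of the model does with the masked tensor.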

luoyang102605 · July 7, 2023, 3:07am

@chunduriv
I upgraded TensorFlow to 2.11.0 and now get a new error log, as follows:

2023-07-07 11:00:32.638716: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:57] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired. deep_fm/dropout_1/dropout/random_uniform/RandomUniform
2023-07-07 11:01:07.266821: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INVALID_ARGUMENT: Cannot concatenate arrays that differ in dimensions other than the one being concatenated. Dimension 0 in both shapes must be equal: f32[<=747,732,4] vs f32[512,1,4].
         [[{{node deep_fm/concat}}]]
Traceback (most recent call last):
  File "ModelInterface.py", line 481, in <module>
    solver.train(train_ds, test_ds)
  File "ModelInterface.py", line 169, in train
    loss = self.train_step(label, fea_ids, fea_vals, model)
  File "/nfs/volume-100058-3/nlp/xydu/my_envs/tf2/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/nfs/volume-100058-3/nlp/xydu/my_envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot concatenate arrays that differ in dimensions other than the one being concatenated. Dimension 0 in both shapes must be equal: f32[<=747,732,4] vs f32[512,1,4].
         [[{{node deep_fm/concat}}]] [Op:__inference_train_step_4626]

How can I solve this? Much appreciated.

chunduriv · July 7, 2023, 6:19am

@luoyang102605,

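To fix this issue, you need to ensure that the arrays you are concatenating have the same size in every dimension except the one being concatenated. In your log, dimension 0 differs: one array has up to 747 rows and the other has 512.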

Thank you!
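
As a small illustration of that rule, here is a sketch that reuses the shapes from the error message. It assumes the concatenation runs along axis 1, which the log does not state; the tensor names are made up for the example.

```python
import tensorflow as tf

# Same batch size (dimension 0) and same last dimension: concat on axis 1 works.
a = tf.zeros([512, 732, 4])
b = tf.zeros([512, 1, 4])
merged = tf.concat([a, b], axis=1)
print(merged.shape)  # (512, 733, 4)

# Different batch size (747 vs 512): concat on axis 1 is rejected.
c = tf.zeros([747, 732, 4])
try:
    tf.concat([c, b], axis=1)
    mismatch_raised = False
except (tf.errors.InvalidArgumentError, ValueError):
    mismatch_raised = True
print(mismatch_raised)  # True
```

A first dimension that varies per batch, like the `<=747` in the log, usually means an earlier op changed the number of rows before the concat.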

luoyang102605 · July 7, 2023, 6:20am

@chunduriv
Hi, the two tensors in the concatenate operation have these shapes:

(None, 732, 4)
(512, 1, 4)

I can print all the results, even after this concat, but I still get the error above. The same code works fine in TensorFlow 1.15.0. Is this caused by a difference between TF1 and TF2? If so, how can I solve it?

luoyang102605 · July 7, 2023, 9:30am

@chunduriv
Hi, I may have found the problem. Here is the code:

single_mask = tf.where(feat_index > 0, True, False)
# before_multihot_single_mask=single_mask
for fea in self.multihot_fea:
    print(fea[0], fea[1])
    single_mask = single_mask & (tf.where(feat_index<fea[0], True, False) | tf.where(feat_index>=fea[1], True, False))

I'm trying to build a mask as shown above, but the for loop doesn't seem to work at all. I get a `single_mask` that is all True, and this causes the error. How can I solve this?
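
One way to check the mask logic in isolation is to run it on a tiny input with known ranges. The sketch below is only that, not a diagnosis of the model in this thread: `multihot_fea` here is a hypothetical list of `[start, end)` id ranges standing in for `self.multihot_fea`, and `build_single_mask` is a made-up name. It uses plain comparisons, which already return boolean tensors, so `tf.where(..., True, False)` is not needed. Note that inside a `tf.function` a Python `for` loop over a Python list is unrolled when the function is traced, and a Python `print` runs only at trace time, so it is worth confirming that the list is non-empty at that moment.

```python
import tensorflow as tf

# Hypothetical [start, end) id ranges standing in for self.multihot_fea.
multihot_fea = [(3, 5), (8, 10)]


@tf.function
def build_single_mask(feat_index):
    # True for positive ids that fall outside every multi-hot range.
    mask = feat_index > 0
    for start, end in multihot_fea:  # unrolled once, at trace time
        mask = mask & ((feat_index < start) | (feat_index >= end))
    return mask


feat_index = tf.constant([[0, 1, 3, 4, 5, 8, 9, 10]])
single_mask = build_single_mask(feat_index)
print(single_mask.numpy())
# [[False  True False False  True False False  True]]
```

If the same check on real ids still gives an all-True mask, the ranges and the ids probably do not overlap, or the list was empty when the function was traced.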
