Different behavior of tf.keras.layers.experimental.preprocessing.HashedCrossing

Hello,

I’m using the layer above in the context of TFX. When I build a TFX pipeline with a Transform component whose preprocessing_fn includes the layer mentioned above, IPython crashes due to running out of memory. When I run the same preprocessing_fn without the Transform component, calling Beam directly, I see correct behavior. This occurs when I use a thousand buckets; when I reduce it to a hundred buckets, both methods behave as expected. I have a few questions:

  1. Has anyone seen this before?
  2. Why does Transform execute differently through LocalDagRunner compared with AnalyzeAndTransformDataset in the context of a Beam pipeline? (A sketch of the preprocessing_fn and the direct-Beam path follows below.)
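
For reference, here is a minimal sketch of the setup being compared. The feature names (`feature_a`, `feature_b`), the string schema, and the toy data are illustrative assumptions; the actual preprocessing_fn isn’t shown in this post:

```python
import tempfile

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

NUM_BINS = 1000  # a thousand buckets, as described above


def preprocessing_fn(inputs):
    # Cross two string features and hash the result into NUM_BINS buckets.
    crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(
        num_bins=NUM_BINS)((inputs['feature_a'], inputs['feature_b']))
    return {'crossed': crossed}


# Toy in-memory data standing in for the real dataset.
raw_data = [
    {'feature_a': 'hello', 'feature_b': 'world'},
    {'feature_a': 'foo', 'feature_b': 'bar'},
]
raw_metadata = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec({
        'feature_a': tf.io.FixedLenFeature([], tf.string),
        'feature_b': tf.io.FixedLenFeature([], tf.string),
    }))

# The "calling Beam directly" path, i.e. the variant that behaves as
# expected in the report above.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, transform_fn = (
        (raw_data, raw_metadata)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
transformed_data, transformed_metadata = transformed_dataset
```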

Any guidance is appreciated. Thank you!

-Pritam

Hi @pritamdodeja,

The out-of-memory issue you’re seeing with the Transform component in your TFX pipeline might be due to an excessive bucket size, different execution behavior between the Transform component and direct Beam execution, or resource constraints.

You can check these parameters to resolve the issue:

  • Adjust the bucket size.
  • Optimize the preprocessing function.
  • Check the Beam pipeline configuration (see the sketch after this list).
  • Consider alternative approaches like TFDV.
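
On the Beam configuration point, here is a minimal sketch of passing DirectRunner options through a TFX pipeline. The flag values, pipeline name, and path are assumptions, not a confirmed fix for this issue:

```python
from tfx.orchestration import pipeline as pipeline_module
from tfx.orchestration.local.local_dag_runner import LocalDagRunner

# Standard Beam DirectRunner options; multi-processing mode can reduce
# the memory pressure of running everything in a single process.
beam_args = [
    '--direct_running_mode=multi_processing',
    '--direct_num_workers=0',  # 0 means one worker per available core
]

p = pipeline_module.Pipeline(
    pipeline_name='hashed_crossing_pipeline',  # hypothetical name
    pipeline_root='/tmp/pipeline_root',        # hypothetical path
    components=[],  # the Transform component (and the rest) would go here
    beam_pipeline_args=beam_args,
)
LocalDagRunner().run(p)
```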

Hope this helps,

Thank you

I think the sparseness that is happening is within TensorFlow, which, from my understanding, supports it. I’m putting a lower-dimensional embedding layer right after (a sketch of that arrangement is below). How would TFDV help here? Thanks for your response!
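
For what it’s worth, a minimal Keras sketch of that arrangement, with HashedCrossing feeding a lower-dimensional embedding. The feature names, input dtype, and embedding width are illustrative assumptions:

```python
import tensorflow as tf

NUM_BINS = 1000  # the bucket count discussed above

# Two scalar string inputs; the real features may differ.
feat_a = tf.keras.Input(shape=(1,), dtype=tf.string, name='feature_a')
feat_b = tf.keras.Input(shape=(1,), dtype=tf.string, name='feature_b')

crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(
    num_bins=NUM_BINS)((feat_a, feat_b))

# Lower-dimensional embedding right after the crossing; 8 is an
# arbitrary width chosen for the sketch.
embedded = tf.keras.layers.Embedding(
    input_dim=NUM_BINS, output_dim=8)(crossed)

model = tf.keras.Model(inputs=[feat_a, feat_b], outputs=embedded)
```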

Pritam