Different behavior of tf.keras.layers.experimental.preprocessing.HashedCrossing

Hello,

I’m using the layer above in the context of TFX. When I build a TFX pipeline with a Transform component whose preprocessing_fn includes the layer mentioned above, IPython crashes due to running out of memory. When I run the same preprocessing_fn without the Transform component, calling Beam directly, I see correct behavior. This occurs when I use a thousand buckets; when I reduce it to a hundred buckets, both methods behave as expected. I have a few questions:

  1. Has anyone seen this before?
  2. Why does Transform execute differently through LocalDagRunner compared with AnalyzeAndTransformDataset in the context of a Beam pipeline? (A sketch of the preprocessing_fn and the direct-Beam path follows below.)
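
For reference, here is a minimal sketch of the setup being compared. The feature names (`feature_a`, `feature_b`), the string schema, and the toy data are illustrative assumptions; the actual preprocessing_fn isn’t shown in this post:

```python
import tempfile

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

NUM_BINS = 1000  # a thousand buckets, as described above


def preprocessing_fn(inputs):
    # Cross two string features and hash the result into NUM_BINS buckets.
    crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(
        num_bins=NUM_BINS)((inputs['feature_a'], inputs['feature_b']))
    return {'crossed': crossed}


# Toy in-memory data standing in for the real dataset.
raw_data = [
    {'feature_a': 'hello', 'feature_b': 'world'},
    {'feature_a': 'foo', 'feature_b': 'bar'},
]
raw_metadata = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec({
        'feature_a': tf.io.FixedLenFeature([], tf.string),
        'feature_b': tf.io.FixedLenFeature([], tf.string),
    }))

# The "calling Beam directly" path, i.e. the variant that behaves as
# expected in the report above.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, transform_fn = (
        (raw_data, raw_metadata)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
transformed_data, transformed_metadata = transformed_dataset
```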

Any guidance is appreciated. Thank you!

-Pritam

Hi @pritamdodeja,

The out-of-memory issue you’re seeing with the Transform component in your TFX pipeline might be due to an excessive bucket size, different execution behavior between the Transform component and direct Beam execution, or resource constraints.

You can check these parameters to resolve the issue:

  • Adjust the bucket size.
  • Optimize the preprocessing function.
  • Check the Beam pipeline configuration (see the sketch after this list).
  • Consider alternative approaches like TFDV.
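
On the Beam configuration point, here is a minimal sketch of passing DirectRunner options through a TFX pipeline. The flag values, pipeline name, and path are assumptions, not a confirmed fix for this issue:

```python
from tfx.orchestration import pipeline as pipeline_module
from tfx.orchestration.local.local_dag_runner import LocalDagRunner

# Standard Beam DirectRunner options; multi-processing mode can reduce
# the memory pressure of running everything in a single process.
beam_args = [
    '--direct_running_mode=multi_processing',
    '--direct_num_workers=0',  # 0 means one worker per available core
]

p = pipeline_module.Pipeline(
    pipeline_name='hashed_crossing_pipeline',  # hypothetical name
    pipeline_root='/tmp/pipeline_root',        # hypothetical path
    components=[],  # the Transform component (and the rest) would go here
    beam_pipeline_args=beam_args,
)
LocalDagRunner().run(p)
```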

Hope this helps,

Thank you

I think the sparseness that is happening is within TensorFlow, which, from my understanding, supports it. I’m putting a lower-dimensional embedding layer right after (a sketch of that arrangement is below). How would TFDV help here? Thanks for your response!
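
For what it’s worth, a minimal Keras sketch of that arrangement, with HashedCrossing feeding a lower-dimensional embedding. The feature names, input dtype, and embedding width are illustrative assumptions:

```python
import tensorflow as tf

NUM_BINS = 1000  # the bucket count discussed above

# Two scalar string inputs; the real features may differ.
feat_a = tf.keras.Input(shape=(1,), dtype=tf.string, name='feature_a')
feat_b = tf.keras.Input(shape=(1,), dtype=tf.string, name='feature_b')

crossed = tf.keras.layers.experimental.preprocessing.HashedCrossing(
    num_bins=NUM_BINS)((feat_a, feat_b))

# Lower-dimensional embedding right after the crossing; 8 is an
# arbitrary width chosen for the sketch.
embedded = tf.keras.layers.Embedding(
    input_dim=NUM_BINS, output_dim=8)(crossed)

model = tf.keras.Model(inputs=[feat_a, feat_b], outputs=embedded)
```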

Pritam