Hello,
I’m using the layer above in the context of TFX. When I build a TFX pipeline that uses a Transform component with the layer mentioned above included in preprocessing_fn, IPython crashes because it runs out of memory. When I run the same preprocessing_fn without the Transform component, calling Beam directly, I see correct behavior. The crash occurs when I use a thousand buckets. When I reduce the number to a hundred buckets, both methods behave as expected. I have a few questions:
- Has anyone seen this before?
- Why does Transform execute differently through LocalDagRunner than it does with AnalyzeAndTransformDataset in the context of a Beam pipeline?
Any guidance is appreciated. Thank you!
-Pritam