I recently upgraded the TensorFlow version used in my program to the newly released 2.6.0, and ran into a problem.
import tensorflow as tf
pattern = 'hdfs://mypath'
print(tf.io.gfile.glob(pattern))
The above API throws an exception in version 2.6:
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'hdfs' not implemented (file:xxxxx)
I then checked the relevant implementation code and found that the official recommendation is to use tensorflow/io to access HDFS, and that the environment variable TF_USE_MODULAR_FILESYSTEM is provided to enable legacy access support. Since my code is fairly complex and difficult to refactor in a short time, I tried setting this environment variable, but the call still failed with the same error.
In general, my questions are:
- In the latest version of TensorFlow, if “tfio” is not used, how can I still access files on HDFS?
- If “tfio” must be used, what is the equivalent of calling tf.io.gfile.glob?
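For reference, a minimal sketch of what the tensorflow/io route might look like. It assumes the tensorflow-io package is installed in a version matching TensorFlow 2.6, that importing it registers the hdfs scheme so that the existing tf.io.gfile.glob call works unchanged, and that the usual Hadoop client environment (for example CLASSPATH and libhdfs) is already configured. glob_hdfs is a hypothetical helper name, not part of either library.

```python
def glob_hdfs(pattern):
    """Return the paths matching an hdfs:// glob pattern.

    Hypothetical helper: imports are done lazily so the module can be
    loaded even where TensorFlow is not installed.
    """
    import tensorflow as tf

    # Assumption: importing tensorflow_io is enough to register the
    # additional file system schemes (including hdfs) with TensorFlow.
    # The module is imported only for this side effect.
    import tensorflow_io  # noqa: F401

    # Same call as in the original snippet; only the extra import differs.
    return tf.io.gfile.glob(pattern)
```

If that assumption holds, the original snippet would become print(glob_hdfs('hdfs://mypath')), and the rest of the tf.io.gfile calls in the program would not need to be refactored.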