Hi, I am using a dataset that has many NaN/missing values, but StatisticsGen does not detect them:
the "missing" percentage shows "0%" for every column.
Hi @Aditya_Soni, could you provide a bit more context?
For example, did you follow the TensorFlow documentation, such as "Data validation using TFX Pipeline and TensorFlow Data Validation"?
Do you get an error message, aside from the "0%" you are seeing?
Also, could you share a Colab or your code?
Thank you.
@tagoma
Thanks for the response. Yes, I did follow the documentation.
Aside from the “0%”, I was not getting any error. In fact, the ExampleValidator
component was also showing that no anomalies were found.
Unfortunately, due to company policy, I cannot share the code.
Could this be a case where a numerical feature is being mistaken for a text feature? What do the NaN and missing values look like?
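If you cannot share the data, one way to answer this yourself is to look at the raw cells of each column. Below is a minimal sketch using only the Python standard library. The sample data, the column names, and the helper names `classify_cell` and `summarize_columns` are all made up for illustration. It only approximates how values might be interpreted, since Python's `float()` parsing is not necessarily the same as what your ExampleGen does.

```python
import csv
import io
import math
from collections import Counter


def classify_cell(cell: str) -> str:
    """Label one raw CSV cell as 'empty', 'nan', 'number', or 'text'."""
    stripped = cell.strip()
    if stripped == "":
        return "empty"
    try:
        value = float(stripped)
    except ValueError:
        # Placeholders such as "?" or "N/A" end up here.
        return "text"
    return "nan" if math.isnan(value) else "number"


def summarize_columns(csv_text: str) -> dict:
    """Count cell kinds per column for CSV text that has a header row."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            if column is None:
                # Extra cells beyond the header; ignored in this sketch.
                continue
            kind = classify_cell(cell or "")
            summary.setdefault(column, Counter())[kind] += 1
    return summary


# Hypothetical sample: 'age' has a real gap and a NaN,
# 'income' uses a "?" placeholder for an unknown value.
SAMPLE = "age,income\n31,52000\n,?\nNaN,61000\n"

for column, counts in summarize_columns(SAMPLE).items():
    print(column, dict(counts))
```

If a column that should be numeric shows any `text` cells, a placeholder string may be causing the whole column to be read as text, in which case those placeholders would count as ordinary values rather than as missing ones. Cells labelled `nan` are present values too, which is a different thing from `empty`.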
I wouldn’t expect NaNs to be reported as missing. The num_nan statistic should report those. See metadata/tensorflow_metadata/proto/v0/statistics.proto at a85e542f292562284f4d2aaa3a93c4d74060b05e · tensorflow/metadata · GitHub. If the user wants to get anomalies for NaNs, they will need to set disallow_nan in the feature's float_domain in their schema. See metadata/tensorflow_metadata/proto/v0/schema.proto at a85e542f292562284f4d2aaa3a93c4d74060b05e · tensorflow/metadata · GitHub.