Hello, I’ve run my TFX pipeline on Vertex AI, and the output from the ExampleGen component has been saved to a location in GCS. Is there a way I could load those examples into an InteractiveContext within a notebook so I can view the Facets output?
For example:
```python
statistics_gen = StatisticsGen(examples="? examples processed and saved in GCS")
context.run(statistics_gen)
context.show(statistics_gen.outputs['statistics'])
```
Or, is there a way of loading the statistics that have already been emitted by the pipeline?
It looks like the stats were saved to these locations:
StatisticsGen/statistics/40/Split-train
StatisticsGen/statistics/40/Split-eval
Could I use the metadata DB for this purpose?
So many questions, thanks!
The ExampleGen component outputs dataset artifacts (of type Examples), which are TFRecord files. You can import a previously generated artifact by using the ImporterNode (called Importer in TFX 1.x). Here’s an example of importing a previously generated schema, but a dataset would be similar.
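To check what an Examples artifact actually contains before importing it, the TFRecord files can also be read directly. This is a minimal sketch that assumes the default ExampleGen output layout (one Split-<name> subdirectory per split, holding GZIP-compressed TFRecord files of serialized tf.train.Example protos) and TensorFlow 2.x; the helper name read_examples is hypothetical, not part of TFX.

```python
import os

import tensorflow as tf


def read_examples(examples_uri, split='train', limit=3):
    """Return up to `limit` tf.train.Example protos from one split (hypothetical helper).

    Assumes the default ExampleGen layout: <examples_uri>/Split-<split>/<files>,
    where each file is a GZIP-compressed TFRecord of serialized tf.train.Example.
    """
    pattern = os.path.join(examples_uri, f'Split-{split}', '*')
    # tf.io.gfile understands local paths and should also handle gs:// URIs.
    filenames = tf.io.gfile.glob(pattern)
    dataset = tf.data.TFRecordDataset(filenames, compression_type='GZIP')
    # Parse each serialized record back into a tf.train.Example proto.
    return [tf.train.Example.FromString(record.numpy())
            for record in dataset.take(limit)]
```

Printing one of the returned protos shows the feature names and values that StatisticsGen will summarize.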
Hi Robert, and thanks for pointing me in the right direction!
This is the code I used to load the generated TFRecords (these were generated when I ran my pipeline locally) and run the StatisticsGen component over them with an InteractiveContext:
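```python
from tfx.components import StatisticsGen
from tfx.dsl.components.common.importer import Importer
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.types import standard_artifacts

context = InteractiveContext()
# Root of the Examples artifact, i.e. the directory containing Split-train and Split-eval,
# for example /your-pipeline/BigQueryExampleGen/examples/NN (a gs:// URI should also work).
source = '/path to eval and train TFRecord files'

importer = Importer(
    source_uri=source,
    artifact_type=standard_artifacts.Examples,
    properties={
        'span': 0,
        'split_names': '["train", "eval"]',  # JSON-encoded list of split names
        'version': 0,
    },
)
context.run(importer)
print(f'importer.outputs: {importer.outputs}')

# Feed the imported Examples artifact into StatisticsGen and render the Facets view.
statistics_gen = StatisticsGen(examples=importer.outputs['result'])
context.run(statistics_gen)
context.show(statistics_gen.outputs['statistics'])
```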