Hi everyone,
I'm currently configuring a TFX pipeline with components such as StatisticsGen, SchemaGen, and Transform in my .ipynb:
module_file = os.path.abspath("…\components\module.py")
#%%
stats_gen = StatisticsGen(
    examples=example_gen.outputs['examples']
)
#%%
context.run(stats_gen)
#%%
schema_gen = SchemaGen(
    statistics=stats_gen.outputs['statistics']
)
#%%
context.run(schema_gen)
#%%
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=module_file
)
#%%
context.run(transform)
Providing the StatisticsGen and SchemaGen components and running them in the TFX workflow via context.run() works as intended.
My context is created with context = InteractiveContext(pipeline_root=os.path.abspath('.\pipeline-root')), and the pipeline-root path is used for storing the metadata.
The error occurs when running context.run(transform), and the error logs are:
RuntimeError: OSError: [WinError 3] The system cannot find the path specified: 'C:\Users\benne\PycharmProjects\mlops-energieverbrauch\interactive-pipeline\pipeline-root\Transform\updated_analyzer_cache\20\pipeline-root-CsvExampleGen-examples-1-Split-train-STAR-7bed901109c5f95d19d422aeadf97ec632a90adc04f87f51192d4f46edaf114e\beam-temp-25-1867ce6685f511ee85cbe8f408833848' [while running 'WriteCache/Write[AnalysisIndex0][CacheKeyIndex25]/Write/WriteImpl/InitializeWrite']
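One thing I noticed (just a guess on my part, not confirmed): the failing path from the traceback is longer than the classic Windows MAX_PATH limit of 260 characters, so this might be a long-path issue rather than a genuinely missing directory. A quick check:

```python
# The path from the traceback, copied verbatim (raw strings keep the backslashes).
failing_path = (
    r"C:\Users\benne\PycharmProjects\mlops-energieverbrauch"
    r"\interactive-pipeline\pipeline-root\Transform\updated_analyzer_cache"
    r"\20\pipeline-root-CsvExampleGen-examples-1-Split-train-STAR-"
    r"7bed901109c5f95d19d422aeadf97ec632a90adc04f87f51192d4f46edaf114e"
    r"\beam-temp-25-1867ce6685f511ee85cbe8f408833848"
)
# MAX_PATH on Windows is 260 characters unless long-path support is enabled.
print(len(failing_path), len(failing_path) > 260)
```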
All the imports in the .ipynb:
import os
from tfx.components import CsvExampleGen, Transform, schema_gen, StatisticsGen, SchemaGen
from keras_tuner.src.backend.io import tf
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
import pprint
import csv
import pandas as pd
from tfx.proto import example_gen_pb2
import tensorflow_data_validation as tfdv
The corresponding module.py with the preprocessing_fn() looks like:
import tensorflow_transform as tft
from tfx.examples.chicago_taxi_pipeline.taxi_utils_native_keras import _transformed_name as transformed_name
from tfx.experimental.templates.taxi.models.preprocessing import _fill_in_missing as fill_in_missing
import tensorflow as tf
def preprocessing_fn(inputs):
    outputs = {}
    for key in inputs.keys():
        if inputs[key].dtype in [tf.float16, tf.float32, tf.float64,
                                 tf.int8, tf.int16, tf.int32, tf.int64,
                                 tf.uint8, tf.uint16, tf.uint32, tf.uint64]:
            # Fill missing values, then scale numeric features to z-scores
            outputs[transformed_name(key)] = tft.scale_to_z_score(fill_in_missing(inputs[key]))
        else:
            # Fill missing values for non-numeric data
            outputs[transformed_name(key)] = fill_in_missing(inputs[key])
    return outputs
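For context, tft.scale_to_z_score standardizes each numeric feature to zero mean and unit variance over the whole dataset. A plain-Python sketch of the underlying math (just for illustration, not TFX code):

```python
def z_score(values):
    """Plain-Python illustration of the math behind tft.scale_to_z_score:
    shift to zero mean, then divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

print(z_score([1.0, 2.0, 3.0]))  # symmetric around 0
```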
Does anyone have an idea what is going wrong and could help?
Thanks in advance!!!