opened 07:54PM - 18 Apr 23 UTC
closed 11:09AM - 28 Jun 23 UTC
type:feature
stat:awaiting response
**System information**
- TFX Version (you are using): master
- Environment in which you plan to use the feature (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc..): gcp / Linux
- Are you willing to contribute it (Yes/No): Yes
**Describe the feature and the current behavior/state.**
In the current state, `BigQueryExampleGen` has a few issues for non-standard use. [This pull request](https://github.com/tensorflow/tfx/pull/5858) is the first, and lower touch, of two to allow options to propagate into the `Executor`.
The most significant of these issues arises when you are a tenant and therefore do not have access to BigQuery datasets at the "project" administrative level. **You need to specify the table to write to. This cannot be done in `beam_pipeline_args` anymore, as far as I can tell**.
See [this issue (3754)](https://github.com/tensorflow/tfx/issues/3754) for rationale on why this was not previously changed.
None of the following work as independent solutions within the `beam_pipeline_args`:
`f'--project={PROJECT}.{TABLE}'`
`f'--project={PROJECT}:{TABLE}'`
`f'--project={PROJECT}:{TABLE}.{TEMP_NAME}'`
`f"--temp-location={os.path.join(BUCKET, 'tmp')}"`
The underlying problem is `utils.ReadFromBigQuery`, which could _potentially_ accept additional arguments. The best way to enable this is to allow custom options to be passed into the pipeline. With `custom_config`, you could later pass along values that the executor might want to retrieve, as the `Parquet Executor` that is part of `FileBasedExampleGen` does.
This commit aligns the other pieces with the notation in FileBasedExampleGen.
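To make the idea concrete, here is a minimal sketch of how an executor could forward a pre-created temp dataset to Beam once `custom_config` reaches it. It is an illustration under stated assumptions, not the code in the pull request: the `temp_dataset` key, its `project.dataset` / `project:dataset` string format, and both helper names are hypothetical. The only real API relied on is Beam's `beam.io.ReadFromBigQuery`, which accepts a `temp_dataset` argument as a `DatasetReference`.

```python
import re
from typing import Any, Dict, Optional, Tuple


def parse_temp_dataset(
    custom_config: Optional[Dict[str, Any]],
) -> Optional[Tuple[str, str]]:
    """Return (project_id, dataset_id) from a hypothetical 'temp_dataset' key.

    Accepts 'project.dataset' or 'project:dataset'. Returns None when the
    key is absent, so the default Beam behavior stays unchanged.
    """
    if not custom_config or "temp_dataset" not in custom_config:
        return None
    raw = custom_config["temp_dataset"]
    if not isinstance(raw, str):
        raise ValueError(f"temp_dataset must be a string, got {type(raw).__name__}")
    # Split on the last '.' or ':'; the part after it is the dataset ID.
    match = re.fullmatch(r"(.+)[.:]([A-Za-z0-9_]+)", raw)
    if match is None:
        raise ValueError(f"Expected 'project.dataset', got {raw!r}")
    return match.group(1), match.group(2)


def build_bigquery_read(query: str, custom_config: Optional[Dict[str, Any]] = None):
    """Build a Beam read transform, optionally pinned to an existing temp dataset."""
    # Imported lazily so parse_temp_dataset can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io.gcp.internal.clients import bigquery

    kwargs: Dict[str, Any] = {"query": query, "use_standard_sql": True}
    parsed = parse_temp_dataset(custom_config)
    if parsed is not None:
        project_id, dataset_id = parsed
        # With temp_dataset set, Beam writes query results into this dataset
        # instead of creating (and later deleting) one of its own.
        kwargs["temp_dataset"] = bigquery.DatasetReference(
            projectId=project_id, datasetId=dataset_id
        )
    return beam.io.ReadFromBigQuery(**kwargs)
```

With something like this in place, a tenant would only need write access to the one pre-created dataset, e.g. `build_bigquery_read(query, {"temp_dataset": "my-project.tmp_ds"})`, rather than dataset create/delete permission on the project.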
**Will this change the current API? How?**
Most users will not notice the difference, unless those users are familiar enough with the TFX ecosystem to want to do things that are in line with existing pieces but just slightly different. However, this opens the opportunity for people to **better utilize custom Executors when needed / desired**.
**Who will benefit with this feature?**
Users in industrial settings who navigate tenancy permissions.
**Do you have a workaround or are completely blocked by this?** :
This is the workaround
**Name of your Organization (Optional)**
**Any Other info.**
Our issue is similar to this. We do not want to grant create and delete permissions for datasets, so the only option is to pre-specify the temp dataset. The underlying Beam supports a `temp_dataset` argument, but TFX has no way to pass it. I stumbled upon the `custom_config` bit and tried various options, but it always errors. I am struggling to find the right implementation, because specifying the temp dataset in any format in the custom config either throws an error or is not recognized.