GOAL
- Explain the following definitions in plain, simple English.
- Give many practical examples of what each of them can contain (a model, model hyperparameters, just data and nothing else…).
- What does each of them do?
ORIGINAL DOC
This is the text from the original “ML Metadata | TFX | TensorFlow” documentation page that I have issues with. In my description below I will refer back to it. The page says: “The Metadata Store uses the following data model to record and retrieve metadata from the storage backend.”:
- ArtifactType: describes an artifact’s type and its properties that are stored in the metadata store. You can register these types on-the-fly with the metadata store in code, or you can load them in the store from a serialized format. Once you register a type, its definition is available throughout the lifetime of the store.
- An Artifact: describes a specific instance of an ArtifactType, and its properties that are written to the metadata store.
- An ExecutionType: describes a type of component or step in a workflow, and its runtime parameters.
- An Execution: is a record of a component run or a step in an ML workflow and the runtime parameters. An execution can be thought of as an instance of an ExecutionType. Executions are recorded when you run an ML pipeline or step.
- An Event: is a record of the relationship between artifacts and executions. When an execution happens, events record every artifact that was used by the execution, and every artifact that was produced. These records allow for lineage tracking throughout a workflow. By looking at all events, MLMD knows what executions happened and what artifacts were created as a result. MLMD can then recurse back from any artifact to all of its upstream inputs.
- A ContextType: describes a type of conceptual group of artifacts and executions in a workflow, and its structural properties. For example: projects, pipeline runs, experiments, owners etc.
- A Context: is an instance of a ContextType. It captures the shared information within the group. For example: project name, changelist commit id, experiment annotations etc. It has a user-defined unique name within its ContextType.
- An Attribution: is a record of the relationship between artifacts and contexts.
- An Association: is a record of the relationship between executions and contexts.
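The nine definitions above fall into three layers: three "…Type" records that act as schemas (ArtifactType, ExecutionType, ContextType), three instance records (Artifact, Execution, Context), and three link records (Event, Attribution, Association). The following is a minimal sketch of that structure for one training run, written as a toy in-memory model in plain Python. It is not the `ml_metadata` API (the real library uses protobuf messages such as `metadata_store_pb2.ArtifactType` and `metadata_store_pb2.Execution`). The type names `DataSet`, `SavedModel`, `Trainer` and `Experiment`, the property names and the paths are hypothetical examples.

```python
from dataclasses import dataclass, field

# Hypothetical toy records that mirror the MLMD concepts; NOT the ml_metadata classes.

@dataclass
class Type:
    """ArtifactType, ExecutionType or ContextType: a schema that holds no data."""
    kind: str          # "artifact", "execution" or "context"
    name: str
    properties: dict   # property name -> Python type

@dataclass
class Record:
    """Artifact, Execution or Context: one concrete instance of a Type."""
    id: int
    of_type: Type
    properties: dict = field(default_factory=dict)
    uri: str = ""      # artifacts only: where the real files live

# 1. Types (schemas) for one training step.
dataset_t = Type("artifact", "DataSet", {"split": str})
model_t = Type("artifact", "SavedModel", {"version": int})
trainer_t = Type("execution", "Trainer", {"learning_rate": float})
experiment_t = Type("context", "Experiment", {"note": str})

# 2. Instances of those types.
train_data = Record(1, dataset_t, {"split": "train"}, uri="/data/train")
trainer_run = Record(2, trainer_t, {"learning_rate": 0.01})
model_v1 = Record(3, model_t, {"version": 1}, uri="/models/1")
exp = Record(4, experiment_t, {"note": "baseline"})

# 3. Link records: nothing but ids (plus a direction for events).
events = [(train_data.id, trainer_run.id, "INPUT"), (model_v1.id, trainer_run.id, "OUTPUT")]
attributions = [(train_data.id, exp.id), (model_v1.id, exp.id)]  # artifact <-> context
associations = [(trainer_run.id, exp.id)]                        # execution <-> context

for artifact_id, execution_id, direction in events:
    print(f"artifact {artifact_id} is an {direction} of execution {execution_id}")
```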
How I see them
(Please help me by correcting my descriptions or letting me know whether they are right. The statements are how I understand the documentation's descriptions; the questions are the parts I don’t understand and need answered.)
- ArtifactType (what I can deduce from the documentation):
- Contains the base data
- Contains many iterations and multiple modified versions of the base data
- Defines the data types
- Has properties
- Stores data as metadata in the metadata storage (e.g., a database or RAM).
- What is an ArtifactType?
- What are all the possible things that it can store?
- What are all the properties it has?
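In short, an ArtifactType holds no data at all: it is a named schema, a type name plus the property names and value types that every artifact of that type may carry (in MLMD the value types include `INT`, `DOUBLE` and `STRING`, and a type is registered with `put_artifact_type`). Neither the base data nor its modified versions live in it. A minimal sketch, using a hypothetical toy class rather than the real protobuf message; `DataSet`, `SavedModel` and their properties are example names:

```python
from dataclasses import dataclass

# Hypothetical toy ArtifactType: a name plus a property schema. It stores no data.
@dataclass
class ArtifactType:
    name: str
    properties: dict  # property name -> allowed Python type

    def validate(self, values: dict) -> None:
        """Raise if `values` uses an unknown property or a value of the wrong type."""
        for key, value in values.items():
            if key not in self.properties:
                raise KeyError(f"{self.name} has no property {key!r}")
            if not isinstance(value, self.properties[key]):
                raise TypeError(f"{self.name}.{key} must be {self.properties[key].__name__}")

# Two example types. Neither contains a single row of data or a single model weight.
dataset_type = ArtifactType("DataSet", {"day": int, "split": str})
model_type = ArtifactType("SavedModel", {"version": int, "name": str})

dataset_type.validate({"day": 1, "split": "train"})  # passes silently
try:
    model_type.validate({"version": "one"})           # wrong value type
except TypeError as err:
    print(err)  # SavedModel.version must be int
```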
- A single Artifact (what I can deduce from the documentation):
- One version of the modified data
- One version of the modified data’s properties
- One version of the modified data’s data types
- One version of a specific instance of an ArtifactType, and its properties that are written to the metadata store.
- !! BUT THEN the docs say “List all Artifacts of a specific type. Example: all Models that have been trained.” → So a saved model can also be an Artifact. This documentation is just terrible: ArtifactType and Artifact point at each other without explaining what either of them is. It doesn’t make any sense.
- What is an Artifact?
- What are all the possible things that it can store?
- What are all the properties it has?
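An Artifact is one concrete input or output of a step: one dataset snapshot, one trained model, one set of evaluation metrics. The record holds which ArtifactType it belongs to, a URI pointing at where the actual files live, and its property values; the files themselves are not copied into the metadata store. That is why a saved model is also an Artifact: it is simply an instance of a type such as `SavedModel`. A minimal toy sketch with hypothetical ids, URIs and property values; the real `MetadataStore` offers a comparable query, `get_artifacts_by_type`:

```python
from dataclasses import dataclass

# Hypothetical toy Artifact: one concrete, versioned thing (a dataset snapshot, a model, ...).
@dataclass
class Artifact:
    id: int
    type_name: str   # which ArtifactType this is an instance of
    uri: str         # where the actual bytes live; the store keeps only this pointer
    properties: dict

artifacts = [
    Artifact(1, "DataSet", "/data/day1/train", {"day": 1, "split": "train"}),
    Artifact(2, "DataSet", "/data/day2/train", {"day": 2, "split": "train"}),
    Artifact(3, "SavedModel", "/models/mnist/1", {"version": 1, "name": "mnist"}),
    Artifact(4, "SavedModel", "/models/mnist/2", {"version": 2, "name": "mnist"}),
]

def artifacts_of_type(type_name):
    """'List all Artifacts of a specific type', e.g. every trained model."""
    return [a for a in artifacts if a.type_name == type_name]

for model in artifacts_of_type("SavedModel"):
    print(model.id, model.uri, model.properties["version"])
```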
- ExecutionType:
- What is a “component in a workflow”?
- What is a “step in a workflow, and its runtime parameters.”
- I ask because this section has NO workflow chart to point at, while the previous section (tfx/guide/mlmd#metadata_store) has one and so does the following one (tfx/guide/mlmd#integrate_ml_metadata_into_your_ml_workflows).
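A "component" here means one reusable processing step of the pipeline (in TFX, for example, `ExampleGen`, `Trainer` or `Evaluator`), and "runtime parameters" are the settings that may change from one run of that step to the next, such as a learning rate. An ExecutionType is the schema for one such step: its name and which parameters get recorded. A minimal toy sketch, with a small workflow chart as a comment; the step names and parameters are hypothetical, and in the real library a type is registered with `put_execution_type`:

```python
from dataclasses import dataclass

# Hypothetical toy ExecutionType: describes a kind of pipeline step and which
# runtime parameters a run of that step records. It is a schema, not a run.
@dataclass
class ExecutionType:
    name: str
    properties: dict  # parameter name -> Python type

# One type per component in a simple training workflow:
#   raw data --> [DataValidator] --> [Trainer] --> [Evaluator] --> metrics
execution_types = {
    t.name: t
    for t in [
        ExecutionType("DataValidator", {"schema_version": int}),
        ExecutionType("Trainer", {"learning_rate": float, "num_steps": int}),
        ExecutionType("Evaluator", {"metric": str}),
    ]
}

print(sorted(execution_types["Trainer"].properties))  # ['learning_rate', 'num_steps']
```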
- Execution:
- What is a record here?
- What is a component?
- What is a component run?
- What runtime parameters are we talking about?
- What is Execution overall?
- One version of a specific instance of an ExecutionType.
- Executions are saved to the metadata storage (e.g., RAM or a database) when you run an ML pipeline or step.
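An Execution is the log entry for one single run of a step: which ExecutionType it was, the parameter values actually used, and how it ended. Running the same Trainer twice produces two Executions of one ExecutionType. A minimal toy sketch with hypothetical values and a hypothetical `record_run` helper (the real store writes these records with `put_executions`):

```python
from dataclasses import dataclass

# Hypothetical toy Execution: the record of ONE run of a step. It stores the
# parameter values actually used and the outcome, but no artifact ids.
@dataclass
class Execution:
    id: int
    type_name: str    # the ExecutionType this run is an instance of, e.g. "Trainer"
    properties: dict  # runtime parameter values for this particular run
    state: str        # e.g. "RUNNING", "COMPLETE", "FAILED"

executions = []

def record_run(type_name, params, state):
    """Append a new Execution each time a step runs, as a pipeline runner would."""
    run = Execution(len(executions) + 1, type_name, params, state)
    executions.append(run)
    return run

record_run("Trainer", {"learning_rate": 0.1, "num_steps": 1000}, "FAILED")
record_run("Trainer", {"learning_rate": 0.01, "num_steps": 1000}, "COMPLETE")

for run in executions:
    print(run.id, run.type_name, run.properties["learning_rate"], run.state)
```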
- Event:
- Is a record of the relationship between artifacts and executions.
- To me it is not clear why this is even necessary, because Event and Execution sound like they fulfill exactly the same purpose.
- To me it seems like an Execution already saves itself, so why do we need an Event to save it again?
- This is the only understandable statement in this definition: “By looking at all events, MLMD knows what executions happened and what artifacts were created as a result.”
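An Event is not a second copy of the Execution. An Execution only says "a Trainer ran with these parameters", and an Artifact only says "this model is at this URI"; neither says which inputs produced which outputs. An Event is the edge between them: one artifact, one execution, and a direction (MLMD uses event types such as `DECLARED_INPUT` and `DECLARED_OUTPUT`). Walking those edges backwards is how lineage is recovered. A minimal toy sketch with hypothetical ids and a hypothetical `upstream` helper:

```python
# Hypothetical toy events: each one links ONE artifact to ONE execution, with a direction.
# Pipeline: raw(1) -> Transform run 10 -> features(2) -> Trainer run 11 -> model(3)
events = [
    (1, 10, "INPUT"),   # artifact 1 was read by execution 10
    (2, 10, "OUTPUT"),  # artifact 2 was written by execution 10
    (2, 11, "INPUT"),   # artifact 2 was read by execution 11
    (3, 11, "OUTPUT"),  # artifact 3 was written by execution 11
]

def upstream(artifact_id):
    """Recursively collect every artifact that `artifact_id` was derived from."""
    found = set()
    producers = {e for (a, e, d) in events if a == artifact_id and d == "OUTPUT"}
    for execution_id in producers:
        inputs = {a for (a, e, d) in events if e == execution_id and d == "INPUT"}
        for parent in inputs:
            found.add(parent)
            found |= upstream(parent)
    return found

print(sorted(upstream(3)))  # [1, 2]
```

The recursion assumes the lineage graph has no cycles, which holds when every artifact is written once and only read afterwards.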
- ContextType:
- What is a “conceptual group of artifacts and executions”? Especially, what is “conceptual” about them?
- Perfect examples: this is what all the other descriptions should look like.
- Context:
- One version of the ContextType.
- Again, what is this “conceptual group of artifacts and executions”? Especially, what is “conceptual” about them?
- Again, GREAT examples.
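“Conceptual” means the group exists only as a label: nothing is moved or copied when an artifact or execution is put into a project, pipeline run or experiment. A ContextType names the kind of group and its properties, and a Context is one actual group, whose name must be unique within its ContextType. A minimal toy sketch of that uniqueness rule; the type names, context names, property values and the `put_context` helper are hypothetical (the real store writes contexts with `put_contexts`):

```python
from dataclasses import dataclass

# Hypothetical toy ContextType (the kind of group) and Context (one actual group).
@dataclass
class ContextType:
    name: str
    properties: dict  # property name -> Python type

@dataclass
class Context:
    type_name: str
    name: str         # must be unique within its ContextType
    properties: dict

contexts = {}  # (type_name, name) -> Context

def put_context(ctx):
    """Store a Context, rejecting a second one with the same name in the same type."""
    key = (ctx.type_name, ctx.name)
    if key in contexts:
        raise ValueError(f"{ctx.type_name} {ctx.name!r} already exists")
    contexts[key] = ctx

experiment_type = ContextType("Experiment", {"note": str})
run_type = ContextType("PipelineRun", {"commit_id": str})

put_context(Context(experiment_type.name, "baseline", {"note": "lr=0.01"}))
put_context(Context(run_type.name, "baseline", {"commit_id": "a1b2c3d"}))  # allowed: other type
try:
    put_context(Context(experiment_type.name, "baseline", {"note": "duplicate"}))
except ValueError as err:
    print(err)  # Experiment 'baseline' already exists
```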
- Attribution:
- Simple and understandable description.
- If all the elements save and describe themselves, why is this necessary?
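Attributions are needed because the artifact-to-context relationship is many-to-many: one dataset can be used by several experiments, and one experiment holds many artifacts. Such a link cannot be stored as a single field on either side, so it gets its own record, much like a join table in a relational database. A minimal toy sketch with hypothetical ids, context names and helpers; the real store writes these links with `put_attributions_and_associations` and reads them back with `get_artifacts_by_context`:

```python
# Hypothetical toy attributions: (artifact_id, context_name) pairs, i.e. a join table.
# Artifact 1 (a shared dataset) belongs to two experiments, which is why the link
# cannot simply be one "context" field stored on the artifact.
attributions = [
    (1, "exp-1"),  # shared training data
    (2, "exp-1"),  # model trained in exp-1
    (1, "exp-2"),  # same training data reused
    (3, "exp-2"),  # model trained in exp-2
]

def artifacts_in(context):
    """All artifacts attributed to one context."""
    return sorted(a for a, c in attributions if c == context)

def contexts_of(artifact_id):
    """All contexts one artifact is attributed to."""
    return sorted(c for a, c in attributions if a == artifact_id)

print(artifacts_in("exp-2"))  # [1, 3]
print(contexts_of(1))         # ['exp-1', 'exp-2']
```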
- Association:
- Simple and understandable description.
- If all the elements save and describe themselves, why is this necessary?
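Associations solve the same many-to-many problem for executions: one Trainer run can belong to a pipeline run and to an experiment at the same time. A minimal toy sketch with hypothetical ids, context names and an `executions_in` helper; the real store's comparable read call is `get_executions_by_context`:

```python
# Hypothetical toy associations: (execution_id, context_name) pairs.
# Each execution belongs to a pipeline run AND to an experiment at the same time.
associations = [
    (10, "pipeline-run-7"), (10, "exp-1"),  # Transform run
    (11, "pipeline-run-7"), (11, "exp-1"),  # Trainer run
    (12, "pipeline-run-8"), (12, "exp-1"),  # Trainer re-run the next day
]

def executions_in(context):
    """All executions linked to one context, e.g. every step of one pipeline run."""
    return sorted(e for e, c in associations if c == context)

print(executions_in("pipeline-run-7"))  # [10, 11]
print(executions_in("exp-1"))           # [10, 11, 12]
```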
Previous recommendations
- Not helpful, and the Colab (/tfx/tutorials/mlmd/mlmd_tutorial) does not answer the basic definitions either.