A major advantage of Vertex AI Pipelines is a structured model training process. Not only are all trained models stored in the Vertex AI Model Registry, they are also linked to all other artifacts involved in the pipeline's execution. This means that for each model version, I can determine which version of the dataset was used for training, which hyperparameters were defined, and which metrics were generated during validation. This is essential for retaining an overview when a variety of models need to be trained regularly.
To implement this model versioning, Google uses ML Metadata, an open-source project that is part of TensorFlow Extended (TFX). Vertex AI abstracts the underlying database away; it stores all artifacts produced during a pipeline execution and allows them to be linked and queried.
The data can be queried from the browser or via Python, for example.
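As a sketch of the Python route: the `google-cloud-aiplatform` SDK exposes the metadata store, and artifacts can be listed by their schema type. The project and region below are placeholder values, and the filter shown is just one illustrative query, not the only option.

```python
def list_model_artifacts(project: str, location: str):
    """List model artifacts recorded in the Vertex AI metadata store.

    A minimal sketch assuming the google-cloud-aiplatform SDK is
    installed and credentials are configured; project and location
    are placeholders to replace with your own values.
    """
    # Imported lazily so the function can be defined without the SDK present.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # schema_title identifies the artifact type in ML Metadata;
    # "system.Model" selects trained models.
    return aiplatform.Artifact.list(
        filter='schema_title="system.Model"',
        order_by="create_time desc",
    )
```

Each returned artifact carries its display name, URI, and a metadata dictionary, so the lineage described above can be followed programmatically as well as in the browser.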
One browser-based view is the lineage graph. The following sample diagram, representing the pipeline from the previous article, shows that three different models were trained on the basis of the same dataset. Their inputs and metrics are also displayed interactively.
Alternatively, you can compare several runs directly in the browser and determine, for example, how a change in a hyperparameter has affected a particular metric.
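The same comparison can be done in code. One hedged sketch: pull the runs' parameters and metrics (for example via `aiplatform.get_pipeline_df`, which returns one row per run) and compare them locally. The run records and metric values below are invented for illustration.

```python
# Hypothetical run records, shaped like rows you might get back from the
# metadata store (e.g. one dict per pipeline run); all values are invented.
runs = [
    {"run": "run-1", "learning_rate": 0.01, "accuracy": 0.91},
    {"run": "run-2", "learning_rate": 0.001, "accuracy": 0.94},
    {"run": "run-3", "learning_rate": 0.1, "accuracy": 0.88},
]

def best_run(runs: list[dict], metric: str) -> dict:
    """Return the run record with the highest value for the given metric."""
    return max(runs, key=lambda r: r[metric])

best = best_run(runs, "accuracy")
print(best["run"], best["learning_rate"])  # run-2 0.001
```

This makes the hyperparameter-to-metric relationship from the browser view reproducible in a notebook or script.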
Here is a fine tutorial for those who want to learn more about metadata in pipelines.