defines the ML pipeline. It describes how to break a full ML task into a multistep workflow. Each step is a component that has a well-defined interface and can be developed, tested, and optimized independently. The pipeline YAML also defines how the child steps connect to other steps in the pipeline. For example, a model training step generates a model file, and that model file is passed to a model evaluation step.
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline # Required and must be pipeline
# Display name of the pipeline job in Studio UI. Editable in Studio UI. Doesn't have to be unique across all jobs in the workspace.
display_name: 3b_pipeline_with_data
description: Pipeline with 3 component jobs with data dependencies
settings:
  default_compute: azureml:cpu-cluster
outputs:
  final_pipeline_output:
    mode: rw_mount
# Required. Dictionary of the set of individual jobs to run as steps within the pipeline. These jobs are considered child jobs of the parent pipeline job. In this release, supported job types in pipeline are command and sweep.
jobs:
  component_a:
    type: command
    component: ./componentA.yml
    inputs:
      component_a_input:
        type: uri_folder
        path: ./data
    outputs:
      component_a_output:
        mode: rw_mount
  component_b:
    type: command
    component: ./componentB.yml
    inputs:
      component_b_input: ${{parent.jobs.component_a.outputs.component_a_output}}
    outputs:
      component_b_output:
        mode: rw_mount
  component_c:
    type: command
    component: ./componentC.yml
    inputs:
      component_c_input: ${{parent.jobs.component_b.outputs.component_b_output}}
    outputs:
      component_c_output: ${{parent.outputs.final_pipeline_output}}
      # mode: upload
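To make the data dependencies concrete, here is a minimal Python sketch of how the input bindings in the jobs section determine the order in which the steps can run. It is an illustration only, not how Azure ML actually schedules jobs: it assumes the jobs are already loaded into a plain dictionary (JOBS, upstream_jobs, and execution_order are hypothetical names), and it only recognizes bindings of the form ${{parent.jobs.<job>.outputs.<output>}}.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical in-memory mirror of the `jobs` section of the pipeline YAML
# (a real tool would parse the YAML file instead of hard-coding it).
JOBS = {
    "component_a": {"inputs": {"component_a_input": {"type": "uri_folder", "path": "./data"}}},
    "component_b": {"inputs": {"component_b_input": "${{parent.jobs.component_a.outputs.component_a_output}}"}},
    "component_c": {"inputs": {"component_c_input": "${{parent.jobs.component_b.outputs.component_b_output}}"}},
}

# Matches ${{parent.jobs.<job>.outputs.<output>}}; group 1 is the upstream job name.
BINDING = re.compile(r"\$\{\{\s*parent\.jobs\.(\w+)\.outputs\.(\w+)\s*\}\}")


def upstream_jobs(job: dict) -> set[str]:
    """Return the names of the jobs whose outputs this job consumes."""
    deps = set()
    for value in job.get("inputs", {}).values():
        # Only string inputs can carry a binding; object inputs (dicts) are literal data.
        if isinstance(value, str):
            for match in BINDING.finditer(value):
                deps.add(match.group(1))
    return deps


def execution_order(jobs: dict) -> list[str]:
    """Order jobs so that each one comes after every job it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessor nodes.
    graph = {name: upstream_jobs(job) for name, job in jobs.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(JOBS))
```

Because component_b consumes component_a's output and component_c consumes component_b's output, the only valid order is component_a, component_b, component_c. component_a reads from a local ./data folder rather than from another job, so it has no upstream dependencies.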
defines the component and packages the following information:
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: component_a
display_name: componentA
version: 1
inputs:
  component_a_input:
    type: uri_folder
  test_integeri_nimel_meri:
    type: number
    min: 0
    max: 7
    default: 6
outputs:
  component_a_output:
    type: uri_folder
code: ./componentA_src
environment:
  image: python
command: >-
  python hello.py --componentA_input ${{inputs.component_a_input}} --componentA_output ${{outputs.component_a_output}}
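The source of hello.py in ./componentA_src is not shown here. As a hypothetical sketch of what such a script could look like, the Python below parses the two flags named in the command field and copies every file from the input folder to the output folder. The copy logic and the parse_args, run, and main names are assumptions for illustration, not the actual component code.

```python
import argparse
import shutil
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    # Flag names match the ones used in the component's `command` field.
    parser = argparse.ArgumentParser(description="Hypothetical componentA script")
    parser.add_argument("--componentA_input", type=str, required=True)
    parser.add_argument("--componentA_output", type=str, required=True)
    return parser.parse_args(argv)


def run(input_dir: str, output_dir: str) -> list[str]:
    """Copy each regular file in input_dir to output_dir; return the copied names."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(input_dir).iterdir()):
        if src.is_file():
            shutil.copy(src, out / src.name)
            copied.append(src.name)
    return copied


def main(argv=None) -> None:
    args = parse_args(argv)
    copied = run(args.componentA_input, args.componentA_output)
    print(f"Copied {len(copied)} file(s) to {args.componentA_output}")
```

At run time the placeholders in the command are resolved to folder paths before the script starts, so the script only handles ordinary paths. To use the sketch as hello.py, end the file with the usual if __name__ == "__main__": main() guard.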
A component input is either a literal value or an object containing an input schema.
Object inputs (of type uri_file, uri_folder, mltable, mlflow_model, or custom_model) can connect to other steps in the parent pipeline job and thereby pass data or models to those steps. In the pipeline graph, an object-type input is rendered as a connection dot.
Literal value inputs (string, number, integer, boolean) are the parameters you pass to the component at run time. You can set a default value for a literal input in the default field. For number and integer inputs, you can also set the minimum and maximum accepted values with the min and max fields. If an input value falls outside the range defined by min and max, the pipeline fails validation. Validation runs before the pipeline job is submitted, so an invalid value is caught without waiting for a run to fail. Validation works in the CLI, the Python SDK, and the designer UI. The screenshot below shows a validation example in the designer UI. Similarly, you can restrict an input to a list of allowed values with the enum field.
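The validation rules described above can be sketched in plain Python. This is a hypothetical, simplified checker, not the validator that the CLI, SDK, or designer actually use; the function names are made up, and treating a missing value with no default as an error is an assumption. SPEC mirrors the test_integeri_nimel_meri input from the component YAML.

```python
from typing import Any

# Input types named in the text: literal inputs are run-time parameters,
# object inputs become connection dots in the pipeline graph.
LITERAL_TYPES = {"string", "number", "integer", "boolean"}
OBJECT_TYPES = {"uri_file", "uri_folder", "mltable", "mlflow_model", "custom_model"}


def is_connection_port(spec: dict) -> bool:
    """True for object inputs, which can be wired to other pipeline steps."""
    return spec["type"] in OBJECT_TYPES


def validate_literal(name: str, spec: dict, value: Any = None) -> Any:
    """Return the effective value of a literal input, or raise ValueError.

    Falls back to spec["default"] when no value is given, then checks the
    optional min/max bounds (number and integer only) and the optional enum.
    """
    if spec["type"] not in LITERAL_TYPES:
        raise ValueError(f"{name}: {spec['type']} is not a literal input type")
    if value is None:
        if "default" not in spec:
            # Assumption: a literal input with no value and no default is invalid.
            raise ValueError(f"{name}: no value given and no default")
        value = spec["default"]
    if spec["type"] in {"number", "integer"}:
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{name}: {value} is below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{name}: {value} is above max {spec['max']}")
    if "enum" in spec and value not in spec["enum"]:
        raise ValueError(f"{name}: {value!r} is not one of {spec['enum']}")
    return value


def is_valid(name: str, spec: dict, value: Any = None) -> bool:
    """Boolean wrapper around validate_literal."""
    try:
        validate_literal(name, spec, value)
    except ValueError:
        return False
    return True


# Mirrors the test_integeri_nimel_meri input in the component YAML.
SPEC = {"type": "number", "min": 0, "max": 7, "default": 6}
```

For example, validate_literal("test_integeri_nimel_meri", SPEC) falls back to the default of 6, while a value of 9 is rejected because it exceeds max: 7. Checking all literal inputs this way before anything runs is what lets an out-of-range value fail fast instead of partway through a pipeline.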
If you want to add an input to a component, remember to edit three places: