Add dataset as input to job

Instead of referring to the locally stored CSV file directly from the training script, you can specify the data input in the YAML file. This approach requires changes in two places:

In the script: You define the input arguments using the argparse module. You specify the argument’s name, type and optionally a default value.
In the YAML file: You specify the data input, which is either mounted (the default) or downloaded to the local file system. The input can refer to a public URI or to a dataset registered in the Azure Machine Learning workspace.

Script

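import argparse
import pandas as pd

# Define the command-line argument that receives the path to the CSV file
parser = argparse.ArgumentParser()
parser.add_argument("--data-csv", dest="data_csv", type=str)

args = parser.parse_args()

# Load the data from the path that the job passes in
df = pd.read_csv(args.data_csv)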
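
Before submitting the job, it can help to check the argument handling of the script above on a local machine. The following is a minimal sketch, not part of the original training script: the helper names parse_args and count_rows, the default path data/sample.csv, and the sample rows are all hypothetical, and the standard-library csv module stands in for pandas so that the sketch has no external dependencies.

```python
import argparse
import csv
import tempfile
from pathlib import Path


def parse_args(argv=None):
    """Parse the script's arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser()
    # Same flag as the training script. The default value is a hypothetical
    # local path, used only when the flag is not supplied.
    parser.add_argument("--data-csv", dest="data_csv", type=str,
                        default="data/sample.csv")
    return parser.parse_args(argv)


def count_rows(csv_path):
    """Return the number of data rows (header excluded) in a CSV file."""
    with open(csv_path, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))


# Write a tiny made-up CSV that stands in for the mounted or downloaded input.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "churn.csv"
    sample.write_text("customer_id,churned\n1,0\n2,1\n")
    # Pass the path explicitly, as the job command would.
    args = parse_args(["--data-csv", str(sample)])
    row_count = count_rows(args.data_csv)

# Without the flag, the hypothetical default path is used.
default_args = parse_args([])
print(row_count, default_args.data_csv)
```

Passing an explicit argument list to parse_args keeps the check independent of the real command line. Whether the data is mounted or downloaded, the script only ever receives a file system path through --data-csv.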

YAML

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: 
  local_path: src
command: >-
  python main.py
  --data-csv $
inputs:
  dataset: azureml:customer-churn-data:1
  mode: download 
environment: azureml:basic-env-scikit:1
compute: azureml:testdev-vm
experiment_name: customer-churn
description: Train a classification model on a sample customer dataset.