Add dataset as input to job

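Instead of referring to the locally stored CSV directly from the training script, you want to specify the data input in the YAML file. Two things need to change when you take this approach: the script must accept the path to the data as a command-line argument, and the YAML file must declare the data as an input and pass it to the script in the command.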

Script

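import argparse
import pandas as pd

# Read the CSV location from the command line instead of hard-coding it.
parser = argparse.ArgumentParser()
parser.add_argument("--data-csv", dest="data_csv", type=str)

args = parser.parse_args()

# args.data_csv holds whatever path the job passes in via --data-csv.
df = pd.read_csv(args.data_csv)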
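
Before submitting the job, it can help to check the argument handling locally. The following is a minimal sketch, not part of the original script: the `parse_args`, `count_rows`, and `demo` helpers, the `required=True` flag, and the sample column names are all assumptions added for illustration, and the standard-library `csv` module stands in for pandas so the check has no dependencies.

```python
import argparse
import csv
import os
import tempfile


def parse_args(argv=None):
    """Parse the script arguments; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    # required=True is an addition in this sketch: fail early if no path is given.
    parser.add_argument("--data-csv", dest="data_csv", type=str, required=True)
    return parser.parse_args(argv)


def count_rows(path):
    """Count the data rows (header excluded) of the CSV file at `path`."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def demo():
    """Write a tiny hypothetical CSV, then read it back through --data-csv."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["customer_id", "churned"])  # hypothetical columns
            writer.writerow([1, 0])
            writer.writerow([2, 1])
        args = parse_args(["--data-csv", path])
        return args.data_csv == path, count_rows(args.data_csv)


if __name__ == "__main__":
    print(demo())
```

Passing an explicit list to `parse_args` mimics what the job does when it appends `--data-csv <path>` to the command, so the same code path runs locally and on the compute target.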

YAML

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: >-
  python main.py
  --data-csv ${{inputs.dataset}}
inputs:
  dataset:
    path: azureml:customer-churn-data:1
    mode: download
environment: azureml:basic-env-scikit:1
compute: azureml:testdev-vm
experiment_name: customer-churn
description: Train a classification model on a sample customer dataset.
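
In the job's command, a placeholder of the form `${{inputs.<name>}}` refers to the input declared under `inputs` with that name, and it is replaced with the resolved data path when the job runs. The toy renderer below only illustrates that substitution. It is not how Azure Machine Learning implements it, and the `render_command` function and the local path are hypothetical.

```python
import re

# Toy illustration only: this is NOT Azure ML's implementation.
# Matches placeholders such as ${{inputs.dataset}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_command(command, resolved_inputs):
    """Replace each ${{inputs.<name>}} with the resolved path for <name>."""
    def substitute(match):
        name = match.group(1)
        if name not in resolved_inputs:
            raise KeyError(f"command references undeclared input: {name}")
        return resolved_inputs[name]

    return PLACEHOLDER.sub(substitute, command)


command = "python main.py --data-csv ${{inputs.dataset}}"
# Hypothetical path where the downloaded file might land on the compute target.
rendered = render_command(command, {"dataset": "/tmp/inputs/customer-churn.csv"})
print(rendered)
```

The name after `inputs.` has to match a key under `inputs` in the YAML file. In this sketch a mismatch raises a `KeyError`, which mirrors the kind of mistake to look for if the script receives no usable path.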