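Rather than hard-coding the path to a locally stored CSV in the training script, specify the data input in the job's YAML file. This approach has two requirements: the training script must accept the data path as an argument, and the YAML file must define the input and pass it to the script in the command. The script and the job definition that follow illustrate both.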
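import argparse
import pandas as pd

# Read the path to the CSV file from the command line
parser = argparse.ArgumentParser()
parser.add_argument("--data-csv", dest='data_csv', type=str)
args = parser.parse_args()

# Load the data from the path passed in by the job
df = pd.read_csv(args.data_csv)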
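To see how the `--data-csv` flag reaches the script, the parser can be exercised with an explicit argument list instead of the real command line. The following is a minimal sketch, not part of the original script: the `parse_args` wrapper, the `required=True` setting, and the path `data/customer-churn.csv` are illustrative assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the training script's arguments (argv=None falls back to sys.argv)."""
    parser = argparse.ArgumentParser()
    # required=True is an assumption added here so that a missing path fails
    # early with a usage message instead of passing None to the CSV reader.
    parser.add_argument("--data-csv", dest="data_csv", type=str, required=True)
    return parser.parse_args(argv)


# Hypothetical path, standing in for whatever the job passes at run time.
args = parse_args(["--data-csv", "data/customer-churn.csv"])
print(args.data_csv)
```

Because `dest` is set to `data_csv`, the value is available as `args.data_csv`, which is the name the script hands to `pd.read_csv`.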
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code:
  local_path: src
command: >-
  python main.py
  --data-csv ${{inputs.dataset}}
inputs:
  dataset: azureml:customer-churn-data:1
  mode: download
environment: azureml:basic-env-scikit:1
compute: azureml:testdev-vm
experiment_name: customer-churn
description: Train a classification model on a sample customer dataset.
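In Azure Machine Learning CLI v2 job files, the command refers to an input by name with a placeholder of the form `${{inputs.<name>}}`, so the name in the command has to match a key under `inputs`. The sketch below is one hedged way to check that before submitting a job. It is an illustration only, not how Azure Machine Learning resolves placeholders: the `undefined_inputs` helper and the plain `job` dictionary (a stand-in for the parsed YAML) are assumptions.

```python
import re

# Matches input placeholders such as ${{inputs.dataset}} and captures the name.
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.([A-Za-z0-9_]+)\s*\}\}")


def undefined_inputs(command, inputs):
    """Return the input names referenced in `command` that `inputs` does not define."""
    referenced = PLACEHOLDER.findall(command)
    return [name for name in referenced if name not in inputs]


# Hypothetical stand-in for the parsed job file, reduced to the two relevant keys.
job = {
    "command": "python main.py --data-csv ${{inputs.dataset}}",
    "inputs": {"dataset": "azureml:customer-churn-data:1"},
}

missing = undefined_inputs(job["command"], job["inputs"])
print(missing)  # an empty list means every placeholder has a matching input
```

A check like this catches a renamed or misspelled input locally, before the job is sent to the compute target.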