Presentation study notes

Source: https://www.youtube.com/watch?v=F2b-N2Fgs9U

e2e-ml-workflow

Register data from an external URL

The data you use for training is usually in one of the locations below:

Local machine
Web
Big Data Storage services (for example, Azure Blob, Azure Data Lake Storage, SQL)

Azure ML uses a Data object to register a reusable definition of data and to consume data within a pipeline. In the section below, you’ll consume data from a web URL as one example. Data assets from other sources can be created as well.

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

web_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls"

credit_data = Data(
    name="creditcard_defaults",
    path=web_path,
    type=AssetTypes.URI_FILE,
    description="Dataset for credit card defaults",
    tags={"source_type": "web", "source": "UCI ML Repo"},
    version="1.0.0",
)

This code only creates the Data asset definition locally; it is ready to be consumed as an input by the pipeline that you’ll define in the next sections. In addition, you can register the data to your workspace so that it becomes reusable across pipelines.

Registering the data asset will enable you to:

reuse and share the data asset in future pipelines
use versions to track the modification to the data asset
use the data asset from Azure ML designer, which is Azure ML’s GUI for pipeline authoring
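The registration step itself is not shown in these notes. A minimal sketch of it follows, assuming `ml_client` is an already authenticated `MLClient` and `data_asset` is the `Data` object built above (`credit_data`). The helper name `register_data_asset` is hypothetical and exists only for illustration.

```python
def register_data_asset(ml_client, data_asset):
    """Register a data asset in the workspace and return the registered copy.

    Assumptions: `ml_client` is an authenticated azure.ai.ml.MLClient and
    `data_asset` is an azure.ai.ml.entities.Data object such as `credit_data`.
    """
    # create_or_update registers the asset, or updates it if this
    # name/version combination already exists in the workspace.
    registered = ml_client.data.create_or_update(data_asset)
    print(
        f"Dataset with name {registered.name} was registered to the workspace, "
        f"the dataset version is {registered.version}"
    )
    return registered
```

Usage would look like `credit_data = register_data_asset(ml_client, credit_data)`; the returned object carries the name and version needed to fetch the asset later.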

Since this is the first time that you’re making a call to the workspace, you may be asked to authenticate. Once the authentication is complete, you’ll then see the dataset registration completion message.
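The notes use `ml_client` without showing where it comes from. A minimal sketch of how such a handle might be created follows, assuming the `azure-ai-ml` and `azure-identity` packages are installed. The helper name `get_ml_client` and its three parameters are hypothetical; the values come from your own subscription and workspace.

```python
def get_ml_client(subscription_id, resource_group, workspace_name):
    """Return an MLClient handle for one workspace (illustrative helper)."""
    # Imported inside the function so the helper can be defined even when
    # the SDK is not installed; both packages are required when it is called.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    # DefaultAzureCredential tries several sign-in methods in turn
    # (environment variables, managed identity, Azure CLI login, ...).
    credential = DefaultAzureCredential()
    return MLClient(
        credential=credential,
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )
```

Creating the client does not contact the workspace by itself, which is why the authentication prompt appears only on the first real call, such as registering the data asset.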

In the future, you can fetch the same dataset from the workspace by passing the name and version used at registration, for example credit_dataset = ml_client.data.get(name="creditcard_defaults", version="1.0.0").

Creating compute looks much like it did in SDK v1

from azure.ai.ml.entities import AmlCompute

cpu_compute_target = "cpu-cluster"

try:
    # let's see if the compute target already exists
    cpu_cluster = ml_client.compute.get(cpu_compute_target)
    print(
        f"You already have a cluster named {cpu_compute_target}, we'll reuse it as is."
    )

except Exception:
    print("Creating a new cpu compute target...")

    # Let's create the Azure ML compute object with the intended parameters
    cpu_cluster = AmlCompute(
        # Name assigned to the compute cluster (same name that was looked up above)
        name=cpu_compute_target,
        # Azure ML Compute is the on-demand VM service
        type="amlcompute",
        # VM Family
        size="STANDARD_DS3_V2",
        # Minimum running nodes when there is no job running
        min_instances=0,
        # Maximum number of nodes in the cluster
        max_instances=4,
        # Idle time in seconds before a node is scaled down after a job finishes
        idle_time_before_scale_down=180,
        # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination
        tier="Dedicated",
    )

    # Now, we pass the object to MLClient's begin_create_or_update method.
    # In current SDK versions this returns a poller for a long-running
    # operation; .result() waits for it and returns the created cluster.
    cpu_cluster = ml_client.begin_create_or_update(cpu_cluster).result()

print(
    f"AMLCompute with name {cpu_cluster.name} is created, the compute size is {cpu_cluster.size}"
)