Reading the Packt book about MLflow

Adding MLflow to code

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Example data so the snippet is self-contained
X_train, y_train = make_classification(n_samples=1000, random_state=42)

# Automatically log parameters, metrics, and the fitted model
mlflow.sklearn.autolog()

with mlflow.start_run():
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

The mlflow.sklearn.autolog() call automatically logs the run's parameters, metrics, and model artifacts, by default to the local mlruns directory.
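
The resulting runs can be browsed locally with the bundled tracking UI:

mlflow ui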

Exploring MLflow projects

An MLflow project represents the basic unit of organization of ML projects. There are three different environments supported by MLflow projects: the Conda environment, Docker, and the local system.

MLflow project in a Docker container

name: syspred
docker_env:
  image: stockpred-docker
entry_points:
  main:
    command: "python mljob.py"
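
For comparison, a Conda-based variant of the same project would look roughly like this (the conda.yaml file name is an assumption):

name: syspred
conda_env: conda.yaml
entry_points:
  main:
    command: "python mljob.py"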

MLflow Tracking

The MLflow Tracking component is responsible for observability. Its main features are logging the metrics, artifacts, and parameters of an MLflow run, and it provides visualization and artifact-management capabilities on top of them.
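
A minimal sketch of the manual logging API (the parameter, metric, and file names are made up):

import mlflow

with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)        # hyperparameter of the run
    mlflow.log_metric("accuracy", 0.92)          # evaluation metric
    mlflow.log_artifact("confusion_matrix.png")  # any local file stored as an artifact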

In a production setting, it runs as a centralized tracking server, implemented in Python, that can be shared by a group of ML practitioners in an organization, so that improvements to ML models are visible across teams.
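
A sketch of that setup, assuming a server started separately and a made-up hostname and experiment name:

# Server side (run once, e.g. on a shared host):
#   mlflow server --backend-store-uri sqlite:///mlflow.db \
#       --default-artifact-root ./artifacts --host 0.0.0.0 --port 5000
import mlflow

# Client side: point runs at the shared server instead of the local ./mlruns directory
mlflow.set_tracking_uri("http://tracking-server:5000")
mlflow.set_experiment("stockpred")

with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.87)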

MLflow Models

MLflow Models is the core component that handles the different model flavors supported in MLflow and mediates their deployment into different execution environments.

Internally, an MLflow sklearn model is persisted together with a conda environment file capturing its dependencies at the time of the run, and the pickled model logged by the source code:

artifact_path: model_random_forest
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.7.6
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.23.2
run_id: 22c91480dc2641b88131c50209073113
utc_time_created: '2020-10-15 20:16:26.619071'

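Given such a file, the model can be loaded back through the generic pyfunc flavor or through its native sklearn flavor; a minimal sketch reusing the run ID above (X_test is assumed to exist):

import mlflow

model_uri = "runs:/22c91480dc2641b88131c50209073113/model_random_forest"

# Generic interface: any flavor that declares python_function can be loaded this way
pyfunc_model = mlflow.pyfunc.load_model(model_uri)
predictions = pyfunc_model.predict(X_test)

# Native flavor: returns the original scikit-learn estimator
sk_model = mlflow.sklearn.load_model(model_uri)
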
The MLflow Models component allows the creation of custom-made Python modules that will have the same benefits as the built-in models, as long as a prediction interface is followed.

MLflow Model Registry

The Model Registry component in MLflow gives the ML developer an abstraction for model lifecycle management. It is a centralized store for an organization or function that allows models to be created, shared, and archived collaboratively.
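
A minimal sketch of registering a logged model under a name in the registry (the model URI and name are assumptions, and the registry needs a database-backed tracking server rather than the plain file store):

import mlflow

result = mlflow.register_model(
    model_uri="runs:/22c91480dc2641b88131c50209073113/model_random_forest",
    name="stockpred_random_forest",
)
print(result.version)  # the registry assigns an incrementing version per name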

Chapter 2

1. Implement the heuristic model in MLflow

In the following block of code, we create the RandomPredictor class, a bespoke predictor that derives from the mlflow.pyfunc.PythonModel class. Its predict method returns a random integer, 0 or 1, for each column of the input:

import random

import mlflow


class RandomPredictor(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Produce a random 0/1 prediction for each column of the input DataFrame
        return model_input.apply(lambda column: random.randint(0, 1))

2. Save the model in MLflow

The following block of code saves the model under the path random_model so that it can be retrieved later. The model is persisted in MLflow format on the local filesystem:

model_path = "random_model"
baseline_model = RandomPredictor()
# Persist the baseline model in MLflow format at the given path
mlflow.pyfunc.save_model(path=model_path, python_model=baseline_model)
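
To sanity-check the saved model, it can be loaded back through the pyfunc interface; a quick sketch with a made-up input DataFrame:

import pandas as pd
import mlflow

loaded_model = mlflow.pyfunc.load_model("random_model")
test_input = pd.DataFrame({"feature": [1, 2, 3]})
print(loaded_model.predict(test_input))  # one random 0/1 per input column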

3. Run the MLflow job

mlflow run .
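
The run can also be attached to a named experiment (the experiment name is an assumption):

mlflow run . --experiment-name stockpred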

4. Start the serving API

mlflow models serve -m ./mlruns/0/b9ee36e80a934cef9cac3a0513db515c/artifacts/random_model/
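
Once running, the server exposes a POST /invocations endpoint on port 5000 by default. A sketch of calling it (the column name is made up, and the exact JSON payload layout depends on the MLflow version; this matches the 1.x pandas-split format):

import requests

payload = {"columns": ["feature"], "data": [[1], [2], [3]]}
response = requests.post(
    "http://127.0.0.1:5000/invocations",
    json=payload,  # older MLflow versions read this as split-oriented JSON; newer ones expect a dataframe_split wrapper
)
print(response.json())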

Chapter 3


Data science workbench setup

The rest of the chapter is just examples of running different ML models on simple data.

Chapter 5: Managing Models

There are two main components for managing models: the MLflow Models format and the Model Registry.

An MLflow model is at its core a packaging format for models. The main goal of MLflow model packaging is to decouple the model type from the environment that executes the model. A good analogy of an MLflow model is that it’s a bit like a Dockerfile for a model, where you describe metadata of the model, and deployment tools upstream are able to interact with the model based on the specification.

Models are defined with an MLmodel file, as in the example shown earlier.

Model signatures and schemas

An important feature of MLflow is to provide an abstraction for input and output schemas of models and the ability to validate model data during prediction and training.

MLflow throws an error if your input does not match the schema and signature of the model during prediction.

The MLmodel file contains the model's input and output signature in JSON. For some autologged flavors the signature cannot be inferred automatically, so you can provide it explicitly when logging the model:

import mlflow
from mlflow.models.signature import infer_signature

with mlflow.start_run(run_name="untuned_random_forest"):
    …
    # Infer the input/output schema from the training data and the model's predictions
    signature = infer_signature(X_train,
                                wrappedModel.predict(None, X_train))
    mlflow.pyfunc.log_model("random_forest_model",
                            python_model=wrappedModel,
                            signature=signature)

The rest of the chapter describes the Model Registry. Nothing too new there for me.

Chapter 6: ML System Architecture

Chapter 7: The example platform