
Using Azure ML Studio Pipeline to create endpoints to make predictions

Subhad365, User Group Leader



Hi Friends, Azure ML Studio is an amazing platform for working with data: you can select a proper algorithm, then train, test, deploy and predict, all in a remarkably convenient way. All you need to do is:
  1. Define your data
  2. Define the pipeline
  3. Convert the experiment into a real-time inference pipeline
  4. Create the endpoint
  5. Consume the endpoint
All of these steps can be achieved with either:
  • ML Studio’s prebuilt drag-and-drop components and connectors
  • Your own algorithm, implementing the same steps in custom code
With that said, let me jump to some basic definitions around AI and machine learning:

Introducing Azure ML pipeline

An Azure ML pipeline lets you automate the stages of a machine-learning workflow, from ingesting data through preprocessing, training and evaluation, in much the same way that a CI/CD pipeline automates build and release tasks for application code. In our case, we are going to use an Azure ML pipeline to process a large dataset and predict an outcome from it, through a number of stages.

Different types of compute

Once you have defined a pipeline, you can execute it against several compute options such as a compute cluster, a compute instance, a Kubernetes cluster or even serverless compute, depending on your need.
Depending on the speed required, the data volume and the number of inferences needed, you have to plan which compute option to use.


Execution steps of model training

A model-training program has several steps:
  • Data input: here we stage our designer pipeline with the large dataset, either from a file, a web URL, Azure SQL or an Azure storage container. This is called a 'Data Asset'.
  • Data cleansing: we can handle null values, missing values, duplicate values -- and anything else we want to preprocess our data with.
  • Choosing the right algorithm: here you have to understand how your model should behave: is it a classification or a regression problem? Accordingly, you can choose from a host of relevant algorithms.
  • Splitting the data: once we choose the model to train with the given data, we split the data into two sets: one for training and another for testing.
  • Score the data and evaluate the prediction.

The whole process is iterative: you may need to switch between algorithms and try different train-test split percentages (for example 70-30, 80-20, 50-50) until the predictions become precise enough. A minimal local sketch of this loop is shown below.
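The sketch below uses pandas and scikit-learn on the same public diabetes CSV used later in this post; it is only an illustration, and the column names (notably Outcome as the label) are assumptions based on that file.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

URL = "https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/diabetes.csv"
df = pd.read_csv(URL)

X = df.drop(columns=["Outcome"])   # feature columns
y = df["Outcome"]                  # 1 = diabetic, 0 = not

# Try different splits (70-30, 80-20, 50-50) and compare the scores.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)   # a simple classification algorithm
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)[:, 1]
print("Accuracy:", accuracy_score(y_test, predictions))
print("AUC:", roc_auc_score(y_test, probabilities))

Swapping the algorithm or the test_size value and re-running is the local equivalent of resubmitting the Designer pipeline with different choices.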
Let us now jump to Azure ML Studio to create ML pipeline endpoints, step by step:

Step 1: Create an Azure ML workspace

Go to https://portal.Azure.com and create a new resource >> select Azure Machine Learning >> Choose Create a new Workspace:






Fill in the form that appears:


Select a suitable resource group, and give the workspace a proper name and a region. Everything else you can leave as is, since the values come pre-populated.
Click Review + Create >> Create to complete the wizard.
It will take a while to create the workspace. The following resource will be created:

Click on Launch studio to continue.
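As an aside, if you prefer scripting to the portal wizard, a minimal sketch of the same workspace creation with the azure-ai-ml (v2) Python SDK could look like this; the subscription, resource group, workspace name and region below are placeholders, not values from this walkthrough.

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace

# Connect at the subscription / resource-group level (placeholders).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Create the workspace; creation is a long-running operation.
ws = Workspace(name="ml-demo-workspace", location="eastus")
ml_client.workspaces.begin_create(ws).result()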

Step 2: Create data

Next, we will create a dataset to experiment with. Click on Assets >> Data >> New Data

We can create data from a number of sources, such as:
  1. Publicly available datasets from URLs
  2. Azure SQL
  3. Azure AI
  4. Azure storage
  5. Azure Open Datasets (curated public datasets published on Azure)
In our example, we will create a dataset from the diabetes data, which is publicly available from:
https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/diabetes.csv
Give it a name and select the dataset type as Tabular:

Click on Next and select From Web URL:

Click on next to continue.

Here you need to give the URL from which you would like to create the data. Click on Next to continue.

Click on Next to continue.
In the next screen, you are shown the list of columns that you can include in or exclude from the dataset you are building. Check or uncheck the columns you want to keep or remove, and then click on Next:


This will give you a summary of your selections. Click on Create to complete the data creation.
We now have the diabetes dataset ready:
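The same data asset can also be registered from code. Below is a hedged sketch with the azure-ai-ml SDK, reusing the ml_client from the workspace sketch above; the asset name and version are placeholders, and a URI_FILE asset is used here as the simplest code-side stand-in for the tabular asset created in the UI.

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

diabetes_data = Data(
    name="diabetes-data",
    version="1",
    type=AssetTypes.URI_FILE,  # a single CSV pulled straight from a web URL
    path="https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/diabetes.csv",
    description="Publicly available diabetes CSV",
)
ml_client.data.create_or_update(diabetes_data)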

Step 3: Create the compute

Click on Compute >> New:

For our demo purposes, we will start with a very modest compute configuration, which is just sufficient to execute our pipeline:

Here we keep the machine type as CPU, choose 'Select from all options', and pick the option shown below. Click Next.
Set the timeout details on the next screen.

See how auto-shutdown and idle timeout are set; I am keeping this at 20 minutes. This is important to avoid unnecessary costs.
Next, go ahead with Review + Create to complete the creation. It will take a while to spin up the compute, after which it will appear in the Compute list:
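For reference, a hedged sketch of creating a similar compute instance with the azure-ai-ml SDK follows; the instance name, VM size and the idle-shutdown parameter are assumptions that mirror the choices made in the UI above.

from azure.ai.ml.entities import ComputeInstance

ci = ComputeInstance(
    name="demo-ci",
    size="Standard_DS11_v2",                # a small, inexpensive CPU SKU
    idle_time_before_shutdown_minutes=20,   # mirror the 20-minute idle timeout
)
ml_client.compute.begin_create_or_update(ci).result()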

Step 4: Create the pipeline

Click on Pipeline >> New pipeline, and the following screen will appear:

Here you have a number of options and algorithms to choose from: classification, image classification, deep-learning recommendation and so on. For our demo, we will click on the new pipeline option with prebuilt components. The following screen will appear:

Click on the Data tab >> select the data you just created:

Click on Use data to make it appear on the canvas.
The next steps are to:
  • Assign a proper model to do the training: in our case it is essentially a classification problem, i.e. whether someone has diabetes or not (yes or no)
  • Split data: split the data between training and test percentages
  • Train the model with the output of the above step
  • Score the model to find out how accurate it is against the test data
  • Evaluate the model
All of these steps are available as components, and you can keep adding them as shown below:

Just notice how the output of one step feeds into the next.
Ensure that Auto-save is turned on:

The above set of components is just an example; you can add many more steps in between: remove missing data, remove duplicate data, execute your own custom code (Python scripts, etc.). It is completely scalable and customisable, depending on your needs. A sketch of such a custom script follows below.
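The body of an 'Execute Python Script' component, for instance, is just a Python function. A minimal sketch of a cleansing step is shown here; the azureml_main entry point is the Designer's convention for this component, while the cleansing logic itself is only illustrative.

import pandas as pd

# Designer calls azureml_main with the connected datasets as pandas DataFrames
# and expects DataFrame output(s) back.
def azureml_main(dataframe1=None, dataframe2=None):
    # Drop rows with missing values and duplicate rows before training.
    cleaned = dataframe1.dropna().drop_duplicates()
    return cleaned,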
Click on Configure + submit. The following screen appears, under Runtime settings:

Select Compute type as Compute instance and then select the compute instance which you created earlier.

Select Create new >> give a name. Click on Review + Submit.
The pipeline then starts executing. You need to wait until the execution is over and the pipeline shows green:

Yeah, it's done. You can now review the performance of individual nodes. Right-click on Score Model >> Preview data >> Score dataset, and you can see how your algorithm has behaved:

And the data are shown in graphs:

You can right-click on each and every node and preview its output before coming to a conclusion. If the results are not good enough, change the algorithm and resubmit the pipeline.

Step 5: Create the inference pipeline

Click on the following link to convert the pipeline into an inference pipeline, so that you can expose it as a web-service endpoint:

This will generate the pipeline structure shown below:

You just need to drag in the Web Service Input component. Save the pipeline >> Review + create. Here too, as before, you need to select a new experiment, but under Runtime settings >> select a Kubernetes compute (this will make it faster):

You can create a Kubernetes cluster like this:
Go to Compute >> Kubernetes clusters >> Create new AKS Compute:

Select the same location as that of your Azure ML Studio workspace, for the most efficient result:

Here, once again, select the cheapest configuration that is still efficient enough to take care of the pipeline runs.
Click on Next:

Give it a name and set the number of nodes to 1 (to keep costs down). Select Cluster purpose >> Dev-test, and then click on Create.
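If you would rather script the cluster creation, here is a hedged sketch with the older azureml-core (v1) SDK, which is one way to provision an AKS inference cluster programmatically; the workspace config, cluster name and VM size are placeholders.

from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

# Assumes a config.json downloaded from the workspace overview page.
ws = Workspace.from_config()

# Single-node dev/test cluster to keep costs down.
prov_config = AksCompute.provisioning_configuration(
    vm_size="Standard_D3_v2",
    agent_count=1,
    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
)
aks_target = ComputeTarget.create(ws, "demo-aks", prov_config)
aks_target.wait_for_completion(show_output=True)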
Let us get back to our inference pipeline; you will see your AKS cluster appear in the dropdown. Click on Review + Create to start the deployment.
It takes roughly 20-25 minutes to finish the endpoint deployment. It will then show up under Endpoints:

Clicking on this will show you the status as Healthy:


Testing

How can we test this?
If we click on the Test tab, it gives an error like this:

This means the studio does not currently provide scripts or examples for you to test this directly. How can you test it then?
Reach for the JSON schema at the Swagger URI mentioned at the bottom of your Details page:

There you can easily find the structure of your JSON input:

Click on 'Pretty-print' to get a better view of the structure, and then find the JSON example as shown above. (You can also fetch this schema from code, as sketched below.)
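A minimal sketch of pulling the schema with the requests library; the Swagger URI is a placeholder copied from the Details page, and depending on how the endpoint is secured you may need to pass the same Bearer key used for scoring.

import json
import requests

swagger_uri = "<swagger-URI-from-details-page>"

schema = requests.get(swagger_uri, timeout=30).json()
print(json.dumps(schema, indent=2))  # pretty-print, like the browser's Pretty-print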
The rest is very simple: copy the above example and post it from Postman, with the key values obtained from the endpoint's Details page.

The token value for authorization is given as below:

You will also get the URL (the REST endpoint) from the same page.
The JSON payload I am passing looks like this (copied from Swagger):
{
  "Inputs": {
    "input1": [
      {
        "Pregnancies": 6,
        "Glucose": 148,
        "BloodPressure": 72,
        "SkinThickness": 35,
        "Insulin": 0,
        "BMI": 33.6,
        "Age": 50,
        "Outcome": 1
      }    
    ]
  },
  "GlobalParameters": {}
}
And the outcome that I got from Postman is:

This is the predicted probability of the outcome: a 49% chance of having diabetes, given the conditions supplied.
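If you would rather script the call than use Postman, here is a minimal sketch with the requests library; the URL and key are placeholders to be copied from the endpoint's Details page.

import requests

scoring_url = "<REST-endpoint-URL>"
api_key = "<primary-key>"

payload = {
    "Inputs": {
        "input1": [
            {
                "Pregnancies": 6, "Glucose": 148, "BloodPressure": 72,
                "SkinThickness": 35, "Insulin": 0, "BMI": 33.6,
                "Age": 50, "Outcome": 1,
            }
        ]
    },
    "GlobalParameters": {},
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

response = requests.post(scoring_url, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())  # the scored label and probability come back in the body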

With that I will take your leave, wishing you a Merry Christmas. I will be back soon with more cool hacks and insights on Azure AI/ML Studio and prediction modelling. Much love and namaste 😊


