Machine learning is the latest hot topic, and for many good reasons. Being able to take a manual, sometimes inaccurate data point and transform it into an automated, consistently accurate one has huge advantages. Take, for example, the probability that an opportunity will close: it is up to your salesperson to enter this value, and up to you to trust that it is accurate. With machine learning, we can provide this data point automatically and have it be consistently accurate based on historical data.

In this two-part series, I will walk through exactly that: how to build an Azure Machine Learning (AML) experiment that provides the probability that an opportunity will be won. I will be using a very simple algorithm with only a couple of factors as an example. It will be up to you to identify the factors that matter for your business, and to make sure your CRM captures that data, in order to produce an accurate opportunity probability.

I will be using D365 Online, and the first thing we will want to do is set up the Data Export Service so we can replicate our data into an Azure SQL database, which AML can easily connect to. If you haven’t set this up yet, head here for a complete guide on how to set it up. For this walkthrough you only need to sync the Opportunity entity, but if you want to expand your factors further, you will most likely want to sync other entities as well.

Note: If you are on an on-premises version, you can use SSIS to replicate your on-premises CRM database to an Azure SQL database.

Next, go to your Azure Portal, create a new resource, and search for “Machine Learning Studio Workspace”.

[Screenshot: creating the Machine Learning Studio Workspace resource in the Azure Portal]

Once everything is set up, you can navigate to the new Machine Learning Studio workspace and click “Launch Machine Learning Studio”. This will take you to a separate website where you will need to sign in with your Azure credentials.

[Screenshot: the workspace overview with the “Launch Machine Learning Studio” button]

Then create a new blank experiment:

[Screenshot: creating a new blank experiment in Machine Learning Studio]

You will then see a blank canvas where you can drag and drop different components. The first thing we will want to do is search for “Import Data” on the top left and drag the “Import Data” component onto the canvas.

[Screenshot: the Import Data component dragged onto the experiment canvas]

Then, highlight the Import Data component and click the “Launch Import Data Wizard” button on the right. Select Azure SQL Database and enter the correct settings for your environment. Next, enter this SQL query:

select
    cast(estimatedvalue as float) as estimatedvalue,
    description,
    CASE
        WHEN statecode = 2 THEN 0
        ELSE 1
    END as statecode
from opportunity
where statecode = 1 or statecode = 2

In this query, we are retrieving the estimated value and description of the opportunity. With my sample data, the description acts more like a “type” of opportunity; the values in this field in my org are either “Beer” or “Wine”. Estimated value and description are the two factors we are using to determine whether the opportunity is Won or Lost. In a production scenario, you will most likely want to capture and use more data points to get an accurate opportunity probability: factors such as how long the opportunity sits in each stage, how many activities are regarding the opportunity, or perhaps even the salesperson who owns the opportunity.

The first thing to note about the query is that AML currently doesn’t support decimals, so we need to cast estimatedvalue as a float.

cast(estimatedvalue as float) as estimatedvalue

Next, we’re going to use the Two-Class Boosted Decision Tree algorithm, which requires the label column to contain 0 for the negative class and 1 for the positive class. We want to label the Lost opportunities as negative and the Won opportunities as positive, so we use a SQL CASE expression that maps the Lost statecode (2) to 0, while the ELSE branch keeps the Won opportunities (statecode 1) labelled as 1.

CASE
    WHEN statecode = 2 THEN 0
    ELSE 1
END as statecode

Lastly, we only want to return Won and Lost opportunities, because we are building the AML model from opportunities whose outcome is already known, and that outcome is exactly what we are trying to predict.

where statecode = 1 or statecode = 2
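
If you prefer to see the idea in code, here is a minimal sketch of what the Import Data step is effectively doing: pulling the same labelled rows out of the Azure SQL database. This is purely illustrative; the real work happens inside the Studio component. The server, database, and credential values are placeholders, and pandas/pyodbc are assumed only for the sake of the example.

import pandas as pd
import pyodbc

# Placeholder connection details - replace with your own Azure SQL
# server, database, and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

# Same query as above: estimated value, description, and a 0/1 statecode label.
query = """
select
    cast(estimatedvalue as float) as estimatedvalue,
    description,
    CASE WHEN statecode = 2 THEN 0 ELSE 1 END as statecode
from opportunity
where statecode = 1 or statecode = 2
"""

df = pd.read_sql(query, conn)   # one row per Won or Lost opportunity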

Next, search for the “Split Data” component and drag it onto the canvas. Then, connect the Import Data and Split Data components together.

[Screenshot: the Import Data component connected to the Split Data component]

Best practice in machine learning is to split the data: most of it is used to build the model, and the remainder is held back to evaluate how accurate the model is. That is what the Split Data component is for. You can adjust its settings to split the data however you like.
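
Conceptually, the Split Data component is doing something like the sketch below, continuing from the earlier pandas example. scikit-learn is only a stand-in for the Studio component, and the 75/25 split ratio is an arbitrary example.

from sklearn.model_selection import train_test_split

# Hold back 25% of the rows for evaluation; the remaining 75% trains the model.
# "df" is the opportunity data pulled in the earlier sketch.
train_df, test_df = train_test_split(df, test_size=0.25, random_state=42)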

Next, drag the Two-Class Boosted Decision Tree component to the left of the Split Data component. Then, drag the Train Model component below the Two-Class Boosted Decision Tree and Split Data components, and connect the Two-Class Boosted Decision Tree to the left input of the Train Model and the Split Data to the right input of the Train Model.

[Screenshot: the Two-Class Boosted Decision Tree and Split Data components connected to the Train Model component]

Highlight the Train Model component and, on the right side, click the “Launch column selector” button. Next, enter “statecode” and click the checkmark.

[Screenshot: the Train Model column selector with “statecode” entered]

This takes the training subset of the data (as defined by the Split Data component) and applies the Two-Class Boosted Decision Tree algorithm to it, using statecode as the label column, to build a model from the Won and Lost opportunities.
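
In code, the Train Model step is roughly equivalent to the sketch below, continuing the earlier example. scikit-learn’s GradientBoostingClassifier is only a conceptual stand-in for AML’s Two-Class Boosted Decision Tree, and the one-hot encoding of the description column is something the Studio handles for you.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Features: estimated value plus a one-hot encoded description ("Beer"/"Wine").
X_train = pd.get_dummies(train_df[["estimatedvalue", "description"]])
# Label: statecode, where 1 = Won and 0 = Lost.
y_train = train_df["statecode"]

model = GradientBoostingClassifier()
model.fit(X_train, y_train)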

Lastly, drag the Score Model and Evaluate Model components onto the canvas and make the connections like so:

[Screenshot: the Score Model and Evaluate Model components connected to complete the experiment]

This takes the remaining portion of the data from the Split Data component, runs it against the trained model, and determines the score, which in our case is the opportunity probability. The Evaluate Model component then takes those scores and evaluates them to provide metrics on how well the model performs.

Now click the Run button at the bottom, which will process each component. Once it is finished, you can right-click Score Model and select Scored datasets –> Visualize. This will show your data with the score columns applied. Scored Probabilities is the probability the model assigns to the positive value, which in our case is a “Won” opportunity. The Scored Labels column is the 0 or 1 value for a Lost or Won opportunity. If Scored Probabilities is greater than 50%, Scored Labels will be set to 1; otherwise it will be 0. More information on the Score Model can be found here.

[Screenshot: the scored dataset visualization showing the Scored Labels and Scored Probabilities columns]
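
For reference, here is a rough sketch of what the Score Model step produces, again continuing the earlier scikit-learn example: the predicted probability of the positive (Won) class corresponds to Scored Probabilities, and thresholding it at 0.5 gives Scored Labels.

import pandas as pd

# Encode the held-back rows the same way as the training data.
X_test = pd.get_dummies(test_df[["estimatedvalue", "description"]]).reindex(
    columns=X_train.columns, fill_value=0
)

scored_probabilities = model.predict_proba(X_test)[:, 1]   # chance the opportunity is Won
scored_labels = (scored_probabilities > 0.5).astype(int)   # 1 = Won, 0 = Lost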

Now right-click Evaluate Model and select Evaluation results –> Visualize. This will display a chart of how accurate your model is, along with data points on the accuracy. Based on your results, you can tweak the settings in the other components, or add and remove factors, and re-evaluate the model until you get an accuracy you are comfortable with. More information on the Evaluate Model component and what the different data points mean can be found here.

[Screenshot: the Evaluate Model visualization showing the accuracy metrics]
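
The Evaluate Model step boils down to comparing those scored labels and probabilities against the true outcomes. A minimal sketch, continuing the example above (the component itself reports a richer set of metrics, including the ROC chart):

from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

y_test = test_df["statecode"]
print("Accuracy:", accuracy_score(y_test, scored_labels))
print("AUC:", roc_auc_score(y_test, scored_probabilities))
print(confusion_matrix(y_test, scored_labels))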

That concludes the first part of our series. Now that we have a trained model with high accuracy, in the next part of the series I will show you how to generate a web service that we will use to retrieve the probability prediction and update D365 with the data. The best part is that we can use the power of Microsoft Flow to do all of this without any code… stay tuned!