Unsupervised learning is a type of machine learning used to discover hidden patterns in data when we don't have any labels. The most common unsupervised learning method is cluster analysis, which groups data points according to their similarity. Some practical applications of clustering include social network analysis, document classification, rideshare data analysis and customer or market segmentation.
The most popular clustering algorithm is K-means. The goal of the K-means algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided.
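To make the iterative assign-and-update loop concrete, here is a minimal pure-Python sketch of standard K-means (Lloyd's algorithm). It is not Azure Machine Learning's implementation: the function names `euclidean` and `kmeans`, the random choice of starting centroids and the `max_iter` cap are assumptions made for illustration.

```python
import math
import random


def euclidean(p, q):
    """Length of the line segment between points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means (Lloyd's algorithm) with random initial centroids.

    Returns (centroids, assignments), where assignments[i] is the index
    of the cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments
```

For example, `kmeans([(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)], k=2, seed=1)` should place the first three points in one cluster and the last three in the other.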
Market or customer segmentation is a key component of any business strategy. In the following article I will show you how to deploy a market segmentation machine learning model for D365FO:
1. Data Preparation and Loading Data into Azure Machine Learning
1.1 Export data from D365FO
In this case we will use the Customer statistics – Top 100 report from the Sales and marketing module. (For this example, we are using the standard demo data.)
1.2 Leave the parameters at their default values.
1.3 Export the report as a CSV file
Import the data into Azure Machine Learning (import the dataset)
Now we will explore Azure Machine Learning Studio and build a clustering model using K-Means Clustering.
1.4 Go to https://studio.azureml.net
1.5 If you already have an Azure ML account, sign in; if not, sign up for an account (Free Workspace).
1.6 In the lower left corner, click New.
1.7 Go to the Dataset tab and select FROM LOCAL FILE.
1.8 Upload the Top 100.csv file from your computer.
2. Build an Azure Machine Learning experiment
2.1 Once the upload is complete, click New in the lower left corner.
2.2 Go to the Experiment tab and select Blank Experiment.
2.3 In the upper left corner, expand Saved Datasets and select My Datasets, then drag Top 100.csv onto the canvas.
2.4 To visualize the data, right-click the small circle below the module and click Visualize.
2.5 Review the data: look at the number of columns, the number of rows, and what the values in each row look like.
2.6 Drag the Normalize Data module onto the canvas. As part of data preparation, we need to normalize the data, that is, put every column on the same scale, so that columns with large numeric ranges do not dominate the distance calculations that K-means relies on.
Connect the output of the Top 100 dataset to the input of the Normalize Data module.
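2.7 Set the properties of the Normalize Data module, choosing ZScore as the Transformation method:
ZScore converts all values to a z-score. The values in each column are transformed using the following formula, where the mean and standard deviation are computed separately for each column:
$$z = \frac{x - \operatorname{mean}(x)}{\operatorname{stdev}(x)}$$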
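To reproduce the ZScore step outside the designer, a minimal pure-Python sketch follows. It is not the Normalize Data module itself: the function name `zscore_columns`, the use of the population standard deviation and the choice to map constant columns to 0 are assumptions made for this illustration.

```python
import statistics


def zscore_columns(rows):
    """Normalize each column of a row-major table to z-scores.

    The mean and population standard deviation are computed per column.
    A constant column (standard deviation 0) is mapped to all zeros.
    """
    columns = list(zip(*rows))
    normalized = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        normalized.append([(x - mean) / std if std else 0.0 for x in col])
    # Transpose back to one list per row.
    return [list(row) for row in zip(*normalized)]
```

For example, `zscore_columns([[10, 1000], [20, 2000], [30, 3000]])` turns both columns into the same values (about -1.22, 0 and 1.22), even though the second column is 100 times larger than the first.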
2.8 Click Launch column selector and select the following columns:
2.9 Drag two Train Clustering Model modules and the K-Means Clustering module onto the canvas. Connect the modules as follows, then click Launch column selector:
2.10 In the column selector, select All columns.
2.11 On the left-hand K-Means Clustering module, set the following properties:
Create trainer mode: Single Parameter
Number of Centroids: 4
Random number seed: 12345
Note: The K-means++ algorithm is a variation of the standard K-means algorithm that uses a smarter initialization: instead of picking all the starting centroids at random, it chooses each new starting centroid with a probability proportional to its squared distance from the centroids already chosen, which spreads them out across the data. K-means++ also has the potential to reduce the total running time. The clusters are modelled using a measure of similarity, defined by a metric such as Euclidean or probabilistic distance. In this case we will select the Euclidean distance between points p and q, which is the length of the line segment connecting them: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$. The Number of Centroids is the number of clusters, K. The Random number seed is used to generate the random numbers that initialize the centroids; fixing it makes the results reproducible between runs.
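The K-means++ seeding described in the note can be sketched as follows. This is a minimal illustration of the K-means++ initialization technique, not Azure Machine Learning's code; the function names are hypothetical, and the default seed only mirrors the Random number seed of 12345 set above.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance: length of the segment between points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans_pp_init(points, k, seed=12345):
    """Choose k initial centroids with K-means++ seeding.

    The first centroid is picked uniformly at random. Each further centroid
    is picked with probability proportional to D(x)^2, the squared Euclidean
    distance from x to its nearest already-chosen centroid.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible centroids
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(euclidean(p, c) for c in centroids) ** 2 for p in points]
        if sum(d2) == 0:  # every point already coincides with a centroid
            centroids.append(list(rng.choice(points)))
            continue
        # Weighted draw: points far from all current centroids are favoured,
        # and points that are already centroids (weight 0) are never re-picked.
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids
```

The returned centroids would replace the purely random starting points of a standard K-means loop; the assign-and-update iterations stay the same.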
2.12 Drag two Convert to CSV modules onto the canvas and connect each Train Clustering Model module to its own Convert to CSV module. Run the experiment by clicking Run. Your full experiment should now look like this:
3. Evaluate the Clustering
Note: Evaluating clustering is typically tricky because there is rarely any ground-truth information we can test against. So how do we evaluate whether a clustering is good?
The short answer is that no one agrees. The longer answer is that researchers have developed several useful heuristics. The first, and perhaps most useful, observation is that with clustering we are often trying to find some meaningful latent pattern in our data. The most important thing is to determine whether the value of K (the number of clusters) is optimal. To do that we use the elbow method, which plots the cost J against the number of clusters K. For K-means, J is the sum of the squared distances between each point and the centroid of its cluster. The cost should decrease as we increase the number of clusters and then flatten out; choose K at the point where the curve starts to flatten out (the "elbow").
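The elbow method can be sketched by computing J for a range of K values. To keep the example short and its output reproducible, this sketch swaps in a deterministic farthest-first initialization instead of random or K-means++ seeding; the function names and the `max_iter` cap are assumptions, not part of Azure Machine Learning.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, k, max_iter=100):
    """Run K-means and return the cost J: the sum of squared distances
    from each point to its nearest (i.e. its assigned) centroid."""
    # Farthest-first initialization (an assumption for this sketch):
    # start at the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iter):
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def elbow_costs(points, max_k):
    """Cost J for K = 1..max_k; plot these against K and look for the elbow."""
    return [kmeans_cost(points, k) for k in range(1, max_k + 1)]
```

On data made of three tight, well-separated groups, J drops sharply from K = 1 to K = 3 and then barely changes, so the elbow sits at K = 3. Plotting the returned list against K (for example, in the notebook from section 4) gives the elbow chart.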
3.1 Visualize the Train Clustering Model results dataset
Note: The ellipses that represent the clusters are almost perpendicular and their lengths are quite different, which indicates that the separation between those clusters is pretty good. On the other hand, if the two ellipses overlap, the data has been poorly separated into clusters.
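The ellipse plot is a visual check. A common numeric complement, which is not part of the original walkthrough, is the silhouette coefficient: for each point it compares the mean distance to the other members of its own cluster (a) with the mean distance to the members of the nearest other cluster (b), as s = (b - a) / max(a, b). A minimal pure-Python sketch, with a hypothetical function name:

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    Values near 1 mean well-separated clusters; values near 0 or below
    mean the clusters overlap.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other members of p's own cluster
        # (p itself contributes a distance of 0, hence len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores close to 1 correspond to the well-separated case described in the note; scores near 0 or negative correspond to overlapping ellipses.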
4. View the results of the experiment and create Jupyter Notebooks
4.1 In the Azure ML workspace, right-click the output of the right-hand Convert to CSV module and, under Results dataset, choose Open in a new Notebook > Python 3.
4.2 Once the Jupyter Notebook opens, click Run. In the next In: cell, type the following: frame (in lowercase, because Python is case-sensitive and the generated code loads the dataset into a variable named frame).
4.3 Click Run. You should see the following:
Note: The Assignments column shows the cluster to which each data row was assigned.
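Once each row has an Assignments value, a natural next step for market segmentation is to profile each segment by its size and average feature values. A minimal pure-Python sketch follows; the column names `Revenue` and `Orders` are hypothetical placeholders, not columns taken from the Top 100 report. In the notebook, `frame.groupby('Assignments').mean(numeric_only=True)` computes the same per-cluster means with pandas. If the exported rows hold the normalized values, the means are in z-score units rather than the original units.

```python
from collections import defaultdict


def cluster_profiles(rows, features, label_key="Assignments"):
    """Return {cluster: (size, {feature: mean value})} for rows given as dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    profiles = {}
    for cluster, members in sorted(groups.items()):
        means = {f: sum(m[f] for m in members) / len(members) for f in features}
        profiles[cluster] = (len(members), means)
    return profiles


# Hypothetical rows shaped like the notebook's frame (illustrative values only).
rows = [
    {"Assignments": 0, "Revenue": 100, "Orders": 2},
    {"Assignments": 0, "Revenue": 300, "Orders": 4},
    {"Assignments": 1, "Revenue": 5000, "Orders": 40},
]
print(cluster_profiles(rows, ["Revenue", "Orders"]))
```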
4.4 To plot the clusters, type the following in the next In: cell:
5. Beyond Azure Machine Learning studio
If you want to plot the clusters in different colors and run the elbow method, please use the following repo on GitHub: