In the first post in the “Behind the scenes of a cloud service” series, which you can read here, we described the overall service topology of Business Central in the cloud and how, at the highest level, it consists of three categories of services:

  1. Global services
  2. Regional control plane services
  3. Regional data plane services

In this post, we go into more depth on the regional data planes, in particular the compute tier that runs the actual business logic.


Scale units: each data plane cluster is a scale unit

A data plane cluster in Business Central is a self-contained set of resources, mainly compute (VMs and so on), on which we can host multiple customers. When new customers sign up for Business Central, we assign them to one of the existing clusters.

Over time, as more customers sign up, we eventually run into a bottleneck on a given cluster. For example, the CPUs on the cluster’s VMs may start maxing out, which we can address by adding more VMs to the cluster or by using larger VMs. We might also see the App DB becoming a bottleneck, which we can address by scaling up the database. Making the clusters bigger and bigger in this way is a “scale up” approach, and it works well in many cases.

Some scale issues are more easily addressed with a “scale out” approach: instead of increasing the size of existing clusters, we add new clusters. Since clusters are independent of each other by design, there are no shared resources that can become bottlenecks. Thus, a “scale out” approach in principle lets us handle infinite scale (of course, at the end of the day there will be other limits, such as the capacity of Azure, or the capacity of the DevOps teams that operate the service).

The data plane clusters thus become the “scale units” of the Business Central service.

The “scale out” approach has other benefits as well. One is that each data plane cluster becomes a fault isolation unit: if one cluster fails in any way, it doesn’t impact the others, because clusters are completely independent of each other. Another is that it allows us to roll out changes in phases; for example, we can roll out an NST hotfix to one cluster and observe it for a while before we roll it out further.


VM scale sets and Service Fabric

Azure supports a concept called “VM Scale Sets”, which makes it easy to create a set of identical VMs and to add or remove VMs as the load changes over time. Each cluster in Business Central typically consists of 5 VMs. Our goal is to have much larger clusters in the future, and we are working to address some algorithmic bottlenecks that currently prevent us from having many VMs in a cluster.

While VM Scale Sets make it easy to create many VMs, these VMs are just plain, empty Windows Server VMs without any Business Central components on them. To install the BC services on the VMs, we use Azure Service Fabric, which is a so-called microservices orchestrator. Service Fabric makes managing services and clusters much easier than if we were to do it “manually”. For example, if the VM Scale Set replaces a VM with one that runs a newer Windows Server version, Service Fabric will automatically install the relevant BC services on it, without us having to tell it to.

With Service Fabric, we provide our microservices – such as the NST or the Web client – as zipped package files (.sfpkg), which contain both the binaries and “recipes” for how those binaries should be installed.

As an example, the Web client is packaged into a 52MB file called NavWeb.sfpkg. This file contains all the binaries of the Web client, such as Microsoft.Dynamics.Nav.Client.UI.dll. It also contains so-called “manifest” files that Service Fabric understands, which specify that the Web client should be installed on all VMs and should listen on port 443.
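To make the “recipe” idea concrete, a Service Fabric service manifest looks roughly like the sketch below. The package, type, and endpoint names here are illustrative, not the actual contents of NavWeb.sfpkg; the general shape (a stateless service type, a code package with an entry point, and an endpoint resource) follows the standard ServiceManifest.xml schema. The “run on all VMs” part is expressed separately, in the application manifest, by declaring the service with InstanceCount="-1".

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch of a Service Fabric service manifest -->
<ServiceManifest Name="NavWebPkg" Version="1.0.0"
                 xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <ServiceTypes>
    <!-- The Web client runs as a stateless service -->
    <StatelessServiceType ServiceTypeName="NavWebType" />
  </ServiceTypes>
  <CodePackage Name="Code" Version="1.0.0">
    <EntryPoint>
      <ExeHost>
        <!-- Hypothetical entry point; the real binary name will differ -->
        <Program>NavWebHost.exe</Program>
      </ExeHost>
    </EntryPoint>
  </CodePackage>
  <Resources>
    <Endpoints>
      <!-- Listen for HTTPS traffic on port 443 -->
      <Endpoint Name="WebEndpoint" Protocol="https" Port="443" />
    </Endpoints>
  </Resources>
</ServiceManifest>
```

Service Fabric reads these manifests when the package is deployed and takes care of copying the binaries to each VM, starting the processes, and restarting them if they fail.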

All microservices are installed in a similar way. We will look more at the individual services further down.


Networking and request routing

Each data plane cluster has its own Azure Virtual Network, which enables the services within the cluster to talk to each other.

By default, the virtual network is a “closed system” in the sense that it’s impossible for external parties, such as a browser on a user’s machine, to connect to the virtual network and to the services that run inside. This is obviously an important security feature.

Some incoming connections are desirable, however, and to enable those, we add an Application Gateway load balancer, which maps between an external IP address and the internal resources on the virtual network. The Application Gateway is a so-called Layer 7 load balancer, which means it understands HTTP requests and responses and uses that information to route more intelligently. Specifically, the Application Gateway adds an HTTP cookie to every HTTP response, and if the client sends that cookie back in a new HTTP request, the Application Gateway recognizes it and routes the request to the same VM as before. This is important for Web client scenarios: the Web client is stateful, so multiple requests from the same browser session must be handled by the same Web client instance.
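The cookie-based stickiness can be sketched in a few lines. This is only an illustration of the idea, not the actual gateway implementation: the cookie name matches Application Gateway's affinity cookie, but the round-robin backend choice and all other names are assumptions for the example.

```python
import uuid

# Name used by Azure Application Gateway for its affinity cookie.
COOKIE_NAME = "ApplicationGatewayAffinity"

class AffinityRouter:
    """Minimal sketch of Layer 7 cookie affinity (illustrative only)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.sessions = {}   # cookie value -> backend VM
        self.next_index = 0  # simple round-robin for new sessions

    def route(self, cookies):
        """Return (backend, cookies to send back) for one HTTP request."""
        value = cookies.get(COOKIE_NAME)
        if value in self.sessions:
            # Known cookie: route to the same VM as before.
            return self.sessions[value], cookies
        # New session: pick a backend and hand out a fresh cookie.
        backend = self.backends[self.next_index % len(self.backends)]
        self.next_index += 1
        value = uuid.uuid4().hex
        self.sessions[value] = backend
        return backend, {**cookies, COOKIE_NAME: value}

router = AffinityRouter(["vm0", "vm1", "vm2"])
vm_first, cookies = router.route({})   # first request: a new cookie is issued
vm_second, _ = router.route(cookies)   # same session: routed to the same VM
assert vm_first == vm_second
```

A request arriving without the cookie is treated as a new session; a request that replays the cookie lands on the VM that served it before, which is exactly what the stateful Web client needs.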

For technical reasons, specifically to support non-HTTP traffic on the NST management endpoint, we add a second load balancer. This is the standard Azure Load Balancer, which is a so-called Layer 4 load balancer that simply forwards network packets without inspecting them.

Finally, to increase security, we add a Network Security Group, which we use to allow/deny certain network traffic on the virtual network. For example, we deny all network traffic that targets port 3389, which is the Remote Desktop Protocol port.
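NSG rules are evaluated in priority order, and the first rule that matches decides the outcome. The sketch below illustrates that evaluation model; it is deliberately simplified (real NSG rules also match on source/destination address, protocol, and direction, and the names here are invented for the example), but the port-3389 deny behaves as described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    priority: int        # lower number is evaluated first, as in Azure NSGs
    port: Optional[int]  # None matches any destination port
    allow: bool

def evaluate(rules, port):
    """First matching rule wins; if nothing matches, deny."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.allow
    return False

rules = [
    Rule(priority=100, port=3389, allow=False),   # deny Remote Desktop
    Rule(priority=200, port=443, allow=True),     # allow HTTPS
    Rule(priority=4096, port=None, allow=False),  # catch-all deny
]

assert evaluate(rules, 3389) is False  # RDP traffic is blocked
assert evaluate(rules, 443) is True    # HTTPS traffic reaches the gateway
```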

With these elements, the data plane cluster looks like this:

Data plane clusters contain some additional resources that we haven’t shown in the picture. One example is a storage account that collects log files.


Microservices: NST, Web client, and many more

You are familiar with the NST and the Web client, but in the cloud version of Business Central, we run more than just those two services.

This picture shows the set of services that run in the data plane clusters today:

Here is a short description of each microservice:

  • Web Client: The well-known service that serves UI to browser applications as well as to the mobile applications.
  • NST: The well-known service that executes C/AL and AL code.
  • Licensing Service: A service that can check user licenses in AAD (see also here).
  • Tenant Directory Service: A service that keeps track of the tenants that have been assigned to the cluster.
  • Monitoring Agent: A service that collects telemetry from the cluster and sends it to a central location, where it can be used for analysis and troubleshooting.
  • Extension Proxy Service: A service that assists with operations such as a user installing an extension in a tenant.
  • Hybrid Proxy Service: A service that is used in the Intelligent Edge scenario, and which assists with setting up replication between on-prem systems and Business Central.
  • Delta Service: A service that enables delta queries against the OData endpoint (see "poll-based change tracking" here).
  • Browser Client: A service that serves static UI assets such as JavaScript and HTML.

These descriptions obviously don’t do the services justice, but hopefully they give an idea of how the service is composed of a set of microservices that have specific responsibilities.

We expect to add more microservices over time, and we are currently putting the final touches on two new services that will route requests more intelligently in the cluster, leading to better performance for end users. These two services are called the Gateway Service and the Balancer Service. We will go into more depth on these two services in a subsequent blog post.

While the cloud version of Business Central is deployed as a set of special-purpose microservices, our goal is to keep the on-premises version of Business Central as simple to install and manage as always. In other words, we will still have the NST and the Web Client as the only two services, and they will come with all necessary logic built in. Similarly, we will continue to provide Docker container images that are as simple to run as always.


We hope you find this information interesting, if not directly useful. Please let us know your feedback and questions!

And stay tuned for the next post in the series, where we will go into depth on how we store customer data in various configurations of Azure SQL!