Since the release of Workflow 2.0, we in Support have seen the new Workflow engine running in a broad range of environments. We have come across a few issues regarding Workflow 2.0 performance in multi-domain environments that I’d like to share with the community.

Issue 1: General Workflow performance in multi-domain environments

The first issue I want to highlight is that if users from other domains are added to a Workflow as approvers, assigning to those users will take longer than assigning to a user in the same domain as the GP/SQL machines. How much longer depends on several factors, such as a large number of trusted domains, a complex OU structure, or NETBIOS name resolution being disabled for child domains. Networking can also play a part if calls to the Domain Controller where the users actually reside take longer than calls to the local domain. The physical location of the DCs can be a factor, but most of the time what we see is simply a complex network: flat, but complex. The size of the slowdown varies widely. I’ve seen complex environments take up to 20 minutes for an approval, while in others it adds only a few seconds to the workflow process.

Understanding why the approval process is slow starts with understanding how GP queries Active Directory for user information. When the Workflow engine tries to assign a document to a user in another domain, the code first looks that user up in the local machine’s domain context. If the user is not found there, a message is written to the log file and additional code is triggered to perform a broader search against all domains that the current domain trusts: a call is made to the DC to get the list of trusted domains, and the search is extended to every domain in that list. If this query returns a large number of trusted domains, working through the list takes time, anywhere from a few seconds to a few minutes depending on how many domains are returned. The rest of this post details this process, but note that there will be a performance slowdown regardless of whether NETBIOS name resolution is enabled or disabled for child domains.
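The lookup order described above can be sketched as a toy Python model. This is not GP’s actual code (which uses System.DirectoryServices); the `Domain` class and `find_user` function are hypothetical, and the domains and users are made up. What it illustrates is that the fallback loop queries every trusted domain, so its cost grows with the length of the trust list rather than with where the approver actually lives.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for an AD domain: a name and the users it holds."""
    name: str
    users: set[str] = field(default_factory=set)


def find_user(user: str, local: Domain, trusted: list[Domain], log: list[str]):
    """Look up `user` locally first, then fall back to every trusted domain.

    Returns (name of the domain where the user was found, or None,
             number of domains queried).
    """
    queried = 1  # the local domain context is always tried first
    if user in local.users:
        return local.name, queried

    # Not found locally: note it in the log and widen the search.
    log.append(f"{user} not found in {local.name}; searching trusted domains")
    found = None
    for domain in trusted:
        queried += 1  # no early exit: every trusted domain is queried
        if found is None and user in domain.users:
            found = domain.name
    return found, queried


if __name__ == "__main__":
    contoso = Domain("contoso.com", {"alice"})
    corp = Domain("corp.contoso.com", {"bob"})
    contoso2 = Domain("contoso2", {"carol"})
    log: list[str] = []
    print(find_user("alice", contoso, [corp, contoso2], log))  # ('contoso.com', 1)
    print(find_user("bob", contoso, [corp, contoso2], log))    # ('corp.contoso.com', 3)
```

Even though "bob" is in the first trusted domain, the model still counts a query against contoso2, mirroring the point that ALL trusted domains are searched.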

You can expect a workflow performance slowdown if the following conditions are true. The extent of the slowdown depends on the number of domains you have.

1. You are assigning workflows to users that reside in a domain other than the one that the GP/SQL machines are joined to.

a. This will be exacerbated by having a large number of trusted domains. Even if the workflow is only trying to assign to users in a single trusted domain, ALL trusted domains will be searched.

- This will be further exacerbated if NETBIOS name resolution of the child domain is broken.

 

Issue 2: NETBIOS name resolution for child domains adding a performance hit

The other issue I want to detail concerns NETBIOS name resolution of child domains. If NETBIOS name resolution for the child domain doesn’t work, you will see slight delays on each AD call that the workflow engine makes. The default, expected behavior is that from a computer joined to contoso.com, you should be able to ping CORP and get back the IP address of DC03, the child domain’s domain controller in this example. However, to explain this properly, we need to go into some more technical details about multi-domain environments. I’m going to use terms that assume a working knowledge of Active Directory. If you’ve ever stood up a domain before, you should be able to keep up. If you have not, well, this is the internet; I’m confident you can find out how.

Imagine for a moment that we have three domains. Contoso and contoso2 have a two-way forest trust and DNS zone replication set up. Corp.contoso.com is a child domain of the parent, contoso.com. SQL and GP are installed on machines joined to the contoso.com domain. Because the DNS zones replicate, any object or user can be found from any other domain. Although there isn’t a direct trust between corp and contoso2, both of the other trusts are transitive, so corp trusts contoso2 by way of contoso’s two-way trusts with each of those domains.
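The transitive path from corp to contoso2 can be checked with a short breadth-first search over the trusts in this example. This is a simplification: it treats every listed trust as two-way and transitive, which matches the parent-child and forest trusts described here but not every Active Directory trust type (external trusts, for example, are non-transitive). The `TRUSTS` list and `trusted_domains` function are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Two-way trusts from the example topology (hypothetical representation).
TRUSTS = [
    ("corp.contoso.com", "contoso.com"),  # parent-child trust
    ("contoso.com", "contoso2"),          # two-way forest trust
]


def trusted_domains(start: str, trusts: list[tuple[str, str]]) -> set[str]:
    """Return every domain reachable from `start` by following trusts transitively."""
    graph: dict[str, set[str]] = {}
    for a, b in trusts:  # two-way trust: add the edge in both directions
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk of the trust graph
        current = queue.popleft()
        for neighbour in graph.get(current, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    print(sorted(trusted_domains("corp.contoso.com", TRUSTS)))
    # ['contoso.com', 'contoso2']
```

The same walk from contoso.com returns both corp and contoso2, which is why a fallback search started there covers both.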

In its default configuration, a domain set up like this allows machines from any of the other domains to be located by NETBIOS name or FQDN, even if the machine making the query isn’t in the same domain as the object. The request goes to the machine’s DC, and the DNS request is then forwarded on to the other DCs that it trusts. For example, if I’m on a machine joined to contoso.com and ‘ping’ the FQDN of a SQL server joined to corp.contoso.com, the request makes a round trip through the other domains and returns the resource from the domain where it actually resides. NETBIOS name resolution means I could ping the NETBIOS name of my CORPSQL server and get back the same IP address as if I had pinged the FQDN. See the note at the end of this post for a little more information on how NETBIOS name resolution is set up. I keep saying ‘default’ because if this doesn’t work, it usually means additional steps have been taken to disable it.

NETBIOS – CORPSQL

FQDN – CORPSQL.corp.contoso.com

In this default state, pinging the NETBIOS name of contoso or contoso2 does not return the IP address of the DCs for those domains. The only thing we need to focus on is NETBIOS name resolution of the child domain, corp; this is the only place we found any problems.
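To check this from a machine joined to contoso.com, you can compare how the child domain’s short name and FQDN resolve, which is roughly what the ping test does. A minimal Python sketch, assuming `socket.gethostbyname` goes through the same Windows name resolution path that ping uses; `check_child_domain` is a hypothetical helper, and `CORP` and `corp.contoso.com` are the example names from this post.

```python
import socket
from typing import Callable, Optional


def _try_resolve(resolve: Callable[[str], str], name: str) -> Optional[str]:
    """Return the IPv4 address for `name`, or None if resolution fails."""
    try:
        return resolve(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


def check_child_domain(short_name: str, fqdn: str,
                       resolve: Callable[[str], str] = socket.gethostbyname):
    """Compare resolution of a child domain's short (NETBIOS) name and its FQDN."""
    short_ip = _try_resolve(resolve, short_name)
    fqdn_ip = _try_resolve(resolve, fqdn)
    if fqdn_ip is None:
        verdict = "FQDN does not resolve: fix DNS first"
    elif short_ip is None:
        verdict = "short name does not resolve: expect extra delay on AD lookups"
    else:
        verdict = "short name and FQDN both resolve"
    return short_ip, fqdn_ip, verdict
```

On a contoso.com machine you would call `check_child_domain("CORP", "corp.contoso.com")`. A `None` for the short name alongside a valid address for the FQDN matches the broken child-domain scenario described in this post. The `resolve` parameter exists only so the function can be exercised with a fake resolver.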

 

Here are the errors you will see if NETBIOS name resolution doesn’t work for child domains. Imagine a user in contoso submitting a document to be approved by a user in corp.contoso.com.

Upon submission, the following errors are logged in the temp directory of the submitter’s profile. They are logged any time the user search needs to extend beyond the current domain context:

ActiveDirectory.GetUserObjectByDirectoryEntry - There is no such object on the server.

ActiveDirectory.GetUserObjectByObjectGuid - Unable to retrieve user from the DirectoryEntry object, performing fallback logic to lookup the object across multiple forests.

These messages are expected. They mean that the approver was not found on DC01.contoso.com, so the extra code now fires to find the approver across trusted domains. The next step gets the list of all trusted domains in which the search will be performed; in our example, this will be both contoso2 and corp. Several seconds later, you will see the following messages in the Workflow log in the temp directory of your SQL Server service account:

ActiveDirectory.GetNetbiosDomainName - The server is not operational.

ActiveDirectory.GetNetbiosDomainName - The server is not operational.

ActiveDirectory.GetNetbiosDomainName - The server is not operational.

ActiveDirectory.GetNetbiosDomainName - The server is not operational.

 

Overall, the approval will go through and no errors should be raised to the user, but this extra lookup code adds performance overhead to the approval process.
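When a log contains many of these repeated lines, tallying them per method makes the fallback activity easier to see. A minimal sketch, assuming log lines have the `ActiveDirectory.<Method> - <message>` shape shown above; the regular expression and `summarize` are hypothetical helpers, and the exact log file name is not covered here, so pass in the text of whichever Workflow log you are reviewing.

```python
import re
from collections import Counter

# Matches lines such as:
#   ActiveDirectory.GetNetbiosDomainName - The server is not operational.
LOG_LINE = re.compile(r"^\s*ActiveDirectory\.(?P<method>\w+)\s+-\s+(?P<message>.+?)\s*$")


def summarize(log_text: str) -> Counter:
    """Count (method, message) pairs for ActiveDirectory.* lines in a log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            counts[(match["method"], match["message"])] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        ["ActiveDirectory.GetUserObjectByDirectoryEntry - There is no such object on the server."]
        + ["ActiveDirectory.GetNetbiosDomainName - The server is not operational."] * 4
    )
    for (method, message), n in summarize(sample).most_common():
        print(f"{n:>3}  {method}: {message}")
```

Stack-trace lines (those starting with `at` or `--->`) do not match the pattern, so only the one-line summaries are counted.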

The other scenario to consider is what happens if you submit a document from contoso to an approver in contoso2. The workflow will still work, but we get some new errors and additional delay.

Upon submission, the following is logged in the SQL workflow log when NETBIOS name resolution for child domains is disabled. The first two messages are again expected, since we’re assigning to a user in another forest.

ActiveDirectory.GetUserObjectByDirectoryEntry - There is no such object on the server.

ActiveDirectory.GetMembersList - Unable to retrieve user from the DirectoryEntry object, performing fallback logic to lookup the object across multiple forests.

ActiveDirectory.GetTrustRelationships -   Error processing the domain trusts for corp.contoso.com : System.DirectoryServices.ActiveDirectory.ActiveDirectoryServerDownException: The server is not operational.
Name: "corp.contoso.com"
 ---> System.Runtime.InteropServices.COMException: The server is not operational.
   at System.DirectoryServices.DirectoryEntry.Bind(Boolean throwIfFail)
   at System.DirectoryServices.DirectoryEntry.Bind()
   at System.DirectoryServices.DirectoryEntry.get_AdsObject()
   at System.DirectoryServices.PropertyValueCollection.PopulateList()
   at System.DirectoryServices.PropertyValueCollection..ctor(DirectoryEntry entry, String propertyName)
   at System.DirectoryServices.PropertyCollection.get_Item(String propertyName)
   at System.DirectoryServices.ActiveDirectory.PropertyManager.GetPropertyValue(DirectoryContext context, DirectoryEntry directoryEntry, String propertyName)
   --- End of inner exception stack trace ---
   at System.DirectoryServices.ActiveDirectory.PropertyManager.GetPropertyValue(DirectoryContext context, DirectoryEntry directoryEntry, String propertyName)
   at System.DirectoryServices.ActiveDirectory.Domain.GetDomain(DirectoryContext context)
   at Microsoft.Dynamics.GP.WorkflowGP.WorkflowEngine.Directory.ActiveDirectory.GetTrustRelationships()

We can see that although our approver isn't in corp.contoso.com, this domain is still queried because it is a trusted domain. Expect errors like these for EACH domain returned in the 'trusted domain' list. Restoring NETBIOS name resolution for your child domain stops these errors from being logged and should speed up the process somewhat.
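Because one such error is expected per unreachable trusted domain, listing the domain names that appear in these `GetTrustRelationships` errors shows which trusted domains are failing. A sketch, assuming the message format shown above (`Error processing the domain trusts for <domain> :`); `failing_trust_domains` is a hypothetical helper, not part of GP.

```python
from __future__ import annotations

import re

# Matches the first line of the trust error shown above and captures the domain name.
TRUST_ERROR = re.compile(
    r"ActiveDirectory\.GetTrustRelationships\s+-\s+"
    r"Error processing the domain trusts for\s+(?P<domain>\S+)\s+:"
)


def failing_trust_domains(log_text: str) -> list[str]:
    """Return each domain named in a GetTrustRelationships error, in first-seen order."""
    domains: list[str] = []
    for match in TRUST_ERROR.finditer(log_text):
        if match["domain"] not in domains:
            domains.append(match["domain"])
    return domains
```

Run against the log above, this would report `corp.contoso.com`, pointing you at the child domain whose NETBIOS name resolution should be checked.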

 

NETBIOS name resolution notes:

NETBIOS name resolution starts on the NIC of the server making the request. In the Advanced TCP/IPv4 settings, there is an option to append primary and connection-specific DNS suffixes. This means that if I just ping CORPSQL, my NIC automatically appends my connection suffix (contoso.com) to the NETBIOS name I’m looking up and starts the search there. In a complex environment, the connection suffix may not match the domain suffix of the user’s or computer’s actual domain, so additional DNS configuration would be required. However, at a basic level, NETBIOS name resolution usually works simply because your NIC is adding ‘contoso.com’ to the end.

The only way I found to break NETBIOS name resolution for my child domain was to append a bogus suffix, forcing the lookup against an invalid domain; I wasn’t able to find any other simple way of doing this. This information is most relevant where a machine’s connection suffix is for a different domain than the one your users or computers exist in, which may require adding a domain suffix to make sure the correct domain is queried.
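The suffix-appending behavior described in these notes can be modeled as trying a single-label name with each suffix in the search list, in order, until one resolves. This is a simplified sketch, not the exact Windows resolver algorithm; `candidate_names` and `resolve_with_suffixes` are hypothetical, and the usage below uses a fake resolver and a documentation-range address rather than real DNS.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def candidate_names(name: str, suffixes: Sequence[str]) -> List[str]:
    """Names to try: a dotted name as-is, a single-label name with each suffix appended."""
    if "." in name:
        return [name]
    return [f"{name}.{suffix}" for suffix in suffixes]


def resolve_with_suffixes(name: str, suffixes: Sequence[str],
                          resolve: Callable[[str], str]) -> Tuple[Optional[str], Optional[str]]:
    """Try each candidate in order; return (name that resolved, address) or (None, None)."""
    for candidate in candidate_names(name, suffixes):
        try:
            return candidate, resolve(candidate)
        except OSError:  # this candidate did not resolve; move on to the next suffix
            continue
    return None, None


if __name__ == "__main__":
    # Fake zone standing in for DNS: only the child domain holds CORPSQL.
    zone = {"CORPSQL.corp.contoso.com": "192.0.2.20"}

    def fake_resolve(host: str) -> str:
        if host in zone:
            return zone[host]
        raise OSError(f"cannot resolve {host}")

    print(resolve_with_suffixes("CORPSQL", ["corp.contoso.com"], fake_resolve))
    # ('CORPSQL.corp.contoso.com', '192.0.2.20')
    print(resolve_with_suffixes("CORPSQL", ["bogus.invalid"], fake_resolve))
    # (None, None)
```

In this model the outcome depends entirely on which suffixes are in the list: with only a bogus suffix the short name never resolves, which is why an extra domain suffix may be needed when the connection suffix points at a different domain.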

 

In summary, we wanted to get this information out there to help the community understand why the engine behaves the way it does. Because GP lookups rely on the System.DirectoryServices library, some of these behaviors happen simply because of how that DLL processes lookups. GP Workflow is absolutely supported in multi-domain environments, but we wanted to be transparent about the types of issues we’ve come across when working with these environments. As mentioned previously, some complex environments report approval times of up to 20 minutes, and the only way to truly work around some of these scenarios is to have all workflow users and computers in a single domain.

Work on this issue is ongoing, and as always this post will be updated if we get more information.

 

Until next time!