Just wondering if anyone else had hit this.
We have a plug-in that calls a custom web-service. (The web service is our own, developed in MVC4 and deployed on a VM in Azure, say.)
The call is made using HttpClient and PostAsync. (Since you can't do async in a plug-in, there's a ".Result" on there to make the thread wait.)
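For illustration, here is a minimal sketch of that blocking call pattern. It is not the actual plug-in code: the class name, the URL, and the JSON payload are placeholders assumed for the example.

```csharp
using System.Net.Http;
using System.Text;

public static class ServiceCaller
{
    // Hypothetical endpoint; the real service URL is not given in this post.
    private const string ServiceUrl = "https://example.invalid/api/documents";

    // Posts a JSON payload and blocks the calling thread until the response arrives.
    public static string CallService(HttpClient client, string jsonPayload)
    {
        using (var content = new StringContent(jsonPayload, Encoding.UTF8, "application/json"))
        // .Result blocks until the task completes. If the task faults, the
        // exception is rethrown wrapped in an AggregateException, which is why
        // the stack trace starts with "System.AggregateException".
        using (HttpResponseMessage response = client.PostAsync(ServiceUrl, content).Result)
        {
            // Throws HttpRequestException for non-success HTTP status codes.
            response.EnsureSuccessStatusCode();
            return response.Content.ReadAsStringAsync().Result;
        }
    }
}
```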
This all ran perfectly reliably in several Dynamics 365 Online CRM tenants / orgs for several months.
Mid-December, something must have changed somewhere because without our having touched anything we started getting intermittent exceptions in the plug-in. These are network-stack errors and they basically all mention Overlapped I/O and something not being in a signalled state.
An example stacktrace might be...
System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.IO.IOException: Unable to read data from the transport connection: Overlapped I/O event is not in a signaled state. ---> System.Net.Sockets.SocketException: Overlapped I/O event is not in a signaled state
... though sometimes it's slightly different and complains that the client closed the connection. (I've also seen a server-closed-the-connection version, but I'm pretty sure that was only after I'd tried tweaking my code.)
The *server* doesn't always log the error in its IIS log but when it does it's a "500 0 64" error. I've done a couple of failed request tracing traces and they suggested that the client was closing the connection before all the data had been received.
It looks to me like either Microsoft have changed something on the CRM end so that the PostAsync request or its underlying Windows network connection is getting forcibly killed off or disconnected almost as soon as it starts, or else they've changed something in the network and we've got some switch or firewall killing things.
We've tried optimising and changing the connection options available to us via the HttpClient and they don't seem to make anything any more or less reliable.
We've mitigated for now with a retry loop - if we keep trying, eventually the call succeeds - I've seen it take up to about 100 attempts (!) though usually it's < 10. (The web-service call actually takes a number of seconds to process when it works successfully - it's connecting on to SharePoint - but the failures all seem to happen after only a small number of milliseconds, so 100 retries doesn't actually add too much to the time.)
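A retry loop of the kind described could be sketched roughly as follows. This is an illustrative sketch, not the production code: the class and method names, the cap of 100 attempts, and the 50 ms pause are all assumptions made for the example.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading;

public static class RetryingCaller
{
    // Retries the blocking POST when the send itself fails at the transport level.
    // maxAttempts and the pause between attempts are assumed values.
    public static string PostWithRetry(HttpClient client, string url, string json, int maxAttempts = 100)
    {
        AggregateException lastError = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // Build fresh content for every attempt, because HttpClient
                // may dispose the content once a request has been sent.
                using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
                using (HttpResponseMessage response = client.PostAsync(url, content).Result)
                {
                    // A non-success status throws HttpRequestException directly
                    // (not wrapped), so HTTP-level errors are not retried here.
                    response.EnsureSuccessStatusCode();
                    return response.Content.ReadAsStringAsync().Result;
                }
            }
            catch (AggregateException ex) when (ex.InnerException is HttpRequestException)
            {
                // Transport failure surfaced through .Result: remember it and retry.
                lastError = ex;
                Thread.Sleep(50); // short pause; the failures occur within milliseconds
            }
        }

        throw new InvalidOperationException(
            "Web service call failed after " + maxAttempts + " attempts.", lastError);
    }
}
```

Only failures where the send itself throws are retried. A timeout, which surfaces as a TaskCanceledException inside the AggregateException, is deliberately left to propagate, since retrying a call that takes several seconds could push the plug-in past its execution time limit.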
I've got a case open with Microsoft, but just wondered if anyone else is using HttpClient and having problems only recently. (Just remembered, we did actually use it in another, simpler call and experienced intermittent unreliability there too. I *think* switching that call to use WebClient has fixed things, though it's not been heavily tested and I still saw a "Task Canceled" error that I haven't got to the bottom of yet.)
Feel free to comment if you've hit the issue too!
Thanks.
MWM