Waves crash due to SPIDs reaching maximum capacity

Posted by Solozmar

Long time no talk, AX Community (and apologies if this shows up twice; I hit post and it disappeared).

I am wondering if anyone out there has seen the situation below. We are running AX 2012 CU11 with Advanced Warehousing enabled. Waves are set to process in batch, and parallel allocation is enabled on the allocatewave method with a value of 6 (we tried different values, and six seemed to avoid item allocation issues).

Several weeks ago, when waves were released, users started getting the error in screenshot 1: "Error accessing database connection". If I ran the wave myself, it would run. Then, in the following days, the same error occurred, but the wave would not run for me either unless I disabled parallel allocation on the method. Then it started failing and would not process even when I disabled the Warehouse management parameter to process waves in batch (it seems to process now, but only when run through a client session).

We see similar errors throughout the process, but only on the AOS: database connection errors, communication link errors, fatal SQL login errors, and so on. There are no errors in SQL Server or in AX itself, only in the event logs of the batch AOS.

Our first thought was TCP/IP port exhaustion, so, following an article on TechNet, we set the TCP/IP timeout in the registry to 30 and restarted. That seemed to work, but four days later the error returned.

After some poking around, I discovered that when the batch job runs, the number of SQL Server SPIDs skyrockets from under 500 to the maximum of 32,767 (I captured it below, after it had started draining). It then drains them very slowly, at 1 to 4 per second. They are in a sleeping state with a command of AWAITING COMMAND. A test server shows similar behavior, except that although it generates thousands of SPIDs, it seems to reuse them, whereas PROD just keeps generating new ones.
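
A quick back-of-the-envelope check on that drain rate shows why the problem lingers between runs. This sketch assumes a constant drain rate and uses only the figures above (a peak of 32,767, a baseline under 500, and 1 to 4 sessions released per second); the function name is just for illustration.

```python
def drain_hours(peak_sessions: int, baseline_sessions: int, per_second: float) -> float:
    """Hours needed to fall from the peak back to the baseline at a constant drain rate."""
    excess = peak_sessions - baseline_sessions  # sleeping sessions left over after the wave
    return excess / per_second / 3600.0         # seconds -> hours


if __name__ == "__main__":
    for rate in (1, 4):
        print(f"{rate}/s: {drain_hours(32767, 500, rate):.1f} h")
```

At 1 to 4 sessions per second that works out to roughly 2.2 to 9 hours, so a wave released inside that window would start from an almost exhausted session pool.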

In the meantime, I've asked a coworker to try this out in a Contoso instance to see what the SPID behavior is.

Verified answer

Guy Terry (Moderator)

I probably don't have an answer for your problem, aside from vaguely suggesting that this post-CU11 hotfix might help: KB4024685, "Multithreading induces issues during Automatic release of sales orders by using the Wave Processing logic".

However, I did want to say that 'WHSWorkCreateHistory' is the Work creation history log. Would it be too much to hope that turning off 'Create work creation history log' (in Warehouse management parameters) would solve your error?

Suggested answer

Ivan (Vanya) Kashperuk

I think Guy is right; I'm just not sure that KB pulls in the right dependencies.

The specific issue of SPIDs being over-consumed is caused by UserConnection objects not having finalize called after use, and the wave processing flow creates a lot of separate user connections.

You can check a few of those places, such as WHSPostEngine::createWaveExecutionHistoryLine, to see whether finalize is called. Since you are on 6.3, there is no finally support yet, so it is done through some SysConnection tracker.
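
To make the mechanism concrete, here is a toy Python model (not X++, and not how the AX kernel actually manages sessions) in which every separate user connection holds one SQL session until finalize is called. SqlSessionPool, UserConnectionModel and write_history_line are hypothetical names; the 32,767 ceiling is the one observed in the question.

```python
class SqlSessionPool:
    """Toy stand-in for SQL Server session IDs (SPIDs); not a real driver."""

    def __init__(self, max_sessions: int = 32767):
        self.max_sessions = max_sessions
        self.open_sessions = 0

    def acquire(self) -> None:
        if self.open_sessions >= self.max_sessions:
            raise RuntimeError("Error accessing database connection")
        self.open_sessions += 1

    def release(self) -> None:
        self.open_sessions -= 1


class UserConnectionModel:
    """Hypothetical analogue of an X++ UserConnection: owns one session until finalized."""

    def __init__(self, pool: SqlSessionPool):
        self.pool = pool
        self.pool.acquire()
        self.finalized = False

    def finalize(self) -> None:
        # Guard against a double release so the session count stays accurate.
        if not self.finalized:
            self.pool.release()
            self.finalized = True


def write_history_line(pool: SqlSessionPool, call_finalize: bool) -> None:
    """One log write on its own connection, as the wave flow does many times."""
    connection = UserConnectionModel(pool)
    # ... insert the history record through `connection` here ...
    if call_finalize:
        connection.finalize()


if __name__ == "__main__":
    leaky, tidy = SqlSessionPool(), SqlSessionPool()
    for _ in range(5000):
        write_history_line(leaky, call_finalize=False)
        write_history_line(tidy, call_finalize=True)
    print(leaky.open_sessions, tidy.open_sessions)  # prints: 5000 0
```

In this model the leaky pool only recovers when something else closes its sessions, which loosely mirrors the slow drain described in the question, while calling finalize returns each session immediately.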

Solozmar

This one was interesting. The work creation history log turned out not to be the root cause; the real problem was that we have far too many location directives and work templates, and with the sequencing set the way it was, the number of failures was immense and spawned thousands of threads.

This was especially true for replenishment: not all of our items have fixed locations yet, so they are directed to a location profile type that contains over 100 "dynamic" locations. That alone can produce hundreds of failures, across dozens of items.
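
A simplified Python sketch of why sequencing multiplies failures, assuming directives are tried in sequence order and each location that cannot be used counts as one failure. The directive names, the single usable location and the item count of 40 are made up for illustration; only the figure of about 100 dynamic locations comes from the thread, and real AX location directive logic (lines, actions, queries) is considerably richer.

```python
from typing import Callable, List, Optional, Tuple

# (directive name, candidate locations in the order they are tried)
Directive = Tuple[str, List[str]]


def find_location(directives: List[Directive],
                  usable: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Try directives in sequence order; return the first usable location and the failure count."""
    failures = 0
    for _name, locations in directives:
        for location in locations:
            if usable(location):
                return location, failures
            failures += 1  # each rejected location is one more failure to process or log
    return None, failures


if __name__ == "__main__":
    dynamic = ("Dynamic profile", [f"DYN-{i:03d}" for i in range(100)])
    fixed = ("Fixed location", ["FIX-01"])

    def usable(location: str) -> bool:
        return location == "FIX-01"  # hypothetical: only the fixed location fits

    items = 40  # hypothetical stand-in for "dozens of items"
    for order in ([dynamic, fixed], [fixed, dynamic]):
        _, failures = find_location(order, usable)
        print([name for name, _ in order], failures * items)
```

With the broad dynamic directive sequenced first, each item racks up 100 failures before reaching a usable location; reordering drops that to zero, which is the kind of effect a directive and template resequencing can have.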

lennartc

We have also found issues on older versions where the connection in WHSWaveStepController::createControlRecord was not finalized, which could leave sessions open.

We added connection.finalize() after both the ttsabort and the ttscommit in the customer's version of WHSWaveStepController::createControlRecord.
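
The shape of that change, sketched in Python rather than X++: the control record is written in its own transaction on a separate connection, and the connection is finalized on both the abort path and the commit path. TtsConnectionModel and the insert_record callback are hypothetical stand-ins. Python does have try/finally, but the explicit calls mirror the placement described above for a kernel without finally support.

```python
from typing import Callable, List


class TtsConnectionModel:
    """Hypothetical stand-in for a separate X++ connection with its own transaction."""

    def __init__(self) -> None:
        self.state = "new"
        self.open = True  # holds a database session until finalize() is called

    def ttsbegin(self) -> None:
        self.state = "begun"

    def ttscommit(self) -> None:
        self.state = "committed"

    def ttsabort(self) -> None:
        self.state = "aborted"

    def finalize(self) -> None:
        self.open = False


def create_control_record(insert_record: Callable[[], None],
                          opened: List[TtsConnectionModel]) -> None:
    connection = TtsConnectionModel()
    opened.append(connection)  # kept only so callers can inspect the connection afterwards
    connection.ttsbegin()
    try:
        insert_record()
    except Exception:
        connection.ttsabort()
        connection.finalize()  # added: release the session on failure
        raise
    connection.ttscommit()
    connection.finalize()      # added: release the session on success
```

Where finally is available, the two finalize calls could collapse into a single finally block; the point either way is that no exit path leaves the session open.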
