We have a workflow with a step that looks at a date and waits until 3 days after that date. After 3 days, the workflow picks up and continues. We rarely have issues with this workflow and have been using it for years, probably 5 to 10 at this point. But a few days ago, about 75% of these workflows crashed with the error "An unexpected error occurred from ISV code". Nothing was going on at that time that we know of. It was at 9:52 PM. Some software may have been in the middle of installing on the server at that time, and the server may have been rebooted. But even so, that shouldn't have killed all of these workflows... right?
The timeout was not due to occur for another 3 days, so this was not a case of the timeout firing and the workflow picking up and then failing. The ISV error is typically a plugin failing, right? But there shouldn't be a plugin running while the workflow is in a paused/wait state. What if the record whose date the workflow is monitoring was deleted or changed? Could that kill a paused workflow?
Any idea what could cause something like this? It seems like something picked up at 9:52 PM, maybe some process or service where CRM checks the status of these waiting workflows. Maybe that failed and crashed them? No idea. Just trying to think it through. If these workflows are not stable enough to survive a reboot, services being restarted, software updates, miscellaneous failures, etc., then they are probably not a valid solution for us. We need something more along the lines of a queue, where if things fail, processing continues once they pick back up again. I assume that is how the Wait/Timeout steps in a workflow work, but I don't really know. Either way, it seems that something can kill them.
Any thoughts or suggestions are much appreciated.