We are transitioning from AX2009, where we use SSIS to perform ETL. Each destination table in our Data Warehouse staging DB is loaded by its own SSIS package, so in the event of a failure our retry logic retries the failed package and notifies us only if the retry also fails, at which point we intervene (the retry works 99% of the time).
With D365FO, we have a single Data Export Project with several Execution Units (~25 units, ~150 total entities) that runs nightly as a batch job. Occasionally, one or more entities end up in a Failed or Partially Succeeded status. In every case we've seen, adding those entities to a temporary export job and running it manually succeeds.
We understand that batch jobs of this nature cannot currently be retried, so we are curious how other people handle this scenario. The whole export job takes ~90 minutes to run, so re-running it from scratch because of a couple of failed exports would put us behind schedule for downstream ETL and processing.
The current recovery process (catching the errors, adding the failed entities to a new job, running the exports manually, and then manually executing downstream processing) is very manual. Has anyone had a similar experience, and do you have any ideas?
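To make the question concrete, here is a rough Python sketch of the flow we would like to automate: re-export only the entities that did not fully succeed, retry once, and notify only if the retry also fails. This is not a working D365FO integration. The `export_entity` and `notify` callables, the entity names in the usage note, and the shape of the `statuses` mapping are all hypothetical placeholders for whatever mechanism actually triggers an export and reports its status.

```python
from typing import Callable, Dict, List

# Statuses that we treat as needing a retry (as shown in our job history).
RETRYABLE = {"Failed", "Partially Succeeded"}


def entities_to_retry(statuses: Dict[str, str]) -> List[str]:
    """Return the entities whose nightly export did not fully succeed."""
    return sorted(name for name, status in statuses.items() if status in RETRYABLE)


def retry_failed_exports(
    statuses: Dict[str, str],
    export_entity: Callable[[str], str],  # hypothetical: re-exports one entity, returns its new status
    notify: Callable[[List[str]], None],  # hypothetical: alerts us so we can intervene manually
    max_retries: int = 1,
) -> List[str]:
    """Re-export only the failed entities and return those that still fail."""
    pending = entities_to_retry(statuses)
    for _ in range(max_retries):
        if not pending:
            break
        # Keep only the entities whose retry also ended in a retryable status.
        pending = [name for name in pending if export_entity(name) in RETRYABLE]
    if pending:
        notify(pending)
    return pending
```

With a status mapping such as `{"EntityA": "Succeeded", "EntityB": "Failed"}`, only `EntityB` would be re-exported, and downstream processing could start as soon as the returned list is empty instead of waiting on a full ~90 minute rerun.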