We are experiencing rapid growth of our database storage. The capacity report shows that most of the storage is consumed by the ActivityPointer table. Our analysis of that table shows that the data sits mainly in the description field of email activities. On average, one email is about 200 KB in size. Analysis of a few samples indicates that most of that size comes from largely unnecessary HTML markup.
To curb the storage growth, we plan to strip the unnecessary HTML and keep only the markup needed to display each email in its intended layout. A proof of concept with HTML Tidy shows promising results and suggests we could reduce the storage needed by about 70%.
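To gauge how much of a sample body is removable markup, here is a minimal Python sketch. It is a stand-in for the Tidy step, not HTML Tidy itself, and it assumes that HTML comments (including Outlook conditional comments), `<?xml ...?>` processing instructions and Office-namespaced tags such as `<o:p>` can be dropped without changing how a browser renders the email. It also assumes that runs of whitespace outside `<pre>` can be collapsed, which would not hold for content styled with `white-space: pre`. `EmailHtmlSlimmer` and `slim_email_html` are hypothetical names.

```python
import html
import re
from html.parser import HTMLParser

# HTML's own whitespace set; Python's \s would also match U+00A0 (&nbsp;).
_HTML_WS = re.compile(r"[ \t\r\n\f]+")


class EmailHtmlSlimmer(HTMLParser):
    """Hypothetical stand-in for the Tidy step: re-serializes HTML without
    comments, processing instructions and Office-namespaced tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._raw_tag: str | None = None  # inside <style>/<script>: copy verbatim
        self._pre_depth = 0               # inside <pre>: keep whitespace as-is

    @staticmethod
    def _is_office_tag(tag: str) -> bool:
        # Word/Outlook tags such as <o:p>; the tag is dropped, its content kept.
        return ":" in tag

    @staticmethod
    def _render_attrs(attrs) -> str:
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")  # boolean attribute
            else:
                parts.append(f' {name}="{html.escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if self._is_office_tag(tag):
            return
        if tag in ("style", "script"):
            self._raw_tag = tag
        if tag == "pre":
            self._pre_depth += 1
        self.out.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if not self._is_office_tag(tag):
            self.out.append(f"<{tag}{self._render_attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if self._is_office_tag(tag):
            return
        if tag == self._raw_tag:
            self._raw_tag = None
        if tag == "pre" and self._pre_depth:
            self._pre_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._raw_tag:
            self.out.append(data)  # CSS/JS must not be HTML-escaped
            return
        if not self._pre_depth:
            data = _HTML_WS.sub(" ", data)  # browsers collapse this anyway
        self.out.append(html.escape(data, quote=False))

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep <!DOCTYPE ...>

    # handle_comment and handle_pi are inherited no-ops, so comments and
    # <?xml ...?> instructions are simply not written to the output.


def slim_email_html(source: str) -> str:
    """Return a slimmed copy of one email body (hypothetical helper)."""
    parser = EmailHtmlSlimmer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Comparing `len(slim_email_html(body).encode("utf-8"))` with `len(body.encode("utf-8"))` for a few sample bodies gives a rough cross-check of the reduction seen with Tidy.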
Is there a third-party solution or product that does this, or something similar? Bonus points if it also archives the original email, compresses the content of the description field, and/or provides telemetry such as the number of records processed and the amount of data saved.
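If the archiving, compression and telemetry parts end up being built in-house, they could look roughly like the Python sketch below. It assumes gzip plus Base64 so the compressed value still fits a text column (Base64 adds about a third to the size, so this only pays off when gzip saves more than that), and it counts sizes as UTF-8 bytes, which may not match how the capacity report measures storage. A compressed value would no longer render as an email body, so it would have to be decompressed before display. All names are hypothetical, and where the archived original is stored is left open.

```python
import base64
import gzip
from dataclasses import dataclass


@dataclass
class SlimmingTelemetry:
    """Hypothetical counters for records processed and data saved."""
    records_processed: int = 0
    bytes_before: int = 0
    bytes_after: int = 0

    @property
    def bytes_saved(self) -> int:
        return self.bytes_before - self.bytes_after

    @property
    def reduction_ratio(self) -> float:
        # Fraction of the original size saved; 0.0 before any record is seen.
        return self.bytes_saved / self.bytes_before if self.bytes_before else 0.0


def archive_original(description: str) -> bytes:
    """gzip the untouched original body for an (unspecified) archive store."""
    return gzip.compress(description.encode("utf-8"))


def compress_for_text_field(description: str) -> str:
    """gzip + Base64 so the result is plain ASCII text."""
    packed = gzip.compress(description.encode("utf-8"))
    return base64.b64encode(packed).decode("ascii")


def decompress_from_text_field(value: str) -> str:
    """Inverse of compress_for_text_field."""
    return gzip.decompress(base64.b64decode(value)).decode("utf-8")


def record_result(telemetry: SlimmingTelemetry, original: str, stored: str) -> None:
    """Update the counters for one processed record (sizes as UTF-8 bytes)."""
    telemetry.records_processed += 1
    telemetry.bytes_before += len(original.encode("utf-8"))
    telemetry.bytes_after += len(stored.encode("utf-8"))
```

In a batch run, each record would go through `record_result` after its new value is computed, and the final `SlimmingTelemetry` instance could be logged as the run summary.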
If no third-party solution is available, we plan to implement this ourselves. The approaches we are currently considering are an Azure Function triggered either by the Dataverse Azure Service Bus integration or by a Power Automate flow. Alternatively, we could implement a custom workflow activity, which would remove the need for additional Azure components.
We would appreciate any thoughts on these approaches, or a preference for one of them.