I just finished a project that involved merging about 6,000 duplicate contacts in CRM. When the client initially rolled-out CRM (before we became their partner) there was lack of knowledge on how to best enter contacts and accounts. This led to thousands of orphaned contacts and thousands of duplicates.
Although not particularly creative or easy, the approach we took was to first copy contacts and related records to SQL Server. This allowed for matching using various joins and fuzzy matching on names and companies where needed. Part of the analysis was to determine which contact needed to remain active after the merge. There were business reasons why a particular contact needed to remain active (e.g., the contact details were also in an ERP system).
I created a stored procedure to perform the duplication analysis and write the records to be merged to a SQL table. Columns include SourceContactId (guid), TargetMasterContactId (guid), MergeAttempted (datetime), Merged (datetime), and Error (ntext).
Next, we reused a .NET console application that we wrote some time ago to loop through the records in the merge table and perform the merge. That code had to build an Entity with data to map across from the subordinate contact to the master record. It then used the MergeRequest class in the CRM SDK to perform the merge and updated the SQL table with the merge status.
We write a lot of our own tools at Altriva, but there might be a pre-built tool that can handle the duplicate identification and merging.
Let me know if I can help further with this. It's not an easy task but hopefully I've provided some guidance on how you can approach this.