Understanding your current state
Volume: ~1 million contacts, many duplicates.
Root Cause: Historical imports without validation.
Constraints: No third-party tools, must use low-code (Power Automate, OOB features).
for maintain uniqueness :
Consider adding Alternate Keys for Email or Mobile to enforce uniqueness going forward.
Use Exact Match for Email and Mobile, and Fuzzy Match for names if possible.
Deduplication Approach
Since you cannot use XRM Toolbox or external tools, recommened approach would be power automate or data flows:
Option 1: Use Bulk Merge via Power Automate
Create a Power Automate flow:
Trigger: Scheduled or manually.
Action:
Query contacts using List Records with OData filters based on your duplicate detection logic.
Group by Email or Mobile.
For each group:
Keep the oldest or most complete record.
Merge duplicates using the Dataverse Merge action (available in Power Automate).
Log the merge in a custom entity or Excel file for auditing.
Option 2: Use Dataflows for Preprocessing
Use Power Query Dataflows to:
Pull all contacts.
Apply grouping and deduplication logic.
Push cleaned data back into Dataverse.
Automation
you can schedule Power Automate flow to run during off-peak hours.
Use batching to avoid hitting API limits (process in chunks of 5,000 records).
Auditing & Logging
Create a custom entity called “Deduplication Log”:
Fields: Original Record ID, Merged Record ID, Date, User.
In your flow, after merging, create a log entry.
Alternatively, you can write logs to Excel in OneDrive for easy reporting.
Preventive Controls
Enable Duplicate Detection Rules for:
On Create and On Update.
Use Alternate Keys for Email and Mobile.
Train users on proper data entry.
If imports are still happening:
Use Data Import Templates with duplicate detection enabled.
Challenges
Performance: 1 million records will require incremental processing.
Merge logic: Decide which record to keep (oldest, most complete, etc.).
API limits: Use pagination and throttling in Power Automate.