Does anyone have code or a process they are willing to share for identifying records that aren't the same and shouldn't show up in the merge window? We stay pretty on top of our merges, and we are now at a point where 80% of our suggested merges are records that appear similar but aren't the same and shouldn't be merged. I would love a way to flag these records as not being the same so they stop showing up in the window, since they make it harder to see the records that actually need to be merged. Any help would be appreciated.
I have an attribute where you can specify a customer number to never merge into, plus a pre-merge check that enforces it. I also have a script that runs after our potential dups process; it does a number of things, one of which is to purge any such rows.
Yep, we use a void merge attribute. Records that have it are excluded from the results and will never be merged.
The standard Void Merge prevents the constituent from being merged with anyone, which was not particularly useful to us. The attribute I created specifies a particular constituent that the owner of the attribute shouldn't be merged into, while still allowing other merges.
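For anyone trying to picture how a pairwise "never merge into" attribute could work, here is a minimal sketch. It assumes the attribute rows can be read as `(owner_id, never_merge_into_id)` pairs and that each suggested merge has a record to delete and a record to keep; `Suggestion`, `merge_allowed`, and `purge_blocked` are hypothetical names for illustration, not actual system tables or procedures, and this is not Gawain's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One suggested merge from the potential-dups run (hypothetical shape)."""
    delete_id: int  # record that would be merged away
    keep_id: int    # record it would be merged into


def merge_allowed(delete_id: int, keep_id: int, never_merge: set[tuple[int, int]]) -> bool:
    """Pre-merge check: an attribute row (owner, target) forbids merging owner into target."""
    return (delete_id, keep_id) not in never_merge


def purge_blocked(suggestions: list[Suggestion], never_merge: set[tuple[int, int]]) -> list[Suggestion]:
    """Post-run cleanup. Assumes the keep/delete direction isn't final yet,
    so a suggestion is purged if the attribute blocks either direction."""
    return [
        s for s in suggestions
        if merge_allowed(s.delete_id, s.keep_id, never_merge)
        and merge_allowed(s.keep_id, s.delete_id, never_merge)
    ]
```

The pre-merge check is directional (the owner can't be merged into that one target), while the purge is deliberately stricter; if your suggestions already fix which record survives, you could drop the second `merge_allowed` call.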
Gawain, I think this is what I'm looking for. Would you be willing to share it with me?
Some logic I wrote into our dupe finding procedures asserts that if two constituents have a relationship with one another, they cannot be dupes. This helps with parent-child pairs with the same name, and also allows you to define a "Not the same person" association type to handle coincidentally named people with no other relationship.
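The "related constituents can't be dupes" rule boils down to a set lookup on unordered pairs. A rough sketch, assuming association rows come back as `(constituent_a, constituent_b, association_type)` tuples and candidate dupes as ID pairs; the row shape and function names are hypothetical, and "Not the same person" is just the example association type from the post.

```python
from typing import Iterable

# Hypothetical association row: (constituent_a, constituent_b, association_type)
Association = tuple[int, int, str]


def related_pairs(associations: Iterable[Association]) -> set[frozenset[int]]:
    """Unordered pairs linked by any association, including a 'Not the same person' type."""
    return {frozenset((a, b)) for a, b, _type in associations}


def drop_related(candidates: list[tuple[int, int]],
                 associations: Iterable[Association]) -> list[tuple[int, int]]:
    """Remove candidate dupe pairs whose members have any relationship with each other."""
    linked = related_pairs(associations)
    return [(a, b) for a, b in candidates if frozenset((a, b)) not in linked]
```

Using `frozenset` means it doesn't matter which side of the association a constituent was entered on.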
Another approach you might look into is running your dupe finding on a rolling basis, only including constituent records that are new since the last time potential dupes were generated. Then you won't have the problem of suggested merges being people you've seen before.
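One way to read "rolling basis" is: new records are still compared against everyone, but pairs where both records predate the last run are skipped. A minimal sketch under that assumption, using a hypothetical precomputed match key (say, a normalized last name plus first name) in place of whatever matching rules your dupe process actually uses:

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations

# Hypothetical record shape: (constituent_id, created_at, match_key)
Record = tuple[int, datetime, str]


def candidate_pairs_since(records: list[Record], last_run: datetime) -> list[tuple[int, int]]:
    """Pair records that share a match key, keeping only pairs where at
    least one side was created after the previous dupe run."""
    by_key: dict[str, list[tuple[int, datetime]]] = defaultdict(list)
    for rec_id, created, key in records:
        by_key[key].append((rec_id, created))

    pairs = []
    for group in by_key.values():
        for (a, created_a), (b, created_b) in combinations(group, 2):
            if created_a > last_run or created_b > last_run:
                pairs.append((a, b))
    return pairs
```

Old-versus-old pairs never resurface, so anything you've already looked at stays out of the window without needing a separate exclusion list.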
Similar to Nick's approach, I have also customized the duplicate identification process to remove any groups that are either in the same household or have a relationship with each other.
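A sketch of the group-level version, assuming a lookup from constituent ID to household ID and a set of related pairs. Dupe groups can contain more than two records; here a whole group is dropped if *any* two members share a household or are related, which is an assumption on my part, and the names are hypothetical.

```python
from itertools import combinations


def keep_group(group: list[int], household_of: dict[int, int],
               related: set[frozenset[int]]) -> bool:
    """False if any two members share a household or have a relationship.
    household_of maps constituent id -> household id; missing means no household."""
    for a, b in combinations(group, 2):
        household_a, household_b = household_of.get(a), household_of.get(b)
        if household_a is not None and household_a == household_b:
            return False
        if frozenset((a, b)) in related:
            return False
    return True


def filter_groups(groups: list[list[int]], household_of: dict[int, int],
                  related: set[frozenset[int]]) -> list[list[int]]:
    """Drop dupe groups that contain household members or related constituents."""
    return [g for g in groups if keep_group(g, household_of, related)]
```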
And similar to Gawain, after going through the current list of potential duplicates, I run a script that records all of the false positives, along with the current date, in a custom table. I then have an option in the duplicate identification process to ignore these groups as well, unless either of the customer records has changed since that row's creation date. The idea is that some groups aren't so much false positives as groups we can't safely assume are duplicates, so until we have more information (i.e. additional activity on the record) we ignore them in the de-duping process.
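The "ignore unless something changed" logic could look roughly like this. It assumes the custom table holds one row per reviewed pair with the date it was recorded, and that you can get a last-updated timestamp per constituent; `FalsePositive`, `record_false_positives`, and `suppress_reviewed` are hypothetical names, not the poster's actual script.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FalsePositive:
    """One row of the hypothetical custom table."""
    a: int
    b: int
    recorded_on: datetime


def record_false_positives(pairs: list[tuple[int, int]], today: datetime) -> list[FalsePositive]:
    """Populate step: stamp each reviewed pair with the current date."""
    return [FalsePositive(a, b, today) for a, b in pairs]


def suppress_reviewed(pairs: list[tuple[int, int]], reviewed: list[FalsePositive],
                      last_updated: dict[int, datetime]) -> list[tuple[int, int]]:
    """Drop pairs already marked as false positives, unless either record
    changed after the pair was recorded (missing timestamp = unchanged)."""
    recorded = {frozenset((fp.a, fp.b)): fp.recorded_on for fp in reviewed}
    kept = []
    for a, b in pairs:
        when = recorded.get(frozenset((a, b)))
        if when is None:
            kept.append((a, b))  # never reviewed: show it
        elif max(last_updated.get(a, datetime.min),
                 last_updated.get(b, datetime.min)) > when:
            kept.append((a, b))  # new activity since review: show it again
    return kept
```

Comparing against the row's date rather than deleting rows means a pair can come back on its own once one of the records picks up new activity.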
I would also be interested in the script/attribute combination if you would be willing to share.