Remove Potential Duplicates

Matthew Jaimes $organization over 2 years ago

In going through the Merge Duplicate process, there are some constituents that were auto identified as potential duplicates but we have determined to not be duplicates. Does anyone know a way of individually removing these constituents from the potential dups window at the top of the merge screen? I'm currently left with a bunch of constituents that were identified as potential dups but are not and have to sift through them all every time I go to schedule merges.

Gawain Lavers $organization over 2 years ago

There is not a clean way of doing this in Tessitura. I've created an Association ("Not a Duplicate") to connect two accounts that are confirmed not duplicates, but you then have to have some kind of system to strip those matches from your Potential Duplicates list.
Cancel
Vote Up +2 Vote Down

Sign in to reply

Cancel
John A. Moskal II $organization over 2 years ago

Gawain Lavers is correct. Interestingly, though, there IS a custom parameter set up for the duplicate identification process. But it is set to nothing. I think maybe there is intended to be another local procedure for custom identification? Or maybe there was the idea there at some point and it got shelved? At any rate, in our database at least, in the procedure that identifies potential merges (AP_IDENTIFY_DUPLICATES) there DOES exist an entry variable called @identify_method_custom which is set to -1, which matches the ID on the table TR_DUPLICATE_IDENTIFY_METHOD which then says "Custom Potential" that was created back in 2012.

But I have neither heard nor seen anything of that since then. So maybe that died a quiet death.

John A. Moskal II
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
John Trimble $organization over 2 years ago

I look for the root cause of the identifications first, then use Void Merge. Typically the potential dupe process looks at fname, lname, street1 and zip of primary address (and sometimes primary eaddress), but not necessarily all the characters in those fields. Sometimes I find something the logical process identifies as a dupe that can be removed or altered slightly, thus remove them from identification at the source. For example, someone at some point misidentified one patron as the other and put in some data that doesn't go with that similar-seeming patron. Beyond that, we use void merge, even though it could prevent authentic dupe merges, which obstruct the merge process rather than remove from contention. Duplicates are a really big problem, but we don't throw many false ids (that we've caught) in our system. That said, we run the merges past members at all the consortium orgs for review first, which from time to time has caught deliberate dupes for some organizational purpose.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 2 years ago in reply to John A. Moskal II

There's no built in (e.g. LP_IDENTIFY_DUPLICATES) plan for adding your own entries to the Potential Duplicates list, but I use that method id for my custom systems for adding Potential Ids (and auto-scheduling merges). It happens to have key relevance to how household affiliates are processed (yes, that took me days of digging through the code).
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 2 years ago in reply to John Trimble

The problem with Void Merge is that it shuts off any merging with that constituent record, even if it is a different, and correct, merge target. I had created a separate attribute where you would enter the customer number of the customer not to be merged into, then added handling in the LP_CUSTOMER_MERGE (or wherever) to block merges with either the keep or delete had that attribute for the other. This required, of course, that this attribute be added to merge postprocessing so that the id would be updated if the selected customer was merged into a different account. I only recently realized that an Association was the perfect way to map and maintain (or rather let Tessitura maintain) this relationship.

The other problem was that the Attributes are only invoked after the customer accounts have been scheduled for a merge, generating an error, and guaranteeing that the same false identification will occur next time. I built a utility to run after a Potential Dup Identification run that would clear out any accounts I didn't want in there, but have recently completed (though not yet deployed) a new system for updating T_POTENTIAL_DUPS with either suggested duplicates or scheduled merges (where I have maximal confidence).

This has been delayed a bit, as I need to do two things: 1) really sort out how I'm going to handle Guest Checkout accounts and 2) build a proper report/utility frame around it.
Cancel
Vote Up +1 Vote Down

Sign in to reply

Cancel
Kevin Sheehan $organization over 2 years ago

There is a parameter on the identify duplicates procedure that is intended to control when duplicates that were identified but not merged end up back on the list. The parameter is @changed_since_days. When set to NULL, a pair of potential duplicates that is not merged will not be added to the list of potential duplicates again unless one of the records has had activity since the last time the merge procedure ran. If you set it to a number, a pair of potential duplicates that is not merged will not be added to the list of potential duplicates again unless one of the records had activity in that many days.

Since this is looking at activity on an account, it's not foolproof, but it should keep most potential dupes that you don't merge off the list. I would check what the parameter is currently set for at your organization. Sometimes it gets set to several years, particularly when you first start running the procedure to find potential dupes, so that it grabs everyone. If your list of dupes is reviewed pretty regularly, you can set it to Null or a smaller time frame. Here's a link to the help topic that describes this parameter: www.tessituranetwork.com/.../Identify-Duplicates-Procedure.htm
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
John A. Moskal II $organization over 2 years ago in reply to Kevin Sheehan

I cannot speak for anyone else, but a number of our accounts that look enough like each other that they end up on the suggested duplicates list are businesses and patrons with whom we are regularly doing business. So they do regularly appear on our suggested duplicates list even when not merged since the accounts are continually having activity. For the record, we run ours with that value set to 9999 (and we have not been in existence for even half that long).
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
John A. Moskal II $organization over 2 years ago in reply to John A. Moskal II

It is possible I am mis-explained my response to Kevin Sheehan's post. My understanding of the @changed_days_since is that, assuming it has the value of X, if the patron activity on the patron's account is within the last X days, it will get suggested again. If it has not had activity in the last X days, though, it will not. So yes, a large number (such as we use), would continually keep pulling all accounts regardless. Which makes sense. But, even if we turned that into a smaller number (which we do not want to do just yet as our database is still actively being cleaned and de-duped to the point where we want to keep finding everything), there are still a number of accounts that would appear on the list every month. Additionally, we do not want to keep that number TOO low because we often just cannot get to all of the suggested merges at one moment in time. So we need the list to remain at least somewhat fully regenerative.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Matthew Jaimes $organization over 2 years ago in reply to Kevin Sheehan

This is super helpful Kevin, thank you! While this doesn't fix my issue directly, I see how Tess likely viewed this field as the solution to this issue. Once I can actually clear out all true dups from the list, we should be able to update this field to 7 days so we'll have a full week to review and schedule real dups and we can leave any real people to be removed from the list when it refreshes every week. Not an ideal system in my opinion but it will work. Much appreciated!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Chaz Stuart $organization over 2 years ago in reply to Gawain Lavers

Late to the party here but we're addressing this topic in real time. Association is a great idea, but we went with a simple attribute with the reciprocal patron id. I created an independent procedure that looks at T_POTENTIAL_DUPS and IDs anyone with the attribute and the value is the reciprocal ID on the table they get stripped from the table. We're very early in setup and testing but the idea is that we'd schedule it to run after the identifying procedure so that we can turn it off if needed. Something like

CREATE TABLE #Results (
ID INT,
P_DUPE INT
);
INSERT INTO #Results (ID, P_DUPE) /*creating a temp table that matches customer IDs with potential dupes*/
select distinct d1.customer_no, d2.customer_no as p_dup from T_POTENTIAL_DUPS d1
join t_potential_dups as d2 on d2.criterion = d1.criterion
where d1.customer_no <> d2.customer_no

delete from T_POTENTIAL_DUPS
where customer_no in
(select a.id from #results a
join tx_cust_keyword b on b.customer_no = a.id
where b.keyword_no = 798 /*The False dupe keyword ID*/
and b.key_value = a.p_dupe
)
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel