Merging Constituents Advice?

Nicholas Hudson-Ellis (Past Staff Member) $organization over 12 years ago

Hello all,

I am hoping to get a sense of how different orgs manage their merging of duplicates. I have read on the forums about some great examples from a couple of years ago of people scheduling the Identify Duplicates procedure, and having a daily task of manually putting them into the Merge queue.

Some people appeared to have been confident enough to merge constituents automatically if they meet the criteria.

I would love to hear how many people are automatically merging duplicates, and how many people are doing it manually, which is how I was always taught.

At ACMI, we have thousands of duplicate accounts from importing e-news subscribers over time. The bulk of these accounts only have First Name, Last Name and email. We are working with Tessitura consultants to get our Identify Duplicates logic to pick theses up, and would very much like to merge automatically as many as possible.

Thanks all!

Nicholas

Tom Brown (Past Member) $organization over 12 years ago

This is an ongoing issue for us.

We are not currently confident to auto merge.

If you get to a subset that you feel confident merging, please let us know what you have discovered.

--Tom
…
718.724.8135
tbrown@BAM.org

From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Nicholas Hudson-Ellis
Sent: Sunday, April 13, 2014 11:59 PM
To: Thomas Brown
Subject: [Tessitura Technical Forum] Merging Constituents Advice?

Hello all,
I am hoping to get a sense of how different orgs manage their merging of duplicates. I have read on the forums about some great examples from a couple of years ago of people scheduling the Identify Duplicates procedure, and having a daily task of manually putting them into the Merge queue.
Some people appeared to have been confident enough to merge constituents automatically if they meet the criteria.
I would love to hear how many people are automatically merging duplicates, and how many people are doing it manually, which is how I was always taught.
At ACMI, we have thousands of duplicate accounts from importing e-news subscribers over time. The bulk of these accounts only have First Name, Last Name and email. We are working with Tessitura consultants to get our Identify Duplicates logic to pick theses up, and would very much like to merge automatically as many as possible.
Thanks all!
Nicholas

This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Chris Jensen $organization over 12 years ago

When we went Live with Tess in 2008 we found a large number of fairly straightforward dupes that were identifiable due to the more powerful tools available on SQL Server, and we merged those en masse.

We have not auto-merged any records since then.

Looking for address-less, name + e-mail only dupes is something we look for regularly via custom SQL code.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Former Member $organization over 12 years ago in reply to Chris Jensen

I have been involved in many discussions about this. There is an understandable amount of trepidation among department heads about the possible fallout from merging records without any human intervention. However, there have been a few cases when I have implemented it.

The one case where it has been most successful is with accounts created on the web. But we used really strict criteria. At one point we implemented a nightly process which would merge records if the following were true:

- The Delete account was created on the web the previous day.

- Both the first name and last name are an exact match

- The postal address is an exact match

- The email is an exact match (this one may have been deprecated later, I can't recall)

What we are trying to catch is existing patrons who did not previously have a web login. So when they hit the website the first time to buy tickets, they have to create an account. We are then grabbing those new accounts and merging them into their existing account.

This is a really specific case. I would have trouble recommending automatic merging for anything more complicated than this. Especially with the household model in V11 and beyond. There are a lot of considerations with moving transactions and ensuring that the final product of the merge looks correct.

At ATTPAC, we have a Data Integrity Specialist who schedules all merges for the consortium. It is a labor intensive process, but the results are superior.

- Levi
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Nicholas Hudson-Ellis (Past Staff Member) $organization over 12 years ago in reply to Former Member

Thanks all for the great info, very valuable.

Chris, do you use custom SQL code for the no address constituents because the standard Tess Identify Duplicates is not suitable for these "incomplete" records? If you don't mind sharing the SQL code, I would be very interested to see it.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Chris Jensen $organization over 12 years ago in reply to Nicholas Hudson-Ellis (Past Staff Member)

Unknown said:

Chris, do you use custom SQL code for the no address constituents because the standard Tess Identify Duplicates is not suitable for these "incomplete" records? If you don't mind sharing the SQL code, I would be very interested to see it.

Yes, the standard matching code and rules of "fname=x,lname=x,postal_code=x,street1=x" work well for name + postal address matches, but I find that matches based on non-postal criteria are easier to find with different and often simpler code. I'll get in touch offline.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Former Member $organization over 12 years ago

Good day, Nicholas

We’re relatively new to Tessitura (started Aug. 13), and most of our patron accounts migrated from our legacy system.

Data in our legacy system was a mess; thousands of potential duplicates.

When we imported patron accounts from the legacy system into Tessitura, I used SQL script to note these probable duplicates – and simply merged them.

A few merged accounts were wrong, but >99% either were correct or didn’t matter.

As patrons register on our new site (and accounts are created in Tessitura), these accounts pop up as potential duplicates.

At this time I manually run SQL script to identify these potential duplicates, then manually merge them.

Each week I probably merge 20-30 accounts, and will probably continue this manual process for another year or so.

There are too many variables to automate the process, and I still struggle with other users “buying in” to Tessitura usage.

I know Tessitura provides an automatic method to identify and merge potential duplicates, but that process isn’t an option for us right now.

Best Regards

Wendell Baskin

Bass Hall – Fort Worth

From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Nicholas Hudson-Ellis
Sent: Sunday, April 13, 2014 22:58
To: Wendell Baskin
Subject: [Tessitura Technical Forum] Merging Constituents Advice?

Hello all,

I am hoping to get a sense of how different orgs manage their merging of duplicates. I have read on the forums about some great examples from a couple of years ago of people scheduling the Identify Duplicates procedure, and having a daily task of manually putting them into the Merge queue.

Some people appeared to have been confident enough to merge constituents automatically if they meet the criteria.

I would love to hear how many people are automatically merging duplicates, and how many people are doing it manually, which is how I was always taught.

At ACMI, we have thousands of duplicate accounts from importing e-news subscribers over time. The bulk of these accounts only have First Name, Last Name and email. We are working with Tessitura consultants to get our Identify Duplicates logic to pick theses up, and would very much like to merge automatically as many as possible.

Thanks all!

Nicholas

This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Debbie Harland (Past Member) $organization over 12 years ago in reply to Former Member

Hi All,

There are a lot of potential pit falls when merging records as I have recently found out. Our Box Office were being very efficient in their regular merge process, but they didn't know that a number of records were deliberately being created with spoof addresses. They happily merged these thinking they were not required. I then had a visit from one of our Development ladies in complete melt down about missing records!

I've now had several weeks of painful coding to unmerge the incorrectly merged records. They now have the 'Void Merge' attribute!!

So, just a word of warning.....take care!

Debbie
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Christopher Sherwood $organization over 12 years ago
I think that the best way to de-dupe would be to use distance calculations and cataloging functions

While not perfect it's probably going to be as-close-to perfect as you can get without a serious algorithm designed for Tess's database (and since we all use it differently, this may not even be possible?)

If we create table and LTX_CONST_DISTANCE:

Unknown said:

customer_no1 int,

customer_no2 int,

table varchar(128),

columns varchar(128),

distance int,

type int

and table LTR_DISTANCE_TYPE:

Unknown said:

id int,

description varchar(256)

We can then populate it based on tables like T_ACCOUNT_DATA, T_ADDRESS and T_EADDRESS over time using functions such as Jaro-Winkler (probably the single best for data-entry errors) on the most error-prone fields. Note this should be done over time until a database is set, possibly on separate days depending on how long each table/column(s) matching takes. Once completed it won't have to be run unless there is a change to the constituent's data for that table.

Also I would highly suggest using a cataloging function (soundex, metaphone) for names so that names like 'Thompson' and 'Tompson' etc. are matched; this can be useful for cases where a BO associate my put in a name vs a person spelling it themselves. What's great about these cases as well is we usually can tell how a name/field was added in Tessitura via modified/created by values in tables which would be an interesting twist.

In all I would if we were going to automatically merge constituents I would probably require at least:

Use of same/nearly same credit card (via T_ACCOUNT_DATA) (jaro winkler score of > 85 on t_act with matching values on the other fields) I like to think of this as a best-way to matching; limiting conditionals and relying on a verified field it could probably be okay to skip if everything else matched with a JW>90+

Jaro Winkler matched email or postal address (JW >75-85 as base)

Jaro Winkler/Soundex/Metaphone matched first name (JW > 90)

Jaro Winkler/Soundex/Metaphone matched last name (JW > 90)

I'm on a RAMP server right now and don't have access to installing the MDS database so I haven't had a chance to map out the best base Jaro-Winkler values, but the ones would probably be a good starting point IMHO. I have used similar functions in the past to generate numerous matching schemes across databases which would usually have different values, Tess shouldn't be much different.
[edited by: Christopher Sherwood at 11:17 AM (GMT -6) on 13 May 2014]
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Former Member $organization over 12 years ago in reply to Christopher Sherwood

Hi Chris

That's an interesting idea or two.

We've extended the identify-dupes in various ways, including matching across name+address, name+email, and name+phone, and cross-matching as well ( ie if A matches B on address, and B matches C on email, then we say that A matches C as well), and we have been Brave enough to auto-merge across sets ( ie where we identify a Set of dupes, selecting which one to keep and merging all the others into it), based on fairly rigorous conditions. As I may have mentioned elsewhere, our first auto-merge merged 100,000 records, so that's the scale of our problem, which puts it a bit beyond manual solution.

But I don't think i'd be keen about using any approximate-matching methods on email addresses or cc numbers. We only ever use exact-matching on emails - since they're effectively forced to be unique, they're really a good identifier (except where people share them of course - that's a trap.) Obviously approximate-matching would increase the chance of a match in the case of typos, but I would only want to use that for identifying potential dupes for manual checking, not rely on it for auto-merge.

And of course cc numbers are stored encrypted, and since about version 10, they can't be unencrypted by us (thanks to PCI rules), so we can only access the masked version - I don't think I'd want to rely on them at all - even an exact match.

Ken
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Christopher Sherwood $organization over 12 years ago in reply to Former Member

Ken-

I couldn't agree with you more. Personally I think any auto-deduping in Tess is probably a horrible idea since accounts are sometimes just separate to be separate which was a disclaimer I put on a draft of that post (must have lost it in the actual post). it should be something done on a one time basis and it should be thoroughly checked before being deployed or committed in a live database.

One thing I wouldn't agree with is throwing out distance functions for de-duping. I would take a highly approximate match on multiple fields over a single field perfect match since the likelihood they are not the same person goes down over each field matched. If they meet a minimum criteria set then we don't include them.

I do however believe that approximate matching on anything that we know for certain (CC numbers) is just dumb and maybe I wasn't clear when I was explaining I would look at the act_name for a match of some sort (much like a person would do when checking for a dupe) along with the 10 out of 16 numbers of the PAN we know along with expiration date which are all stored in the DB. This was in addition to matching on 3 or 4 other fields with fairly high certainty.

OFC any code could be built on the base I had suggested; even ones that look for perfect matches to even consider an automatic merge.

So I suppose FWIW- I agree, but still would love to see a more semantic approach to the duplication matching features of Tess
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Lydia Gibala $organization over 11 years ago in reply to Former Member

Hi Ken,

I've been reading this thread from last year as we're still working on our duplicate issue at ACMI! I wondered if you could please let me know if and how you set further business rules regarding which record to keep? For example, once duplicates are identified we'd like to give preference to accounts that are attached to a household or have a more recent activity date. Is this something that's managed in SQL code?

Thanks,

Lydia
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Former Member $organization over 11 years ago

Hi Lydia

Yes – definitely in SQL

There are two places where you would want to intervene.

·         There is a stored proc called LP_VALIDATE_MERGE, which is what runs when you manually select two constituents to merge. It runs checks on the proposed merge, and rejects it if it doesn’t pass the rules in the proc. You can customise those rules (you know that you can customise it, because the name starts with L, which is Tess’s naming rule for locally-customisable procedures).

Our version of LP_VALIDATE_MERGE does a lot of checking against our business rules, and either accepts the merge, rejects it completely, or suggests that it should be tried the other way around. Some of our rules block certain high-touch constituenti (eg major donors) from being the delete in a merge (based on constituencies) . The most likely reason for a merge being rejected completely is that both participants are blocked from being the delete. In that case, the consortium admins can bypass the rules and allow the merge to go ahead, after checking with all involved orgs about their preferences

·         Then the other place is the Identify Dupes routine – the one provided with Tess does a fairly basic check based on name and address. We have a heavily modified version which looks at email addresses and phone numbers as well. The identify dupes proc either

o   populates the Merge screen, or

o   feeds into our auto-merge routine, which tries to identify sets of dupes, select one of the set to keep, using the same rules that the Validate script uses., and then schedule the others to merge into it.

The rules are based on a hierarchy of constituencies, plus things like recency of transactions and other activities, and of course membership of households (you can’t merge someone out of a household).

We spent a lot of time on the rules, but really the logic is to try to Keep the record which is likely to have the best-maintained address and contact details – a lot of the other concerns people have about which one to keep are really not very relevant, because almost all of the data is merged across anyway – it’s only a few critical pieces that are important

·         Going one step further – it’s also possible to customise the merge routine itself – we’ve done a lot of that as well, and that also has the potential to make it less relevant which one is kept, because you can optimise the merge in other ways.

One of the traps we discovered with basing the choice on recency of activity is that you tend to get good, well-maintained records merged into records created recently on someone’s website, which tend not to be as well made, so we fiddled with the validation script to de-emphasise recency on that basis.

I’m happy to have a chat about all that at ANZTRUC, if you’d like to go into it in more depth.

Ken McSwain
Systems & Technology Manager

T +61 2 8274 3833 | M +61 418 659 360

Australian Chamber Orchestra

Opera Quays, 2 East Circular Quay, Sydney NSW 2000

PO Box R21, Royal Exchange NSW 1225

aco.com.au



From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Lydia Gibala
Sent: Tuesday, 17 March 2015 10:25
To: Ken McSwain
Subject: Re: [Tessitura Technical Forum] Merging Constituents Advice?

Hi Ken,

I've been reading this thread from last year as we're still working on our duplicate issue at ACMI! I wondered if you could please let me know if and how you set further business rules regarding which record to keep? For example, once duplicates are identified we'd like to give preference to accounts that are attached to a household or have a more recent activity date. Is this something that's managed in SQL code?

Thanks,

Lydia

From: Ken McSwain <bounce-kenmcswain5454@tessituranetwork.com>
Sent: 5/15/2014 12:36:07 AM

Hi Chris

That's an interesting idea or two.

We've extended the identify-dupes in various ways, including matching across name+address, name+email, and name+phone, and cross-matching as well ( ie if A matches B on address, and B matches C on email, then we say that A matches C as well), and we have been Brave enough to auto-merge across sets ( ie where we identify a Set of dupes, selecting which one to keep and merging all the others into it), based on fairly rigorous conditions. As I may have mentioned elsewhere, our first auto-merge merged 100,000 records, so that's the scale of our problem, which puts it a bit beyond manual solution.

But I don't think i'd be keen about using any approximate-matching methods on email addresses or cc numbers. We only ever use exact-matching on emails - since they're effectively forced to be unique, they're really a good identifier (except where people share them of course - that's a trap.) Obviously approximate-matching would increase the chance of a match in the case of typos, but I would only want to use that for identifying potential dupes for manual checking, not rely on it for auto-merge.

And of course cc numbers are stored encrypted, and since about version 10, they can't be unencrypted by us (thanks to PCI rules), so we can only access the masked version - I don't think I'd want to rely on them at all - even an exact match.

Ken

This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!

ACO's 2015 Single Tickets & Packages Are Now On Sale!

This email is confidential. If you are not the intended recipient you must not disclose or use the information contained in it. If you have received this email in error please notify us immediately by return email and delete the document. The ACO is not responsible for any changes made to a document other than those made by the ACO or for the effect of the changes on the document's meaning. The ACO accepts no liability for any damage caused by this email or its attachments due to viruses interference, interception, corruption or unauthorised access.

Please consider the environment before printing this email.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel