Merging Constituents Advice?

Hello all,

I am hoping to get a sense of how different orgs manage their merging of duplicates. I have read on the forums about some great examples from a couple of years ago of people scheduling the Identify Duplicates procedure, and having a daily task of manually putting them into the Merge queue.

Some people appeared to have been confident enough to merge constituents automatically if they meet the criteria.

I would love to hear how many people are automatically merging duplicates, and how many people are doing it manually, which is how I was always taught.

At ACMI, we have thousands of duplicate accounts from importing e-news subscribers over time. The bulk of these accounts only have First Name, Last Name and email. We are working with Tessitura consultants to get our Identify Duplicates logic to pick theses up, and would very much like to merge automatically as many as possible.

Thanks all!

Nicholas

Parents
  • Former Member
    Former Member $organization

    Hi Lydia

    Yes – definitely in SQL

    There are two places where you would want to intervene.

    ·         There is a stored proc called LP_VALIDATE_MERGE, which is what runs when you manually select two constituents to merge. It runs checks on the proposed merge, and rejects it if it doesn’t pass the rules in the proc. You can customise those rules (you know that you can customise it, because the name starts with L, which is Tess’s naming rule for locally-customisable procedures).

    Our version of LP_VALIDATE_MERGE does  a lot of checking against our business rules, and either accepts the merge, rejects it completely, or suggests that it should be tried the other way around. Some of our rules block certain high-touch constituenti (eg major donors)  from being the delete in a merge (based on constituencies) . The most likely reason for a merge being rejected  completely is that both participants are blocked from being the delete.  In that case, the consortium admins can bypass the rules and allow the merge to go ahead, after checking with all involved orgs about their preferences

     

    ·         Then the other place is the Identify Dupes routine – the one provided with Tess does a fairly basic check based on name and address. We have a heavily modified version which looks at email addresses and phone numbers as well. The identify dupes proc either

    o   populates the Merge screen, or

    o    feeds into our auto-merge routine, which tries to identify sets of dupes, select one of the set to keep, using the same rules that the Validate script uses., and then schedule the others to merge into it.

     

    The rules are based on a hierarchy of constituencies, plus things like recency of transactions and other activities, and of course membership of households (you can’t merge someone out of a household).

    We spent a lot of time on the rules, but really the logic is to try to Keep the record which is likely to have the best-maintained address and contact details – a lot of the other concerns people have about which one to keep are really not very relevant, because almost all of the data is merged across anyway – it’s only a few critical pieces that  are important

     

    ·         Going one step further – it’s also possible to customise the merge routine itself – we’ve done a lot of that as well, and that also has the potential to make it less relevant which one is kept, because you can optimise the merge in other ways.

     

    One of the traps we discovered with basing the choice on recency of activity is that you tend to get good, well-maintained records merged into records created recently on someone’s website, which tend not to be as well made, so we fiddled with the validation script to de-emphasise recency on that basis.

     

    I’m happy to have a chat about all that at ANZTRUC, if you’d like to go into it in more depth.

     

    Ken McSwain
    Systems & Technology Manager

    T +61 2 8274 3833 | M +61 418 659 360

     

    Australian Chamber Orchestra

    Opera Quays, 2 East Circular Quay, Sydney NSW 2000

    PO Box R21, Royal Exchange NSW 1225

    aco.com.au

     

    cid:image001.jpg@01D01957.502E38C0cid:image002.png@01D01957.502E38C0cid:image003.png@01D01957.502E38C0 cid:image004.png@01D01957.502E38C0 cid:image005.png@01D01957.502E38C0 

     

    From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Lydia Gibala
    Sent: Tuesday, 17 March 2015 10:25
    To: Ken McSwain
    Subject: Re: [Tessitura Technical Forum] Merging Constituents Advice?

     

    Hi Ken, 

    I've been reading this thread from last year as we're still working on our duplicate issue at ACMI! I wondered if you could please let me know if and how you set further business rules regarding which record to keep? For example, once duplicates are identified we'd like to give preference to accounts that are attached to a household or have a more recent activity date. Is this something that's managed in SQL code?

    Thanks,

    Lydia 

    From: Ken McSwain <bounce-kenmcswain5454@tessituranetwork.com>
    Sent: 5/15/2014 12:36:07 AM

    Hi Chris

     That's an interesting idea or two.

    We've extended the identify-dupes in various ways, including matching across name+address, name+email, and name+phone, and cross-matching as well ( ie if A matches B on address, and B  matches C on email, then we say that A matches C as well), and we have been Brave enough to auto-merge across sets ( ie where we identify a Set of dupes, selecting which one to keep and merging all the others into it), based on fairly rigorous conditions. As I may have mentioned elsewhere, our first auto-merge merged 100,000 records, so that's the scale of our problem, which puts it a bit beyond manual solution. 

    But I don't think i'd be keen about using any approximate-matching methods on email addresses or cc numbers. We only ever use exact-matching on emails - since they're effectively forced to be unique, they're really a good identifier (except where people share them of course - that's a trap.) Obviously approximate-matching would increase the chance of a match in the case of  typos, but I would only want to use that for identifying potential dupes for manual checking, not rely on it for auto-merge.

    And of course cc numbers are stored encrypted, and since about version 10, they can't be unencrypted by us (thanks to PCI rules), so we can only access the masked version - I don't think I'd want to rely on them at all - even an exact match.

    Ken

     

     




    This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!


    ACO's 2015 Single Tickets & Packages Are Now On Sale!

    This email is confidential. If you are not the intended recipient you must not disclose or use the information contained in it. If you have received this email in error please notify us immediately by return email and delete the document. The ACO is not responsible for any changes made to a document other than those made by the ACO or for the effect of the changes on the document's meaning. The ACO accepts no liability for any damage caused by this email or its attachments due to viruses interference, interception, corruption or unauthorised access.

    Please consider the environment before printing this email.
Reply
  • Former Member
    Former Member $organization

    Hi Lydia

    Yes – definitely in SQL

    There are two places where you would want to intervene.

    ·         There is a stored proc called LP_VALIDATE_MERGE, which is what runs when you manually select two constituents to merge. It runs checks on the proposed merge, and rejects it if it doesn’t pass the rules in the proc. You can customise those rules (you know that you can customise it, because the name starts with L, which is Tess’s naming rule for locally-customisable procedures).

    Our version of LP_VALIDATE_MERGE does  a lot of checking against our business rules, and either accepts the merge, rejects it completely, or suggests that it should be tried the other way around. Some of our rules block certain high-touch constituenti (eg major donors)  from being the delete in a merge (based on constituencies) . The most likely reason for a merge being rejected  completely is that both participants are blocked from being the delete.  In that case, the consortium admins can bypass the rules and allow the merge to go ahead, after checking with all involved orgs about their preferences

     

    ·         Then the other place is the Identify Dupes routine – the one provided with Tess does a fairly basic check based on name and address. We have a heavily modified version which looks at email addresses and phone numbers as well. The identify dupes proc either

    o   populates the Merge screen, or

    o    feeds into our auto-merge routine, which tries to identify sets of dupes, select one of the set to keep, using the same rules that the Validate script uses., and then schedule the others to merge into it.

     

    The rules are based on a hierarchy of constituencies, plus things like recency of transactions and other activities, and of course membership of households (you can’t merge someone out of a household).

    We spent a lot of time on the rules, but really the logic is to try to Keep the record which is likely to have the best-maintained address and contact details – a lot of the other concerns people have about which one to keep are really not very relevant, because almost all of the data is merged across anyway – it’s only a few critical pieces that  are important

     

    ·         Going one step further – it’s also possible to customise the merge routine itself – we’ve done a lot of that as well, and that also has the potential to make it less relevant which one is kept, because you can optimise the merge in other ways.

     

    One of the traps we discovered with basing the choice on recency of activity is that you tend to get good, well-maintained records merged into records created recently on someone’s website, which tend not to be as well made, so we fiddled with the validation script to de-emphasise recency on that basis.

     

    I’m happy to have a chat about all that at ANZTRUC, if you’d like to go into it in more depth.

     

    Ken McSwain
    Systems & Technology Manager

    T +61 2 8274 3833 | M +61 418 659 360

     

    Australian Chamber Orchestra

    Opera Quays, 2 East Circular Quay, Sydney NSW 2000

    PO Box R21, Royal Exchange NSW 1225

    aco.com.au

     

    cid:image001.jpg@01D01957.502E38C0cid:image002.png@01D01957.502E38C0cid:image003.png@01D01957.502E38C0 cid:image004.png@01D01957.502E38C0 cid:image005.png@01D01957.502E38C0 

     

    From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Lydia Gibala
    Sent: Tuesday, 17 March 2015 10:25
    To: Ken McSwain
    Subject: Re: [Tessitura Technical Forum] Merging Constituents Advice?

     

    Hi Ken, 

    I've been reading this thread from last year as we're still working on our duplicate issue at ACMI! I wondered if you could please let me know if and how you set further business rules regarding which record to keep? For example, once duplicates are identified we'd like to give preference to accounts that are attached to a household or have a more recent activity date. Is this something that's managed in SQL code?

    Thanks,

    Lydia 

    From: Ken McSwain <bounce-kenmcswain5454@tessituranetwork.com>
    Sent: 5/15/2014 12:36:07 AM

    Hi Chris

     That's an interesting idea or two.

    We've extended the identify-dupes in various ways, including matching across name+address, name+email, and name+phone, and cross-matching as well ( ie if A matches B on address, and B  matches C on email, then we say that A matches C as well), and we have been Brave enough to auto-merge across sets ( ie where we identify a Set of dupes, selecting which one to keep and merging all the others into it), based on fairly rigorous conditions. As I may have mentioned elsewhere, our first auto-merge merged 100,000 records, so that's the scale of our problem, which puts it a bit beyond manual solution. 

    But I don't think i'd be keen about using any approximate-matching methods on email addresses or cc numbers. We only ever use exact-matching on emails - since they're effectively forced to be unique, they're really a good identifier (except where people share them of course - that's a trap.) Obviously approximate-matching would increase the chance of a match in the case of  typos, but I would only want to use that for identifying potential dupes for manual checking, not rely on it for auto-merge.

    And of course cc numbers are stored encrypted, and since about version 10, they can't be unencrypted by us (thanks to PCI rules), so we can only access the masked version - I don't think I'd want to rely on them at all - even an exact match.

    Ken

     

     




    This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!


    ACO's 2015 Single Tickets & Packages Are Now On Sale!

    This email is confidential. If you are not the intended recipient you must not disclose or use the information contained in it. If you have received this email in error please notify us immediately by return email and delete the document. The ACO is not responsible for any changes made to a document other than those made by the ACO or for the effect of the changes on the document's meaning. The ACO accepts no liability for any damage caused by this email or its attachments due to viruses interference, interception, corruption or unauthorised access.

    Please consider the environment before printing this email.
Children
No Data