Best Practices for New Record Data Clean Up and Merges

Hello Everyone,

At the start of the pandemic, nearly all of our staff were laid off, and as a result many non-critical tasks, like reviewing new records and merging constituents, fell by the wayside. So... two years later, we have roughly 60,000 to 120,000 new records and over 3,000 merges to complete. In the past we delegated data cleanup and merging to the box office, but due to continuing staffing shortages we'll need to find another way.

Does anyone have experience using automated procedures to clean up data and schedule merges? Is this possible? We'd like to add CSIs to new records with invalid or missing addresses. I vaguely remember reading a forum post a while back about some Tessitura organizations that outsourced data cleanup and merges, but I can't seem to find that post again. Has anybody tried this? Anyway, I'd love to hear what you all do and discuss the best ways to ensure clean, consistent data.

Thank you,

Joseph

Parents
  • The Director of Database Administration and Support Services at the Philadelphia Regional Arts Consortium is a master of the consortium account merge and purge process across organizations. Several years back I heard her speak about their process.

  • Thanks for the mention, !

    , et al, I have done a lot of work over the years on getting our consortium's constituent merge process streamlined as much as possible, and I honestly love to talk about it. :)

    The discussion over automated vs manual duplicate identification and merging is an interesting one. I heard the phrase "curated merging" at a TLCC once and that seems like the best way to describe my consortium's approach. We do automate a fair deal of duplicate identification and merge scheduling, and we also encourage members at all of our 20+ consortium organizations to take part in the fun of scheduling merges. (They know their patrons the best after all!) However, even with the automated dup identification and scheduling of merges, there are a whole slew of issues that might cause a merge either to fail or lose data, and I'm talking way beyond questions like which primary address is the preferred one. I did at one point share my customizations with the Network, but I think it's been so long they were taken down from the shared reports site (which is probably fine, as I'm sure they needed some updates).
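    To make the "automated dup identification" part a bit more concrete, here is a minimal Python sketch of the general idea: normalize a few fields, group records that share a key, and emit candidate pairs for a person to review. This is not our consortium's code and it does not touch any Tessitura tables or APIs. The field names (`id`, `first_name`, `last_name`, `email`) and the matching rule (same last name, first initial, and email) are assumptions chosen purely for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())


def match_key(record):
    """Build a grouping key from last name, first initial, and email.

    Hypothetical rule for illustration only; real matching usually
    weighs more fields (address, phone, and so on).
    """
    email = (record.get("email") or "").strip().lower()
    last = normalize(record.get("last_name"))
    first = normalize(record.get("first_name"))
    if not email or not last or not first:
        return None  # too little data to match automatically; leave for a human
    return (last, first[0], email)


def find_duplicate_candidates(records):
    """Return sorted (lower_id, higher_id) pairs of records sharing a key."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record)
        if key is not None:
            groups[key].append(record["id"])
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up sample data, not real constituents.
SAMPLE_RECORDS = [
    {"id": 101, "first_name": "Ann", "last_name": "O'Neil", "email": "ann@example.org"},
    {"id": 205, "first_name": "ann", "last_name": "ONeil ", "email": "Ann@Example.org "},
    {"id": 309, "first_name": "Bob", "last_name": "Lee", "email": ""},
]

if __name__ == "__main__":
    print(find_duplicate_candidates(SAMPLE_RECORDS))
```

    Note that the sketch only proposes pairs. Deciding which record survives, and catching the issues that make a merge fail or lose data, is exactly the part that still needs curation.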

    I have presented at past TLCCs on duplicate management as well as other types of data management, and here's a PowerPoint from a presentation I gave at our Philly regional community group in 2019. I'd love to reshare my code as well, but it's always one of those back-burner projects, and I wonder whether it's as useful to orgs that don't have a DBA/SQL programmer on hand.

    If we didn't have someone with SQL skills on hand, I think I'd look into having a consultant write something to at least automate the merge scheduling process. Then, if you have the merge job run once a week, you could review the merges the day before the job runs and unschedule any obvious false positives. Bad merges happen, even with the most careful setup, but if you feel confident enough in the merge scheduling process they should be few and far between. (And while reversing a merge isn't straightforward, it's not impossible, as others have said; it does require SQL knowledge.)
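    The weekly review routine described above could be sketched roughly like this in Python. Everything here is hypothetical: the `ScheduledMerge` class, the `score` confidence value, and the helper names are illustrations, not part of Tessitura or of anything we run. The idea is simply to work out whether today is the day before the job, pull the least certain scheduled merges to the top of a review list, and unschedule the ones a reviewer rejects.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledMerge:
    keep_id: int      # record that would survive the merge
    merge_id: int     # record that would be merged away
    score: float      # hypothetical match confidence, 0.0 to 1.0
    scheduled: bool = True


def next_job_run(today, job_weekday):
    """Date of the next merge job on or after today (Monday=0 ... Sunday=6)."""
    return today + timedelta(days=(job_weekday - today.weekday()) % 7)


def is_review_day(today, job_weekday):
    """True when the merge job runs tomorrow."""
    return next_job_run(today, job_weekday) - today == timedelta(days=1)


def review_queue(merges, threshold):
    """Scheduled merges scoring below the threshold, least confident first."""
    doubtful = [m for m in merges if m.scheduled and m.score < threshold]
    return sorted(doubtful, key=lambda m: m.score)


def unschedule(merges, false_positives):
    """Unschedule rejected (keep_id, merge_id) pairs; return what is still scheduled."""
    for m in merges:
        if (m.keep_id, m.merge_id) in false_positives:
            m.scheduled = False
    return [m for m in merges if m.scheduled]
```

    For example, with a Friday job (`job_weekday=4`), `is_review_day(date(2024, 1, 4), 4)` is true because 4 January 2024 is a Thursday. A reviewer would then walk through `review_queue(merges, 0.8)` and pass any rejected pairs to `unschedule`.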

    Happy duplicates handling!
