Hi everyone,
We're looking at doing some broad clean-up of all of our bad address records - getting all the bad zip codes to be the same, with the same city and state, among other things. Anyone else done this and have any pieces of advice, technically speaking or otherwise? I'm nervous to operate on this many records in our live environment but would prefer specific fear to general!
Thanks!
Frannie
We've done a few cleanups like this in the past (and ongoing).
Some things I chose to do via SQL in bulk but those were things that I wasn't scared about making a mistake on. For example, common mistypings of Edmonton (ex. Edmotnon) I isolated and updated. Or things that we abbreviate (Street = St, Avenue = Ave, etc) I would isolate and update.
But if I couldn't build a hard and fast rule about it then it became a cleanup project for our box office staff during down times. We created several dynamic lists that would find potential issues and then the results would be analyzed by the staff and updated when appropriate. These lists are now run periodically in an attempt to keep on top of these items. These lists find things from unabbreviated words, to incorrectly abreviated words, to extranneous punctuation or spaces, and more!
My biggest advice is test, test, test!! Sometimes you think you've got a rule that will only pull bad addresses but the database surprises you with items that suddenly break your logic. :)
Hi Frannie,
We do a lot of that at PRAC, but I agree with Beth that testing is essential. We also work with our users to make sure that they're following our consortium's data standards to minimize the new bad addresses we see. For example, we ask our organizations to only use their mailing address when it is valid for a patron and instead use a dummy address (street1=Unknown, zip=99999) for patrons when they don't give an address. (Our consortium requires addresses, but that's another discussion.)
With that in mind, I've taken a slightly different approach to "correcting" bad addresses, which is to have a view that lists common bad address strings. Then, rather than changing the address itself, I use that view to apply an "Invalid Address" address type. That way we keep any address information that might have helped identify the patron.
If you're looking for ways to keep an eye on this issue going forward as well, you might want to check out the shared reports section in TASK, filtering by "address". There are a number of address-related utilities and reports, including one that I shared a few years ago that points out common US address issues.
Good luck!
~Katie
A familiar topic; address cleanup is at the heart of our regular data hygiene processes, both via SA job and manual checking.
> I'm nervous to operate on this many records in our live environment
I'd recommend making your matching code very strict, and your updates exceedingly precise, to begin with, so that it affects as few records as possible and you can easily review them all before and after. As your confidence grows, so may your updates...
This is all good advice so far. I've had to do a significant amount of address cleanup lately, so let me throw out a few things that come to mind:
Major comprehensive scripted updates to the database feel terrifying because they are, in fact, potentially disastrous. Losing sleep is a fundamental part of the equation.
With that in mind I have tried to avoid ever deleting addresses if I can help it: I prefer to inactivate them, so that the information isn't lost. But here are two things I do to try and make it easier to see what I have changed if things do not go as planned. First, I have created a special Contact Point Preference Type for inactivated addresses, where the Contact Point Preference is descriptive of why the address was inactivated (mostly I use this for NCOA Moves and Deletes). I've been warned that this is not what Contact Point Preferences are for, but it is an excellent way to flag an address both for future queries but also to be able to see quickly and clearly when perusing the customer record in the client.
Second, you can take potentially advantage of the NCOA tables/columns to establish a "batch", if you were, of address updates. You can define an "NCOA Session" and update all affected address to refer to that session. Might confuse your NCOA reporting, but who ever looks at that?
Speaking of batch updates, one issue you can run into when trying to analyze whether or not an address (or other contact point) is out of date, or whether one should be preferred over another is that a number of Tessitura Upgrades have wound up touching all addresses of a certain type, and so you may find that certain classes of addresses all have a last update date >= say your v11 upgrade date.
For any batch operation, and this applies to regular NCOA stuff, work with your marketing and development departments to carefully define important customers who should never have their records automatically updated, and be sure to screen them out of your scripts.
This is a corrollary, but you will of course want those customer accounts to be reviewed, since likely you do want them changed, just manually, but also, after you've just done a large data-sanatizing operation, you'll want to keep the data clean, so what I've done is create a set of lists that capture the addresses that appear to need cleanup (and I do this for a variety of data integrity issues beyond addresses), and put them in a Category such as "Data Integrity", and make them all dynamic, or as I have done, set up a scheduled job to regenerate them every night. Then turn them over to various customer record stakeholders and let them sort out who needs to keep on top of what. Optimally then all those lists will stay at or near zero customers.
Also, if you are changing the format of addresses you may run into an issue where developement objects to the post office standard. They may not want:
Mr. Jonathan Smythe Esq.123 Shady Lane CourtFancypants SuburbCA 90000
replaced with
JOHN SMITH 123 SHADY LN CT OFF GLOTTS RD OAKLAND CA 90000-0000
One of the hardest parts of doing bulk correction of postal addresses for me was the relationship of addresses and affiliates. For instance, I realized after massive inactivation that I had left tens of thousands of accounts without an active, primary address, and that a number of things (specifically in TNEW) break down if that happens. So I had to find those accounts and add dummy addresses, or select an appropriate address to make primary if there as an active but not primary address. But then I also had to assess if the customer was an affiliate on a household which had an active, primary address (in which case there was no problem), and if the household didn't, but another affiliate did, whether I wanted to promote that address to the household...etc.
This is a really good question and something we are talking about here as well.
I think it’s a good idea to do a refresh at least every five years or so. One thing you’ll find is that if you have a ‘bad address’ flag or ‘returned mail’ flag that they get added to constituents but never removed if the address is later corrected. It’s usually a training issue for box office to know if you clean up an address in any way to also remove bad address flags so you can again market to those people. So a bulk refresh using an NCOA or QAS service can easily identify the current bad addresses and you can remove the flag from the rest.
The other issue I’ve run into at every organization I’ve worked at is that it’s second nature for box office staff to use your business address for all dummy data / administrative accounts. This should be avoided because it will heavily skew any geographic analysis. Here’s a specific forum thread from when we did this clean up while I was working with 92Y. Best practice is to use something clearly fake such as street = unknown street city = unknown state = XX and zip code 99999 or 00000. You should determine a standard dummy record for your organization that everyone follows so you can easily suppress these in lists and identify them in reporting.
http://www.tessituranetwork.com/Community/forums/t/6857.aspx
Country analysis is also crucial and I find that in the past box office reps have used every address field imaginable to capture country so before you do any clean-up you might want to run some queries on addresses and locate any that have say “Brazil” in the street or city line and move that to the actual country field. This is the only data that comes to mind that could be lost if you standardize the bad addresses.
Additional constituent clean-up you could do would be fixing salutations if your business rules weren’t followed which is usually the case with the manually box office created ones. You can also run a script to add gender based upon assumptions gleaned from the prefix i.e. Mr = male. Many places have this as a standard overnight procedure. Other clean-up I’ve done is the past is fixing names that are in all caps or all lower case letters, you can write a script for this too to run ongoing using the most obvious assumptions.
Feel free to reach out if you have questions about any of the above.
All the best,
Chuck
From: Tessitura Technical Forum [mailto:forums-technical@tessituranetwork.com] On Behalf Of Frances O'Connell Sent: Monday, April 24, 2017 5:32 PM To: Buchanan, Charles Subject: [Tessitura Technical Forum] mass address clean-up tips?
This message was sent automatically to you by www.tessituranetwork.com because you subscribed to the Tessitura Technical Forum. You may reply to this message to post to the Technical forum or visit the site to search, read and post to the forums. In the interest of keeping the forum posts from becoming cluttered, we encourage you to delete previous message text from your reply before sending. Thank you!