Greetings everyone,
I am working on a major database clean-up and have hit an issue that has caused some frustration. I am including email addresses as a criteria, and I've noticed that members of a household often use the same email. I obviously don't want to merge Jane Smith and John Smith of the Smith household together even though they have the same email. Has anyone tried to edit the identify merge procedure to exclude constituents who are members of the same household? I keep experimenting with bits of code but I'm worried about excluding more than I want to. I would love to hear any suggestions if someone has dealt with this already.
Thanks!
Here is the code that I have written to add to the procedure, though my tests have been unsuccessful. I am no SQL wizard so I am sure this is the clunkiest way to do this imaginable.
--List constituents and their primary householdsCREATE TABLE #household_members (constituent int, household int)
INSERT INTO #household_members SELECT individual_customer_no, group_customer_no FROM T_AFFILIATION WHERE inactive = 'N' AND primary_ind = 'Y' AND affiliation_type_id = 10002
--List potential duplicates including their householdCREATE TABLE #shared_household (customer_1 int, household_1 int, customer_2 int, household_2 int)
INSERT INTO #shared_household SELECT a.customer_no1, (SELECT household FROM #household_members WHERE constituent = a.customer_no1), a.customer_no2, (SELECT household FROM #household_members WHERE constituent = a.customer_no2) FROM #potential_duplicates a WHERE a.identify_method != @identify_method_manually_scheduled
--Create table of criterion to remove from potential duplicates where matching constituents are part of the same household.CREATE TABLE #delete_criterion2 (criterion varchar)
INSERT INTO #delete_criterion2SELECT a.criterionFROM #potential_duplicates a JOIN #shared_household b ON a.customer_no1 = b.customer_1 AND a.customer_no2 = b.customer_2WHERE b.household_1 IS NOT NULL AND b.household_2 IS NOT NULL AND b.household_1 = b.household_2
DELETE FROM #potential_duplicatesWHERE criterion in (Select criterion From #delete_criterion2)
DROP TABLE #household_membersDROP TABLE #shared_household
I don't know if you want to add to the procedure: at the very least your work will be blown away if that code is ever in an update. What I have done, for different reasons, is created a procedure which I run immediately after the identify duplicates procedure to trim out lines I'm not interested in. Let me think about your query.
select pd1.criterion, pd1.customer_no, pd1.keep_cust, pd1.identify_method, pd2.customer_no as customer_no2, pd2.keep_cust as keep_cust2, pd2.identify_method as identify_method2from T_POTENTIAL_DUPS as pd1 inner join T_POTENTIAL_DUPS as pd2 on pd2.criterion = pd1.criterion and pd2.customer_no <> pd1.customer_no inner join T_AFFILIATION as aff1 on aff1.individual_customer_no = pd1.customer_no and aff1.inactive <> 'Y' and aff1.primary_ind = 'Y' and aff1.affiliation_type_id = 10002 inner join T_AFFILIATION as aff2 on aff2.individual_customer_no = pd2.customer_no and aff2.inactive <> 'Y' and aff2.primary_ind = 'Y' and aff2.affiliation_type_id = 10002 and aff2.group_customer_no = aff1.group_customer_noorder by pd1.criterion, pd1.customer_no, pd2.customer_no
I'd check this out thoroughly, but you should then be able to join this in to a delete on the criterion to eliminate any such record from T_POTENTIAL_DUPS.
Now, that said, I was surprised to get so many returns in our own database, and what I found were cases where the first name wasn't specified for either member, or where it appeared that the same constituent had been specified twice on a household.