Exclude Household Members From Identify Duplicates

Joshua Peters $organization over 5 years ago

Greetings everyone,

I am working on a major database clean-up and have hit an issue that has caused some frustration. I am including email addresses as a criteria, and I've noticed that members of a household often use the same email. I obviously don't want to merge Jane Smith and John Smith of the Smith household together even though they have the same email. Has anyone tried to edit the identify merge procedure to exclude constituents who are members of the same household? I keep experimenting with bits of code but I'm worried about excluding more than I want to. I would love to hear any suggestions if someone has dealt with this already.

Thanks!

Gawain Lavers $organization over 5 years ago

Are these not A1/A2? I thought there was a check preventing merging of an A1 with an A2. Or maybe it's that: merging is prevented but adding to the list isn't?
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Joshua Peters $organization over 5 years ago in reply to Gawain Lavers

The latter may be the case. From what I can tell, there is nothing in the Identify procedure that prevents them from showing up. I haven't actually tried to merge them, so they may be protected from actually being merged, but I really want them cleaned out of my potential duplicates list to make it easier to manage.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Joshua Peters $organization over 5 years ago

Here is the code that I have written to add to the procedure, though my tests have been unsuccessful. I am no SQL wizard so I am sure this is the clunkiest way to do this imaginable.

--List constituents and their primary households
CREATE TABLE #household_members (constituent int, household int)

INSERT INTO #household_members
SELECT individual_customer_no, group_customer_no
FROM T_AFFILIATION
WHERE inactive = 'N'
AND primary_ind = 'Y'
AND affiliation_type_id = 10002

--List potential duplicates including their household
CREATE TABLE #shared_household (customer_1 int, household_1 int, customer_2 int, household_2 int)

INSERT INTO #shared_household
SELECT a.customer_no1,
(SELECT household FROM #household_members WHERE constituent = a.customer_no1),
a.customer_no2,
(SELECT household FROM #household_members WHERE constituent = a.customer_no2)
FROM #potential_duplicates a
WHERE a.identify_method != @identify_method_manually_scheduled

--Create table of criterion to remove from potential duplicates where matching constituents are part of the same household.
CREATE TABLE #delete_criterion2 (criterion varchar)

INSERT INTO #delete_criterion2
SELECT a.criterion
FROM #potential_duplicates a
JOIN #shared_household b ON a.customer_no1 = b.customer_1 AND a.customer_no2 = b.customer_2
WHERE b.household_1 IS NOT NULL
AND b.household_2 IS NOT NULL
AND b.household_1 = b.household_2

DELETE FROM #potential_duplicates
WHERE criterion in (Select criterion From #delete_criterion2)

DROP TABLE #household_members
DROP TABLE #shared_household
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 5 years ago in reply to Joshua Peters

I don't know if you want to add to the procedure: at the very least your work will be blown away if that code is ever in an update. What I have done, for different reasons, is created a procedure which I run immediately after the identify duplicates procedure to trim out lines I'm not interested in. Let me think about your query.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 5 years ago in reply to Joshua Peters

select
pd1.criterion,
pd1.customer_no,
pd1.keep_cust,
pd1.identify_method,
pd2.customer_no as customer_no2,
pd2.keep_cust as keep_cust2,
pd2.identify_method as identify_method2
from T_POTENTIAL_DUPS as pd1
inner join T_POTENTIAL_DUPS as pd2 on pd2.criterion = pd1.criterion
and pd2.customer_no <> pd1.customer_no
inner join T_AFFILIATION as aff1 on aff1.individual_customer_no = pd1.customer_no
and aff1.inactive <> 'Y' and aff1.primary_ind = 'Y' and aff1.affiliation_type_id = 10002
inner join T_AFFILIATION as aff2 on aff2.individual_customer_no = pd2.customer_no
and aff2.inactive <> 'Y' and aff2.primary_ind = 'Y' and aff2.affiliation_type_id = 10002
and aff2.group_customer_no = aff1.group_customer_no
order by
pd1.criterion,
pd1.customer_no,
pd2.customer_no

I'd check this out thoroughly, but you should then be able to join this in to a delete on the criterion to eliminate any such record from T_POTENTIAL_DUPS.

Now, that said, I was surprised to get so many returns in our own database, and what I found were cases where the first name wasn't specified for either member, or where it appeared that the same constituent had been specified twice on a household.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Joshua Peters $organization over 5 years ago in reply to Gawain Lavers

Thanks so much Gawain! This seems to be working in identifying the constituents I was looking for. And thank you also for your suggestion about this being a separate procedure that I run after the initial pull. I think that will definitely be the way to go.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 5 years ago in reply to Joshua Peters

You might want to add a check to make sure that the fname is not in fact identical before removing them from the list.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Joshua Peters $organization over 5 years ago in reply to Joshua Peters

Good point...I'm on it! Thanks again!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel