Exclude Household Members From Identify Duplicates

Greetings everyone,

I am working on a major database clean-up and have hit an issue that has caused some frustration. I am including email addresses as a criteria, and I've noticed that members of a household often use the same email. I obviously don't want to merge Jane Smith and John Smith of the Smith household together even though they have the same email. Has anyone tried to edit the identify merge procedure to exclude constituents who are members of the same household? I keep experimenting with bits of code but I'm worried about excluding more than I want to. I would love to hear any suggestions if someone has dealt with this already.

Thanks!

  • Are these not A1/A2?  I thought there was a check preventing merging of an A1 with an A2.  Or maybe it's that: merging is prevented but adding to the list isn't?

  • The latter may be the case. From what I can tell, there is nothing in the Identify procedure that prevents them from showing up. I haven't actually tried to merge them, so they may be protected from actually being merged, but I really want them cleaned out of my potential duplicates list to make it easier to manage.

  • Here is the code that I have written to add to the procedure, though my tests have been unsuccessful. I am no SQL wizard so I am sure this is the clunkiest way to do this imaginable.

    --List constituents and their primary households
    CREATE TABLE #household_members (constituent int, household int)

    INSERT INTO #household_members
    SELECT individual_customer_no, group_customer_no
    FROM T_AFFILIATION
    WHERE inactive = 'N'
    AND primary_ind = 'Y'
    AND affiliation_type_id = 10002

    --List potential duplicates including their household
    CREATE TABLE #shared_household (customer_1 int, household_1 int, customer_2 int, household_2 int)

    INSERT INTO #shared_household
    SELECT a.customer_no1,
    (SELECT household FROM #household_members WHERE constituent = a.customer_no1),
    a.customer_no2,
    (SELECT household FROM #household_members WHERE constituent = a.customer_no2)
    FROM #potential_duplicates a
    WHERE a.identify_method != @identify_method_manually_scheduled

    --Create table of criterion to remove from potential duplicates where matching constituents are part of the same household.
    CREATE TABLE #delete_criterion2 (criterion varchar)

    INSERT INTO #delete_criterion2
    SELECT a.criterion
    FROM #potential_duplicates a
    JOIN #shared_household b ON a.customer_no1 = b.customer_1 AND a.customer_no2 = b.customer_2
    WHERE b.household_1 IS NOT NULL
    AND b.household_2 IS NOT NULL
    AND b.household_1 = b.household_2

    DELETE FROM #potential_duplicates
    WHERE criterion in (Select criterion From #delete_criterion2)

    DROP TABLE #household_members
    DROP TABLE #shared_household

  • I don't know if you want to add to the procedure: at the very least your work will be blown away if that code is ever in an update.  What I have done, for different reasons, is created a procedure which I run immediately after the identify duplicates procedure to trim out lines I'm not interested in.  Let me think about your query.

  • select
    pd1.criterion,
    pd1.customer_no,
    pd1.keep_cust,
    pd1.identify_method,
    pd2.customer_no as customer_no2,
    pd2.keep_cust as keep_cust2,
    pd2.identify_method as identify_method2
    from T_POTENTIAL_DUPS as pd1
    inner join T_POTENTIAL_DUPS as pd2 on pd2.criterion = pd1.criterion
    and pd2.customer_no <> pd1.customer_no
    inner join T_AFFILIATION as aff1 on aff1.individual_customer_no = pd1.customer_no
    and aff1.inactive <> 'Y' and aff1.primary_ind = 'Y' and aff1.affiliation_type_id = 10002
    inner join T_AFFILIATION as aff2 on aff2.individual_customer_no = pd2.customer_no
    and aff2.inactive <> 'Y' and aff2.primary_ind = 'Y' and aff2.affiliation_type_id = 10002
    and aff2.group_customer_no = aff1.group_customer_no
    order by
    pd1.criterion,
    pd1.customer_no,
    pd2.customer_no

    I'd check this out thoroughly, but you should then be able to join this in to a delete on the criterion to eliminate any such record from T_POTENTIAL_DUPS.

    Now, that said, I was surprised to get so many returns in our own database, and what I found were cases where the first name wasn't specified for either member, or where it appeared that the same constituent had been specified twice on a household.

  • Thanks so much Gawain! This seems to be working in identifying the constituents I was looking for. And thank you also for your suggestion about this being a separate procedure that I run after the initial pull. I think that will definitely be the way to go.

  • You might want to add a check to make sure that the fname is not in fact identical before removing them from the list.

  • Good point...I'm on it! Thanks again!