Duplicate Email Constituent Record Report?

I am doing a huge database cleanup. In our old system, it was ok for 10 constituent records to all have the same email address attached to them...in Tessitura, it is not. We found that out the hard way, so now I am trying to manually go through and take all the emails off of the constituent records that they don't belong on. 

Our Ongoing Services team sent me a list they pulled of all the constituents with the same emails that kept their logins, which I have already gone through. They also pulled us a list of every account they pulled the temp logins off of, but did not specify the duplicate emails. So I have a list of about 90,000 email addresses I am tediously going through to see if each email has multiple records attached to it. Since many do not, it's wasting my time to check those ones. 

I have a ticket open about this but haven't heard back in a few days; I'm wondering if there is a list or an extraction or something I can pull myself to get the data I am looking for. 

What I am looking for is a list of all of our constituent records that no longer have logins attached, but do have email addresses and that email address is linked to multiple constituent records. Is it doable myself or do I have to wait and have Ongoing Services do it? 

Parents
  • Okay, so I came up with a manual List query to at least organize the work a bit.  If you create a list with one trivial criteria, save, and then go to "show query" you can delete all of the code in the box and replace it with this:

    select
    	distinct de.customer_no
    from T_CUSTOMER as c
    	inner join (
    	select
    		e.customer_no,
    		case when e.login_no = -1 and d.login_no = -1 then 'N' else 'Y' end as connected_login,
    		case when h.customer_no is not null then 'Y' else 'N' end as hh_dup,
    		cust_count.customers
    	from (
    		select e1.*,
    			ISNULL(cl.login_no, -1) as login_no
    		from T_EADDRESS as e1
    			inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    				and et.email_ind = 'Y'
    			left outer join T_CUST_LOGIN as cl on cl.eaddress_no = e1.eaddress_no
    		) as e
    		inner join (
    			select e1.*,
    				ISNULL(cl.login_no, -1) as login_no
    			from T_EADDRESS as e1
    				inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    					and et.email_ind = 'Y'
    				left outer join T_CUST_LOGIN as cl on cl.eaddress_no = e1.eaddress_no
    		) as d on d.address = e.address and d.eaddress_no <> e.eaddress_no
    		left outer join V_CUSTOMER_WITH_HOUSEHOLD as h on h.customer_no = e.customer_no
    			and h.expanded_customer_no = d.customer_no
    			and h.customer_no <> h.expanded_customer_no
    		left outer join (
    			select e1.address, COUNT(distinct e1.customer_no) as customers
    			from T_EADDRESS as e1
    				inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    					and et.email_ind = 'Y'
    			group by e1.address
    		) as cust_count on cust_count.address = e.address
    	) as de on de.customer_no = c.customer_no
    where
    	c.inactive = 1
    	--and de.connected_login = 'Y' and de.customers = 1 --connected login, single customer account
    	--and de.connected_login = 'N' and de.customers = 1 --no logins, single customer account
    	--and de.connected_login = 'Y' and de.customers > 1 and de.hh_dup = 'Y' --connected login, dup emails within a household
    	--and de.connected_login = 'N' and de.customers > 1 and de.hh_dup = 'Y' --no logins, dup emails within a household
    	--and de.connected_login = 'Y' and de.customers > 1 and de.hh_dup = 'N' --connected login, dup emails outside a household
    	--and de.connected_login = 'N' and de.customers > 1 and de.hh_dup = 'N' --no logins, dup emails outside a household
    ;

    This will give you every customer who has at least one email address that is duplicated somewhere.  Probably a lot!  However, at the bottom of that block you see six "commented out" lines (i.e. they start with "--").  Remove the first two dashes on any one of them, and the results will be limited as the second comment says.  In this fashion you can prioritize your work.  I've identified where at least one of the email addresses found is connected to a login, since those are probably more important to fix first?  Then there are three other groupings.  First are email addresses where there are duplicates, but all duplicates are on the same constituent record.  With these you can just click through to the constituent and know that you just have to fix the duplicates that you can see.  The second are duplicates where the other email address is owned by a different member of the same household.  These, again, can be more easily corrected, since you can just hop between household members using the affiliations menu.  Lastly there are duplicates where the duplicate emails are on "unrelated" accounts.  Here you'll have to go to the account, look at its email addresses, an then use those in turn in the constituent search menu.

    Since, in the case of household or unrelated customer account, the list will contain both customer records of a duplicated email address, periodically regenerating the list will ideally keep you from clicking into accounts you've already fixed.

    Caveat 1: this will not work if you are in a consortium and your consortium members don't share constituent accounts.  In that case (we're one!) you'll get a ton of copied emails for each consortium member's record of the same person.

    Caveat 2: I am not screening out inactive email addresses, nor inactive logins from consideration.

Reply
  • Okay, so I came up with a manual List query to at least organize the work a bit.  If you create a list with one trivial criteria, save, and then go to "show query" you can delete all of the code in the box and replace it with this:

    select
    	distinct de.customer_no
    from T_CUSTOMER as c
    	inner join (
    	select
    		e.customer_no,
    		case when e.login_no = -1 and d.login_no = -1 then 'N' else 'Y' end as connected_login,
    		case when h.customer_no is not null then 'Y' else 'N' end as hh_dup,
    		cust_count.customers
    	from (
    		select e1.*,
    			ISNULL(cl.login_no, -1) as login_no
    		from T_EADDRESS as e1
    			inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    				and et.email_ind = 'Y'
    			left outer join T_CUST_LOGIN as cl on cl.eaddress_no = e1.eaddress_no
    		) as e
    		inner join (
    			select e1.*,
    				ISNULL(cl.login_no, -1) as login_no
    			from T_EADDRESS as e1
    				inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    					and et.email_ind = 'Y'
    				left outer join T_CUST_LOGIN as cl on cl.eaddress_no = e1.eaddress_no
    		) as d on d.address = e.address and d.eaddress_no <> e.eaddress_no
    		left outer join V_CUSTOMER_WITH_HOUSEHOLD as h on h.customer_no = e.customer_no
    			and h.expanded_customer_no = d.customer_no
    			and h.customer_no <> h.expanded_customer_no
    		left outer join (
    			select e1.address, COUNT(distinct e1.customer_no) as customers
    			from T_EADDRESS as e1
    				inner join TR_EADDRESS_TYPE as et on et.id = e1.eaddress_type
    					and et.email_ind = 'Y'
    			group by e1.address
    		) as cust_count on cust_count.address = e.address
    	) as de on de.customer_no = c.customer_no
    where
    	c.inactive = 1
    	--and de.connected_login = 'Y' and de.customers = 1 --connected login, single customer account
    	--and de.connected_login = 'N' and de.customers = 1 --no logins, single customer account
    	--and de.connected_login = 'Y' and de.customers > 1 and de.hh_dup = 'Y' --connected login, dup emails within a household
    	--and de.connected_login = 'N' and de.customers > 1 and de.hh_dup = 'Y' --no logins, dup emails within a household
    	--and de.connected_login = 'Y' and de.customers > 1 and de.hh_dup = 'N' --connected login, dup emails outside a household
    	--and de.connected_login = 'N' and de.customers > 1 and de.hh_dup = 'N' --no logins, dup emails outside a household
    ;

    This will give you every customer who has at least one email address that is duplicated somewhere.  Probably a lot!  However, at the bottom of that block you see six "commented out" lines (i.e. they start with "--").  Remove the first two dashes on any one of them, and the results will be limited as the second comment says.  In this fashion you can prioritize your work.  I've identified where at least one of the email addresses found is connected to a login, since those are probably more important to fix first?  Then there are three other groupings.  First are email addresses where there are duplicates, but all duplicates are on the same constituent record.  With these you can just click through to the constituent and know that you just have to fix the duplicates that you can see.  The second are duplicates where the other email address is owned by a different member of the same household.  These, again, can be more easily corrected, since you can just hop between household members using the affiliations menu.  Lastly there are duplicates where the duplicate emails are on "unrelated" accounts.  Here you'll have to go to the account, look at its email addresses, an then use those in turn in the constituent search menu.

    Since, in the case of household or unrelated customer account, the list will contain both customer records of a duplicated email address, periodically regenerating the list will ideally keep you from clicking into accounts you've already fixed.

    Caveat 1: this will not work if you are in a consortium and your consortium members don't share constituent accounts.  In that case (we're one!) you'll get a ton of copied emails for each consortium member's record of the same person.

    Caveat 2: I am not screening out inactive email addresses, nor inactive logins from consideration.

Children