SQL help - Trying to find highest gift amount for each hh/ind, but getting more than just max contribution amount

Hello hive mind!

I'm trying to pull a list of max donation amounts for people who qualify as a major donor, however, my query is pulling more than just the max value for a given constituent.  Essentially I'm getting any contribution amount for the individual that qualifies as a major gift. I just want to return one row per constituent ID that has their highest ever gift amount. What am I missing here?  I'm still learning SQL, so any help is greatly appreciated. 

Here is the code:

SELECT a.cont_amt, a.customer_no, a.cont_dt, b.fname, b.lname, b.sort_name
FROM impresario.dbo.T_CONTRIBUTION a
JOIN impresario.dbo.T_CUSTOMER b
ON a.customer_no = b.customer_no
WHERE EXISTS (SELECT MAX(cont_amt), customer_no
FROM impresario.dbo.T_CONTRIBUTION
GROUP BY customer_no
HAVING MAX(cont_amt) > 1500)
ORDER BY a.cont_amt DESC;

  • Jordan,

    This is one of those SQL fun things that is always a little quirky.  What you are trying to find is the associated A, B, C, D, etc... columns where column X is at its max.  Really, the best way to do that is using CTEs (or temporary tables, depending upon your situation) using the ROW_NUMBER() function as a ranking solution.

    Here is something that should get you most of the way there:

    ; --Do not omit this
    WITH cte_max_conts (cont_amt,customer_no,ref_no,max_cont_rank)
    AS    (
            SELECT    ci.cont_amt,
                    ci.customer_no,
                    ci.ref_no,
                    ROW_NUMBER() OVER(PARTITION BY ci.customer_no ORDER BY ci.cont_amt DESC) AS max_cont_rank
            FROM    T_CONTRIBUTION ci
            WHERE    ci.cont_amt > 1500
        )
    SELECT    c.cont_amt,
            c.customer_no,
            c.cont_dt,
            cr.fname,
            cr.lname,
            cr.sort_name
    FROM        T_CONTRIBUTION c
        JOIN    cte_max_conts cte ON c.ref_no = cte.ref_no
        JOIN    T_CUSTOMER cr ON c.customer_no = cr.customer_no
    WHERE    cte.max_cont_rank = 1

    Let me know if that does not help.

    John

    P.S. to anyone out there on Tessitura's web team watching, I tried to use the "insert code" function here, but there was something wrong with it as it kept reloading itself driving the "ok" button lower.  Windows 10, Firefox, updated to the most recent version of the browser.

  • Ahh, thank you! That worked!  I was trying to use a CTE (without really understanding what it was doing), but couldn't find enough help on how to do it properly using my google skills, haha. 

    I was also having that issue with the insert code function!

  • Just think of a CTE as a one-time use temporary table.  You create it, and then just use it immediately in your query below.

    As for the kicker, though, that would be the ROW_NUMBER() OVER (PARTITION BY...) section.  That function is highly useful and worth researching.  The details are a little more complex than this, but it works sort of like an internal GROUP BY and ORDER BY together for the SELECT area of your current query without GROUPing or ORDERing the rest of the query.

    I paraphrased a bit, but if you use what I sent plus a little documentation, that should pretty much get you there.

  • Amazing, thank you!  I will definitely research that more. 

  • The CTE is not required for the data you were looking for. CTE's add additional overhead, be careful in using them.

    The query can be built like this:

    Set ArithAbort On
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    Select CT.customer_no , isnull(C.fname,'') as fname , C.lname , c.sort_name, cont_dt , cont_amt   from
    (Select customer_no, cont_amt , Cont_dt
    , ROW_NUMBER () over (partition by Customer_No Order by Cont_AMT desc, Cont_Dt desc) as SeqNo from T_CONTRIBUTION
    where cont_amt > 1500
     ) as CT inner join T_CUSTOMER  as C on CT.customer_no = C.customer_no
    where SeqNo = 1

    Performance statistics without CTE:

    (3903 rows affected)
    Table 'T_CONTRIBUTION'. Scan count 9, logical reads 40705, physical reads 0, read-ahead reads 40705, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_CUSTOMER'. Scan count 0, logical reads 20435, physical reads 1, read-ahead reads 2717, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Performance statistics using Jordan's sample with CTE

    (3903 rows affected)
    Table 'T_CONTRIBUTION'. Scan count 18, logical reads 81410, physical reads 0, read-ahead reads 40705, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_CUSTOMER'. Scan count 0, logical reads 20868, physical reads 1, read-ahead reads 2717, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    As you can see the CTE version doubled the number of logical reads, scan count, created an extra work table and did more reads on the customer table. The execution plan was also much simpler for the one without the CTE.

  • You are not incorrect, and it is true, I will often make different decisions when I do one-time queries versus queries that I am putting into a procedure.  CTEs can be helpful, but can be dangerous, too.  Same thing with temporary table, cursors and many other things.

    I certainly was not trying to imply that this was the only way to get it done, and your method indeed simplifies the process one further step by internalizing the initial pull.  On the other hand, there are also similar situations where a CTE would be more necessary.  But always helpful to make sure both perspectives on anything like CTEs or Cursors are represented to make sure no one goes crazy.

    I have seen databases tied up in nasty ways due to my own carelessness, so I am always grateful for reminders.  Slight smile

  • Thanks for this info!  Since I'm still learning, would you mind explaining when you would use a CTE as opposed to not?  If you aren't seeing the statistics until after you execute the query, what would make you decide that it was necessary to use or not?  Does that make sense?  

  • What would make a CTE dangerous in an instance like this (or not like this for that matter)?  

  • In this example a CTE is not overly dangerous, it is just making SQL Server work a little harder than it needs to for the same results. What you want to avoid is getting into coding habits that could potentially have an adverse impact when coding for something much more impactful due to frequency of use or volume of data.

    For example if this was a query to pull available performances  to display for a landing page it would be much more important to make sure it was optimally coded due to both the frequency in which it is called and volume of data.

  • Some people use them all the time, other people like me barely ever use them. Mostly they are good for doing recursive searches in the same data set. I used them like maybe 5 times in my career as I saw no real performance increase in the types of queries I was writing.

    As for knowing when to use something, while there are best practices to keep in mind when coding, the best thing you can do is test.

    For example this query might do for a first pass at pulling this data:

    Select  'C' as RecType, c.customer_no, Count(Distinct c.ref_no   )as RecCount 

    from T_TRANSACTION as T
    inner join T_CONTRIBUTION  as C on T.ref_no = c.ref_no
     where trn_type = 1 group by customer_no

    Union Select  'T' as RecType, c.customer_no, Count(Distinct t.order_no  ) as RecCount 

    from T_TRANSACTION as T
    inner join T_Order  as C on T.order_no = c.order_no
     where trn_type = 32 group by customer_no

    Here are the statistics for it:

    (447558 rows affected)
    Table 'T_TRANSACTION'. Scan count 27, logical reads 16845, physical reads 2, read-ahead reads 9743, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 24, logical reads 896, physical reads 64, read-ahead reads 832, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_CONTRIBUTION'. Scan count 9, logical reads 3059, physical reads 1, read-ahead reads 3027, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_ORDER'. Scan count 9, logical reads 268, physical reads 1, read-ahead reads 89, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Here is another way to write the same query (some clean up would need to be done regarding the distinct count as null would count, but for the purpose of this example, it works)

     Select iif(trn_type =1,'C','T') as RecType
     , isnull(c.customer_no,O.customer_no ), Count(Distinct isnull(c.ref_no,o.order_no ))as RecCount

    from T_TRANSACTION as T
     left join T_order as O on T.order_no = o.order_no
     left join T_CONTRIBUTION as C on t.ref_no = c.ref_no
     where trn_type in (1,32)
     group by isnull(c.customer_no,O.customer_no ), trn_type

    Here are the stats after resetting the server buffers (not recommended on a production server):

    (447558 rows affected)
    Table 'T_CONTRIBUTION'. Scan count 9, logical reads 3059, physical reads 1, read-ahead reads 3027, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_ORDER'. Scan count 9, logical reads 268, physical reads 1, read-ahead reads 89, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'T_TRANSACTION'. Scan count 9, logical reads 25382, physical reads 1, read-ahead reads 25078, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    While the number of logical reads to T_Transaction is higher in the second example the number of scans (sometimes a misleading number when taken out of contex), physical reads and read ahead reads are lower. Also the work file saw no use in the second example as it did not have to work through the Union statement. Also if you were to look at the execution plans of both the second one is more efficient.

  • Got it, thank you for helping me learn!

  • That makes sense!  Thank you so much.