Advanced SQL question....

Hello all!

 Anyone up for a SQL technical challenge/question.  I’m re-writing an old report that had quite a few parameters on it and my predecessor basically had the same query copied several times but with slightly different WHERE clauses, surrounded by IFs to control the flow based on the parameter value.  Here is a simplified example:

 

IF @filter = ‘F’

THEN

                SELECT a, b, c FROM t_perf WHERE perf_dt BETWEEN @start_dt AND @end_dt

END

ELSE IF @filter = ‘Y’

THEN

                SELECT a, b, c FROM t_perf WHERE schedule_dt BETWEEN @start_dt AND @end_dt

END

 

This is just an example and the actual queries are MUCH larger and complex.  So you can imagine when we want to add a column to the result set or change something, I have to do it in several places and testing also becomes more difficult.  So my idea was to include all of these filters in the same select using a bit of boolean algebra.  It works great but I can’t figure out why my performance is as a poor as it is.  So I’m going to give you just the FROM and WHERE parts because those are the most important here:

 

Here is what I wrote.  I originally had more in the WHERE but did some filtering prior when inserting into the temp tables (perf, mos, price_type) hoping to help, alas it didn’t.

 

Basically if @date_type is ‘F’ I want start/end date to filter by perf_dt and if its ‘Y’ to filter on the existence of a payment schedule date.

 

                SELECT 1

      FROM t_order od (NOLOCK)

            JOIN t_customer cs (NOLOCK) ON od.customer_no = cs.customer_no

            JOIN t_lineitem li (nolock) ON li.order_no = od.order_no

            JOIN t_sub_lineitem sli (nolock) ON sli.li_seq_no = li.li_seq_no

            JOIN #perf pf (NOLOCK) ON pf.perf_no = sli.perf_no

            JOIN lv_prod_season_w_desc ps (NOLOCK) ON ps.prod_season_no = pf.prod_season_no

            JOIN tr_season sn (NOLOCK) ON ps.season = sn.id

            JOIN #mos ms (NOLOCK) ON ms.id = od.mos  

            JOIN #price_type pt (NOLOCK) ON pt.id = sli.price_type

      WHERE (@date_type <> 'F' OR pf.perf_dt BETWEEN @start_dt AND @end_dt)

            AND

            (

  @date_type <> 'Y' OR

                  EXISTS(

                        SELECT 1

                        FROM t_order_schedule os1 (NOLOCK)

                        WHERE os1.order_no = od.order_no

AND os1.due_dt BETWEEN @start_dt AND @end_dt)

)

            AND sli.sli_status IN (1, 2, 3, 12)

 

But when I have all my SELECT columns defined and pass ‘F’ as @date_type this query takes about 25 seconds.  So just as a test I commented out the @date_type <> 'Y' part, and it ran in 1 second.  Now I realize that taking that part gives me incorrect results but the speed difference is what is bothering me.  If anything I would expect it to take longer because with the @date_type <> 'Y’ part in place, it should have keep it from having to run the subquery at all.  Right?!  I am hard pressed to believe that a simple variable comparison is THAT much more expensive than a subquery even with its indexes covered.  Or might I be missing something?

 

Thanks for the help in advance!

 

  • Quite often procedures with SQL queries that have two paths like this, with an OR condition in the WHERE clause will not perform as expected because the procedure execution plan will be compiled in a manner that makes it optimal for one part of the OR condition and then when you run it for the other path the performance suffers.  One way to deal with this is to have the procedure (if this is a procedure) set to use the RECOMPILE option.  I have had some success with this.  It is often better to have separate cases in the code however like you had before, even when that means a bit more maintenance.

  • Hi Sean,

    I can see you use several temp tables here. If they are small then you do not need to worry about having indexes in them; otherwise the indexes would improve performance of you SQL. I would suggest to add indexes on those columns of large temp tables which are used in the Join and "where" clause.

    So add to you procedure, for instance :

    create index ndx_tmp_tbl  --index is non-unique by default
    ON #perf(perf_dt)

    Simon Basyuk, DBA of Carnegie Hall

    sbasyuk@CarnegieHall.org

  • Former Member
    Former Member $organization

    Hi Sean

    I think you need to re-organise your and/or structure. (although I tend to agree with Chuck that it's often easier in this sort of thing to keep it simple, in the interests of being able to read the code next time you look at it - in a year or so -  or someone else has to work out what it's doing later. There's a trade-off there)

    However...

    When you say

     @date_type <> 'Y' OR    EXISTS(...   )

    SQL Server has to evaluate both sides of the condition in every case, irrespective of whether (@date_type <> 'Y')  is True or not., because that's the way OR works.

    If you structure it instead as

    select xyx

    where sli.sli_status IN (1, 2, 3, 12)

    AND  (

    (@date_type = 'F' AND pf.perf_dt BETWEEN @start_dt AND @end_dt)

    OR

    (@date_type = 'Y' AND exists(....    )

              )

    Then,  when( @date_type = 'Y') is False, that branch shouldn't need to be evaluated, which is what I think you're after.

     

    Ken

     

     


  • Hi Ken,

    Thanks for the reply.  I think two version of the select statement are logically equivalent.  But I'm still hung up on why both branches would need to be evaluated in the event of an OR condition.  Only an XOR requires both sides to be evaluated.  With OR, if the first condition is True why would the second need to be considered?  And likewise, in the case of AND, if the first condition is False, there is no need to look further.  A quick test might prove my point.  Run these two statements in Management Studio:

    SELECT 'no error' WHERE 1 = 1 OR 1 / 0 = 0

    SELECT 'no error' WHERE 1 = 0 OR 1 / 0 = 0

    Clearly the error in both statements is the divide by zero.  But it is only the second statement that throws the error because it is not even evaluating it in the first statement.  I found this article that may clear up the entire discussion:

    http://weblogs.sqlteam.com/mladenp/archive/2008/02/25/How-SQL-Server-short-circuits-WHERE-condition-evaluation.aspx

     

    Sean