Hello all!
Anyone up for a SQL technical challenge/question. I’m re-writing an old report that had quite a few parameters on it and my predecessor basically had the same query copied several times but with slightly different WHERE clauses, surrounded by IFs to control the flow based on the parameter value. Here is a simplified example:
IF @filter = ‘F’
THEN
SELECT a, b, c FROM t_perf WHERE perf_dt BETWEEN @start_dt AND @end_dt
END
ELSE IF @filter = ‘Y’
SELECT a, b, c FROM t_perf WHERE schedule_dt BETWEEN @start_dt AND @end_dt
This is just an example and the actual queries are MUCH larger and complex. So you can imagine when we want to add a column to the result set or change something, I have to do it in several places and testing also becomes more difficult. So my idea was to include all of these filters in the same select using a bit of boolean algebra. It works great but I can’t figure out why my performance is as a poor as it is. So I’m going to give you just the FROM and WHERE parts because those are the most important here:
Here is what I wrote. I originally had more in the WHERE but did some filtering prior when inserting into the temp tables (perf, mos, price_type) hoping to help, alas it didn’t.
Basically if @date_type is ‘F’ I want start/end date to filter by perf_dt and if its ‘Y’ to filter on the existence of a payment schedule date.
SELECT 1
FROM t_order od (NOLOCK)
JOIN t_customer cs (NOLOCK) ON od.customer_no = cs.customer_no
JOIN t_lineitem li (nolock) ON li.order_no = od.order_no
JOIN t_sub_lineitem sli (nolock) ON sli.li_seq_no = li.li_seq_no
JOIN #perf pf (NOLOCK) ON pf.perf_no = sli.perf_no
JOIN lv_prod_season_w_desc ps (NOLOCK) ON ps.prod_season_no = pf.prod_season_no
JOIN tr_season sn (NOLOCK) ON ps.season = sn.id
JOIN #mos ms (NOLOCK) ON ms.id = od.mos
JOIN #price_type pt (NOLOCK) ON pt.id = sli.price_type
WHERE (@date_type <> 'F' OR pf.perf_dt BETWEEN @start_dt AND @end_dt)
AND
(
@date_type <> 'Y' OR
EXISTS(
FROM t_order_schedule os1 (NOLOCK)
WHERE os1.order_no = od.order_no
AND os1.due_dt BETWEEN @start_dt AND @end_dt)
)
AND sli.sli_status IN (1, 2, 3, 12)
But when I have all my SELECT columns defined and pass ‘F’ as @date_type this query takes about 25 seconds. So just as a test I commented out the @date_type <> 'Y' part, and it ran in 1 second. Now I realize that taking that part gives me incorrect results but the speed difference is what is bothering me. If anything I would expect it to take longer because with the @date_type <> 'Y’ part in place, it should have keep it from having to run the subquery at all. Right?! I am hard pressed to believe that a simple variable comparison is THAT much more expensive than a subquery even with its indexes covered. Or might I be missing something?
Thanks for the help in advance!
Hi Sean,
I can see you use several temp tables here. If they are small then you do not need to worry about having indexes in them; otherwise the indexes would improve performance of you SQL. I would suggest to add indexes on those columns of large temp tables which are used in the Join and "where" clause.
So add to you procedure, for instance :
create index ndx_tmp_tbl --index is non-unique by defaultON #perf(perf_dt)
Simon Basyuk, DBA of Carnegie Hall
sbasyuk@CarnegieHall.org