T-SQL Pattern Question for the Weekend: Optional Parameters

Gawain Lavers $organization over 6 years ago

I very often find myself building a query (usually a Stored Procedure) that uses optional parameters.

One set of examples are a number of data consistency procedures we have: they are designed to run in LP_CUSTOMER_RANK on a single customer and nightly over all customers to sweep up any activity that affected the client but happened outside of tables that trigger LP_CUSTOMER_RANK. I could build out and maintain two procedures (one of my predecessors had originally built a lot of these to take a customer number and then built a second procedure that looped over all customers numbers to do the nightly run: our scheduled jobs were taking upwards of 4 hours to complete), but obviously I'd rather not. So I have one procedure that takes customer number as a parameter, and in the query have the structure:

((@customer_no is null) or (@customer_no = c.customer_no))

This is great for the nightly job, but for whatever reason (boggling to me as a classical programmer, surely it just short circuits at the first comparison and your done?) this is relatively (sometimes very) inefficient compared with @customer_no = c.customer by itself.

Another solution would be to build the query as an inline function without that part of the where clause, and then have an if-then clause that runs the function without the @customer_no = c.customer_no if it is null, or runs it with if not null.

if @customer_no is null
begin
    select mif.* from dbo.MyInlineFunction() as mif
end
else begin
    select mif.* from dbo.MyInlineFunction() as mif
        where @customer_no = mif.customer_no
end

This is great (as long as this procedure isn't part of a much larger query or operation that would have to be duplicated in the code), but if I'm in another extremely common report use case, where I have optional parameters for @start_dt and @end_dt, then I have to have four cases in my "if-else-etc": no parameters; just @start_dt; just @end_dt; and both @start_dt and @end_dt.

If my report has optional @start_dt, @end_dt and @customer_no it's 6 cases. And so on.

My next option to examine is sp_executesql, although I'm concerned that the performance hit for that would be as bad as ((@customer_no is null) or...

I've not been able to ask Google or Stack this question cleverly enough to come up with a good answer. Does anyone here have a nice solution?

Parents

Nick Reilingh $organization over 6 years ago
You didn't mention performance differences between code running in the stored procedure vs. running in SSMS, so I'm inclined to think this is not a parameter sniffing issue. But to really understand your performance problems here you're still probably going to need to investigate the query plans that are being built for these statements.

A couple of concepts I can think of that may apply here:

In your first example, where you expect @customer IS NULL OR ... to just short-circuit, remember that the set-wise nature of SQL processing means that the plan for a query is generated BEFORE evaluating any of the logic in that plan. That means that the SQL engine has already gone and retrieved the c.customer_no column that you want to use in the expression BEFORE it evaluates @customer_no IS NULL. Also consider that the most efficient way for the SQL engine to find the row where @customer_no = c.customer_no is to do an index seek. If you investigate the actual generated query plan, you may find that instead the engine is looping through every row in c and checking the equality condition for each row—equivalent to an index scan. SQL Server isn't completely devoid of short-circuit behavior, but it's really more of an edge case due to the declarative nature of T-SQL. This article has a good dive into the topic.

A solution to this might be to extract out the "all customers or one customer" logic from your WHERE clause into a much simpler query that populates a temp table. At the top of your procedure, do:

CREATE TABLE #customers ( customer_no int PRIMARY KEY ); INSERT #customers (customer_no) SELECT customer_no FROM dbo.T_CUSTOMER WHERE @customer_no IS NULL OR customer_no = @customer_no;

Then in the rest of your procedure, just do a simple join condition on #customers. Since #customers has a primary key on customer_no, the SQL engine can optimize this to something efficient like a merge join instead of doing inner loops, which is presumably what's taking so long in your your current code.

Your mention of @start_dt and @end_dt also remind me of "SARGability", which is at least a relevant topic to performance in filter expressions, though I'm not sure it applies specifically here with a typical WHERE (@start_dt IS NULL OR o.order_dt >= @start_dt). Again, you won't really know for sure until you start investigating the query plans. Often, I find that just rewriting a procedure to be a bit more structured and sequenced is enough to help the SQL Engine figure out a more efficient way to do things. But it's also true that the main problem here is that the most efficient ways for the SQL engine to do single-customer updates vs. all-customer updates are DRAMATICALLY DIFFERENT, despite the fact that your business logic should be the same. Isolating your filters from your business logic, like in the temp table example above, might be the easiest way to facilitate this.
Cancel
Vote Up +1 Vote Down

Sign in to reply

Cancel
Gawain Lavers $organization over 6 years ago in reply to Nick Reilingh

I have definitely used temporary tables to try and minimize searches in the past, but I have two concerns for it with my data integrity scenario. On the "trigger" side (i.e. LP_CUSTOMER_RANK), I'm introducing the overhead of creating a temporary table for every such procedure I add to that function, while with the nightly job I'm basically building a temporary table with every active customer account in it (i.e. half a million rows). That latter is probably not a significant issue for such a process, but just feels a little ugly.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
Nick Reilingh $organization over 6 years ago in reply to Gawain Lavers

Basically right after I wrote this post I listened to a podcast that talked more about this very issue: http://runasradio.com/Shows/Show/673

At the end of the day, this kind of code reuse probably isn't going to be possible if you want optimal performance (unless you venture into building some kind of T-SQL preprocessor), due to what I mentioned about optimal query plans being significantly different for single-access vs. bulk-access. I'd recommend optimizing LP_CUSTOMER_RANK for single-access, and building your separate procedure optimized for bulk-access. There's always tSQLt for unit testing if you want to make sure the same logic is being applied by both procedures!

I also want to mention that creating temp tables is often not as bad as you think for performance. Even if it contains every customer ID! SQL Server doesn't really care about the number of rows it has to scan, but it does care about the width of the index that is being scanned. A one-column int index is about as good as you can get.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel
John A. Moskal II $organization over 6 years ago in reply to Nick Reilingh
I second the idea that temporary tables are not as bad as you think they are. I know that CTEs are generally the first place to start, but I end up using temporary tables quite often. Generally, when it comes to the two, I tend to default to this logic (from Brent Ozar; I cannot take credit for it):

CTEs are usually better when:

SQL Server can do a good job of estimating how many rows will come out of it, and the contents of what those rows will be, or

When what comes out of the CTE doesn’t really influence the behavior of the rest of the query, or

When you’re not sure what portions of the CTE’s data will actually be necessary for the rest of the query (because SQL Server can figure out what parts to execute, and what parts to simply ignore)

Temp tables are usually better when:

You have to refer to the output multiple times, or

When you need to pass data between stored procedures, or

When you need to break a query up into phases to isolate unpredictable components that dramatically affect the behavior of the rest of the query

Regarding Nick's suggested solution, I literally do that exact thing in a number of my own local procedures for similar situations as you are describing, and it has worked flawlessly for me.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel

Reply

John A. Moskal II $organization over 6 years ago in reply to Nick Reilingh
I second the idea that temporary tables are not as bad as you think they are. I know that CTEs are generally the first place to start, but I end up using temporary tables quite often. Generally, when it comes to the two, I tend to default to this logic (from Brent Ozar; I cannot take credit for it):

CTEs are usually better when:

SQL Server can do a good job of estimating how many rows will come out of it, and the contents of what those rows will be, or

When what comes out of the CTE doesn’t really influence the behavior of the rest of the query, or

When you’re not sure what portions of the CTE’s data will actually be necessary for the rest of the query (because SQL Server can figure out what parts to execute, and what parts to simply ignore)

Temp tables are usually better when:

You have to refer to the output multiple times, or

When you need to pass data between stored procedures, or

When you need to break a query up into phases to isolate unpredictable components that dramatically affect the behavior of the rest of the query

Regarding Nick's suggested solution, I literally do that exact thing in a number of my own local procedures for similar situations as you are describing, and it has worked flawlessly for me.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Cancel

Children

No Data