T-SQL functions do no imply a certain order of execution
August 10th, 2011Looking at this question on StackOverflow: Conversion failed when converting from a character string to uniqueidentifier error in SQL Server one can see a reproducible example where a string split UDF works fine in the SELECT statement, but it gives a conversion error in the DELETE statement. Certainly, a bug, right? Actually, not.
The issue at hand is in fact remarkably similar to another common misconception around T-SQL I had to debunk some time ago, see On SQL Server boolean operator short-circuit: that C like operator short-circuit is guaranteed in T-SQL (hint: it isn’t, read the linked article to see a clear counter example).
In the StackOverlow post the misconception is that order of declaration implies order of execution, that the function is evaluated somehow separately from the rest of the query and some sort of temporary result is created that is then used in the overall query execution. This understanding comes naturally to the imperative procedural language mindset of developers trained in C, C++, C# and other similar languages. But in the SQL Server declarative language that is T-SQL, your intuition is actually wrong. To illustrate I will give a simple counter-example, reusing the code from my earlier boolean short-circuit article:
create table eav (
eav_id int identity(1,1) primary key,
attribute varchar(50) not null,
is_numeric bit not null,
[value] sql_variant null);
create index eav_attribute on eav(attribute) include ([value]);
go
-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
begin tran
while @i < 100000
begin
declare @attribute varchar(50),
@is_numeric bit,
@value sql_variant;
select @attribute = 'A' + cast(cast(rand()*1000 as int) as varchar(3));
select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
if 1=@is_numeric
select @value = cast(rand() * 100 as int);
else
select @value = 'Lorem ipsum';
insert into eav (attribute, is_numeric, [value])
values (@attribute, @is_numeric, @value);
select @i = @i+1;
if (@i % 1000 = 0)
begin
commit;
raiserror(N'Inserted %d', 0,0,@i);
begin tran;
end
end
commit
go
-- insert a 'trap'
insert into eav (attribute, is_numeric, [value])
values ('B1', 0, 'Gotch ya');
go
Now we’re going to define our inline table UDF:
create function udf_get_only_numerics()
returns table
with schemabinding
as return
select [value], [attribute]
from dbo.eav
where is_numeric = 1;
go
And now lets innocently query our UDF:
select [value]
from dbo.udf_get_only_numerics ()
where attribute = 'B1'
and cast([value] as int) > 50
go
Since we’re selecting from the UDF, we’re guaranteed that we’re only going to see values that have the is_numeric column value 1, so the cast([value] as int) must always succeed, right? Wrong, we’ll get a conversion exception:
Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.
So what’s going on? Lets insert a known value and inspect the execution plan:
insert into eav (attribute, is_numeric, [value])
values ('B2', 1, 65);
go
select [value]
from dbo.udf_get_only_numerics ()
where attribute = 'B2'
and cast([value] as int) > 50
go
Looking at the plan we see that, just as in my boolean short-circuit example, the presence of an index on attribute that includes the value column but not the is_numeric was too good opportunity to pass for the optimizer. And this plan will evaluate the cast([value]as int) expression before it evaluate the WHERE clause inside the UDF. That’s right, the WHERE clause of the UDF has moved in the execution plan above the WHERE clause of the query using the UDF. The naive assumption that the function definition posses some sort of barrier for execution was proven wrong.
So how does this explains the failure seen in the StackOverflow question? The UDF, which was originally posted on Tally OH! An Improved SQL 8K “CSV Splitter” Function article on sqlservercentral.com, contains WHERE clauses to that in effect restricts the output to only complete tokens. w/o the WHERE clause it would return non-complete token. We can easily test this, lets remove the WHERE clause and use the same input as the test in the StackOverflow question:
CREATE FUNCTION dbo.DelimitedSplit8K_noWhere
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (
SELECT t.N+1
FROM cteTally t
)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s;
When we run this the output contains not only the two token in the input, but also every substring of these tokens:
-- Test string that will be split into table in the DelimitedSplit8k function
declare @temp varchar(max) =
'918E809E-EA7A-44B5-B230-776C42594D91,6F8DBB54-5159-4C22-9B0A-7842464360A5'
select Item from dbo.DelimitedSplit8K_nowhere(@temp, ',');
go
Item
-----------------------------------------
918E809E-EA7A-44B5-B230-776C42594D91
18E809E-EA7A-44B5-B230-776C42594D91
8E809E-EA7A-44B5-B230-776C42594D91
E809E-EA7A-44B5-B230-776C42594D91
809E-EA7A-44B5-B230-776C42594D91
09E-EA7A-44B5-B230-776C42594D91
9E-EA7A-44B5-B230-776C42594D91
E-EA7A-44B5-B230-776C42594D91
-EA7A-44B5-B230-776C42594D91
...
By now the explanation to the issue seen in the StackExchange question should be obvious: just as in my example, the WHERE clause was pulled out from the function and placed in the generated plan above the cast(Item as uniqueidentifier) expression. During execution SQL Server is asked to convert the string ’18E809E-EA7A-44B5-B230-776C42594D91′ to a uniqueidentifier and, understandably, it fails.
Do no make assumptions about order of executions when using inline table UDFs. Q.E.D.
Great insight. Remus, excellent work, as always.