T-SQL functions do no imply a certain order of execution

August 10th, 2011

Looking at this question on StackOverflow: Conversion failed when converting from a character string to uniqueidentifier error in SQL Server one can see a reproducible example where a string split UDF works fine in the SELECT statement, but it gives a conversion error in the DELETE statement. Certainly, a bug, right? Actually, not.

The issue at hand is in fact remarkably similar to another common misconception around T-SQL I had to debunk some time ago, see On SQL Server boolean operator short-circuit: that C like operator short-circuit is guaranteed in T-SQL (hint: it isn’t, read the linked article to see a clear counter example).

In the StackOverlow post the misconception is that order of declaration implies order of execution, that the function is evaluated somehow separately from the rest of the query and some sort of temporary result is created that is then used in the overall query execution. This understanding comes naturally to the imperative procedural language mindset of developers trained in C, C++, C# and other similar languages. But in the SQL Server declarative language that is T-SQL, your intuition is actually wrong. To illustrate I will give a simple counter-example, reusing the code from my earlier boolean short-circuit article:


create table eav (
    eav_id int identity(1,1) primary key,
    attribute varchar(50) not null,
    is_numeric bit not null,
    [value] sql_variant null);
create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
begin tran
while @i < 100000
begin
    declare @attribute varchar(50),
        @is_numeric bit,
        @value sql_variant;
    select @attribute = 'A' + cast(cast(rand()*1000 as  int) as varchar(3));
    select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
    if 1=@is_numeric
        select @value = cast(rand() * 100 as int);
    else
        select @value = 'Lorem ipsum';
    insert into eav (attribute, is_numeric, [value])
    values (@attribute, @is_numeric, @value);
    select @i = @i+1;
    if (@i % 1000 = 0)
    begin
		commit;
		raiserror(N'Inserted %d', 0,0,@i);
		begin tran;
    end
end
commit
go

-- insert a 'trap'
insert into eav (attribute, is_numeric, [value])
values ('B1', 0, 'Gotch ya');
go

Now we’re going to define our inline table UDF:


create function udf_get_only_numerics()
returns table
with schemabinding
as return
	select [value], [attribute]
	from dbo.eav
	where is_numeric = 1;
go

And now lets innocently query our UDF:


select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B1'
    and cast([value] as int) > 50
go

Since we’re selecting from the UDF, we’re guaranteed that we’re only going to see values that have the is_numeric column value 1, so the cast([value] as int) must always succeed, right? Wrong, we’ll get a conversion exception:


Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

So what’s going on? Lets insert a known value and inspect the execution plan:


insert into eav (attribute, is_numeric, [value])
values ('B2', 1, 65);
go

select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B2'
    and cast([value] as int) > 50
go

Looking at the plan we see that, just as in my boolean short-circuit example, the presence of an index on attribute that includes the value column but not the is_numeric was too good opportunity to pass for the optimizer. And this plan will evaluate the cast([value]as int) expression before it evaluate the WHERE clause inside the UDF. That’s right, the WHERE clause of the UDF has moved in the execution plan above the WHERE clause of the query using the UDF. The naive assumption that the function definition posses some sort of barrier for execution was proven wrong.

So how does this explains the failure seen in the StackOverflow question? The UDF, which was originally posted on Tally OH! An Improved SQL 8K “CSV Splitter” Function article on sqlservercentral.com, contains WHERE clauses to that in effect restricts the output to only complete tokens. w/o the WHERE clause it would return non-complete token. We can easily test this, lets remove the WHERE clause and use the same input as the test in the StackOverflow question:


CREATE FUNCTION dbo.DelimitedSplit8K_noWhere
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
 WITH E1(N) AS (
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1)))
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (
                 SELECT t.N+1
                   FROM cteTally t
                )
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item  = SUBSTRING(@pString,s.N1,
                ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s;

When we run this the output contains not only the two token in the input, but also every substring of these tokens:


-- Test string that will be split into table in the DelimitedSplit8k function
declare @temp varchar(max) =
     '918E809E-EA7A-44B5-B230-776C42594D91,6F8DBB54-5159-4C22-9B0A-7842464360A5'
select Item from dbo.DelimitedSplit8K_nowhere(@temp, ',');
go
Item
-----------------------------------------
918E809E-EA7A-44B5-B230-776C42594D91
18E809E-EA7A-44B5-B230-776C42594D91
8E809E-EA7A-44B5-B230-776C42594D91
E809E-EA7A-44B5-B230-776C42594D91
809E-EA7A-44B5-B230-776C42594D91
09E-EA7A-44B5-B230-776C42594D91
9E-EA7A-44B5-B230-776C42594D91
E-EA7A-44B5-B230-776C42594D91
-EA7A-44B5-B230-776C42594D91
...

By now the explanation to the issue seen in the StackExchange question should be obvious: just as in my example, the WHERE clause was pulled out from the function and placed in the generated plan above the cast(Item as uniqueidentifier) expression. During execution SQL Server is asked to convert the string ’18E809E-EA7A-44B5-B230-776C42594D91′ to a uniqueidentifier and, understandably, it fails.

Do no make assumptions about order of executions when using inline table UDFs. Q.E.D.

Online Index Operations for indexes containing LOB columns

August 5th, 2011

SQL Server supports online index and table rebuild operations which allow for maintenance operations to occur w/o significant downtime. While a table is being rebuild, or a new index is being built on it, the table is fully utilizable. It can be queried and any updates done to the table while the online rebuild operation is occurring will be contained in the final rebuilt table. A detailed explanation on how these online rebuild operations work can be found in the Online Indexing Operations in SQL Server 2005 white paper. But Online Index Build operations in SQL Server 2005, 2008 and 2008 R2 do not support tables that contain LOB columns, attempting to do so would trigger an error:

Msg 2725, Level 16, State 2, Line …
An online operation cannot be performed for index ‘…’ because the index contains column ‘…’ of data type text, ntext, image, varchar(max), nvarchar(max), varbinary(max), xml, or large CLR type. For a non-clustered index, the column could be an include column of the index. For a clustered index, the column could be any column of the table. If DROP_EXISTING is used, the column could be part of a new or old index. The operation must be performed offline.

To be accurate the restriction applies not to tables, but to any index or heap that contains an LOB column. That, of course, includes any clustered index or the base heap of a table if the table contains any LOB columns, but it would include any non-clustered index that includes a LOB column. In other words I can rebuild online non-clustered indexes of any table as long as they don’t use the INCLUDE clause to add a LOB column from the base table, but for sure I cannot rebuild online the table itself (meaning the clustered index or the heap). Nor can I add a clustered index to a base heap online, if the table contains LOB columns.

The question whether one should just use VARCHAR(MAX) and stop worrying about the chosen field size has came up on StackOverflow several times (varchar(max) everywhere?) and I always pointed out that there are at least some limitations (impossibility to do do online maintenance rebuild operations, impossibility to index such fields) and also all MAX types have a slight performance overhead, see Performance comparison of varchar(max) vs. varchar(N)).

Online Index Build, now with LOBs

Starting with SQL Server 11 it is actually permitted to build (and rebuild) online indexes and heaps containing LOB columns. The old legacy types (text, ntext and image) are not supported, not surprising considering that these types are on the deprecation path.

To understand why the original online rebuild operations from previous versions did not support LOB columns we need to consider the SQL Server Table and Index Organization. All indexes and tables consist of three allocation units: one for row data, one for overflow row data and one for LOB data. We can see this if we inspect the sys.system_internals_allocation_units system catalog view:


create table test (id int not null identity(1,1),
	somevar1 varchar(6000),
	somevar2 varchar(6000),
	someblob varchar(max))
go

insert into test (somevar1, somevar2, someblob) values ('A', 'B', 'C');
insert into test (somevar1, somevar2, someblob) values
         (replicate('A', 6000), replicate('B', 6000), replicate('C', 8000))
go

select au.*
from sys.system_internals_allocation_units au
join sys.system_internals_partitions p on au.container_id = p.partition_id
where p.object_id = object_id('test');
go

Our test table shows three allocation units. Now lets rebuild our table and look again at our allocation units:


alter table test rebuild;
go

select au.*
from sys.system_internals_allocation_units au
join sys.system_internals_partitions p on au.container_id = p.partition_id
where p.object_id = object_id('test');
go

We can see that our DATa and SLOB (aka. row overflow) allocation units have changed because they were rebuilt (they have different IDs and start at different pages). But the important thing is that the BLOB allocation unit has not changed. After the offline table rebuild, it has the same ID and starts at the same pages. This is because table and index rebuild operations do not rebuild the LOB data. They rebuild the row data and the row-overflow data, but the newly built rows will simply point back to the same old LOB data. The idea is that tables with LOB columns have large LOB values and rebuilding the LOB data would be prohibitive, with little or no benefit.

Offline operations can avoid rebuilding the LOB data without problems, but for online index and table rebuilds this poses an issue: for the duration of the online rebuild operation both the old rowset (the old index/table) and the new rowset would point to the same LOB data while updates are being made to rows.

In SQL Server 11 this problem was solved and now online operations can rebuild indexes and tables with LOB columns while keeping the data in the LOB allocation unit in a consistent state. SQL Server will internally track how LOB data is referenced by both the old index and the new index being built and will take appropriate actions to manage the sharing of the LOB data.

Limitations

The following restrictions and limitations apply only for the duration of the Online Index Rebuild operation:

Partial LOB .WRITE updates are transformed into full updates.
LOB data supports a highly efficient update mode, the .WRITE syntax. This is critical in creating streaming semantics, see Download and Upload images from SQL Server via ASP.Net MVC. When the .WRITE syntax is used on a LOB column belonging to an index that is being rebuilt online the generated plan will silently change it into a full value update, which generates significantly more log. If you rely heavily on this functionality be aware and schedule your online rebuilds accordingly.
DBCC CHECK operations will skip the consistency check of LOB allocation units belonging to indexes that are in the process of being rebuilt online.
During the online operation the LOB allocation unit is shared between the old index and the new index and is consistent if you consider both owners, however it may look inconsistent if considered from either one of the owner point of view.
File SHRINK operation will skip pages belonging to LOB allocation units belonging to indexes that are in the process of being rebuilt online.
If LOB data is shrunk the pointers in the ROW data referencing the LOB data that had moved have to be updated. While an online index rebuild occurs there could be two sets of pointers referencing the same LOB data, one in the old rowset and one in the new rowset.