How to enable and disable a queue using SMO

February 6th, 2013

The SMO object model for SQL Server ServiceQueue does allow one to enable or disable a queue, but the property that modifies the queue status is not intuitive, it is IsEnqueueEnabled:

Gets or sets the Boolean property that specifies whether the queue is enabled.

This property matches the catalog view column is_enqueue_enabled in sys.service_queues but bears little resemblance to the T-SQL statement used to enable or disable a queue: ALTER QUEUE ... WITH STATUS = {ON|OFF}

For example the following SMO code snippet:

            Server server = new Server("...");
            Database db = server.Databases["msdb"];
            ServiceQueue sq = new ServiceQueue(db.ServiceBroker, "foo");
            sq.Create();
            sq.IsEnqueueEnabled = false;
            sq.Alter();
            sq.IsEnqueueEnabled = true;
            sq.Alter();

generates the following T-SQL:

CREATE QUEUE [dbo].[foo];
ALTER QUEUE [dbo].[foo]  WITH STATUS = OFF ...;
ALTER QUEUE [dbo].[foo] WITH STATUS = ON ...;

Adding a nullable column can update the entire table

February 16th, 2012

In a previous article Online non-NULL with values column add in SQL Server 2012 I talked about how adding a non-null column with default values is now an online operation in SQL Server 2012 and I mentioned how the situation when the newly added column may increase the rowsize can result in the operation being performed offline:

In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online.

In this article I want to show you how such a situation can arise and how it impacts even the case that prior to SQL Server 2012 was always online, namely adding a nullable column. Lets consider the following example:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50)),
	('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

If you run the snippet above in SQL Server 2005, SQL Server 2008 or SQL Server 2008 R2 it will succeed, and the column some_int will be added online, as a metadata only operation. However, the resulting table has some interesting properties. Lets try to update the newly added some_int column:

update T set some_int = null
	where c= 'A';
update T set some_int = null
	where c= 'B';
update T set some_int = null
	where c= 'C';
update T set some_int = null
	where c= 'D';
go

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
Msg 511, Level 16, State 1, Line 7
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

3 updates succeeded, the fourth failed. Well, I’ve been warned when I created the table, right? Warning: The table “T” has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit. But I’m updating a fixed length column and, furthermore, I’m updating it from NULL to NULL. Lets try something else:

select * into T2 from T;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

OK, how about this:

alter table T rebuild;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So my previous sequence of actions led to a table that I cannot rebuild! Clearly, not a very desirable state.

SQL Server 2012 nullable ADD COLUMN

In SQL Server 2012 the situation above cannot happen. The adding of the column would be blocked:

create table T (
c char(1) null,
filler char(7998) null,
vc1 varchar(50) null,
vc2 varchar(50) null);
go

insert into T (c, vc1, vc2) values
('A', NULL, NULL),
('B', replicate ('X', 50), NULL),
('C', NULL, replicate('Y', 50)),
('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So SQL Server 2012 is able to detect the problem upfront and prevent the table from reaching the state in which it cannot be rebuilt. But detecting this situation is not a straightforward matter of table metadata and column size, the situation arises only if the data in the table has this problem. After all the following sequence succeeds:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50));
go

alter table T add some_int int null;
go

Indeed it is only the record 'D' that has a problem because both variable columns vc1 and vc2 have values in the row that are not NULL (and of a certain size). The ALTER TABLE ... ADD COLUMN in this situation must validate each row to ensure that it fits in the page, including the newly added column size. Therefore every row gets updated and the newly added column value of NULL is populated in the row image, thus enforcing that every row fits in the table. If any row grows over the maximum size due to the newly added column then the ALTER fails and all the rows updated thus far are rolled back.

If adding a nullable column in SQL Server 2012 has the potential of increasing the row size over the 8060 size then the ALTER performs an offline size-of-data update to every row of the table to ensure it fits in the page. This behavior is new in SQL Server 2012.

This situation can arise when adding a new non-sparse fixed length column or a variable non-nullable column with default value to a table that already has the potential of creating rows over the maximum size of 8060. This new behavior (update every row in order to validate they all fit after the column is added) does not occur when adding a nullable variable length column or a sparse fixed length column.

Show all index and heap access operators in the plan cache

January 27th, 2012

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select x.value(N'@NodeId',N'int') as NodeId
	, x.value(N'@PhysicalOp', N'sysname') as PhysicalOp
	, x.value(N'@LogicalOp', N'sysname') as LogicalOp
	, ox.value(N'@Database',N'sysname') as [Database]
	, ox.value(N'@Schema',N'sysname') as [Schema]
	, ox.value(N'@Table',N'sysname') as [Table]
	, ox.value(N'@Index',N'sysname') as [Index]
	, ox.value(N'@IndexKind',N'sysname') as [IndexKind]
	, x.value(N'@EstimateRows', N'float') as EstimateRows
	, x.value(N'@EstimateIO', N'float') as EstimateIO
	, x.value(N'@EstimateCPU', N'float') as EstimateCPU
	, x.value(N'@AvgRowSize', N'float') as AvgRowSize
	, x.value(N'@TableCardinality', N'float') as TableCardinality
	, x.value(N'@EstimatedTotalSubtreeCost', N'float') as EstimatedTotalSubtreeCost
	, x.value(N'@Parallel', N'tinyint') as DOP
	, x.value(N'@EstimateRebinds', N'float') as EstimateRebinds
	, x.value(N'@EstimateRewinds', N'float') as EstimateRewinds
	, st.*
	, pl.query_plan
from sys.dm_exec_query_stats as st
cross apply sys.dm_exec_query_plan (st.plan_handle) as pl
cross apply pl.query_plan.nodes('//RelOp[./*/Object/@Database]') as op(x)
cross apply op.x.nodes('./*/Object') as ob(ox)


I recently needed a query to look into the current query plan cache and locate all actual data access operators (index scans, index seeks, table scans). This is the query I used, I decided to place it here if someone else find it useful and, more importantly, so that I can find it again when I needed it…

SQL Server table columns under the hood

October 20th, 2011

You probably can easily answer a question like ‘What columns does this table have?’. Whether you use the SSMS object explorer, or sp_help, or you query sys.column, the answer is fairly easy to find. But what is I ask ‘What are the physical columns of this table?’. Huh? Is there any difference? Lets see.

At the logical layer tables have exactly the structure you declare it in your CREATE TABLE statement, and perhaps modifications from ALTER TABLE statements. This is the layer at which you can look into sys.columns and see the table structure, or look at the table in SSMS object explorer and so on and so forth. But there is also a lower layer, the physical layer of the storage engine where the table might have surprisingly different structure from what you expect.

Inspecting the physical table structure

To view the physical table structure you must use the undocumented system internals views: sys.system_internals_partitions and sys.system_internals_partition_columns:

select p.index_id, p.partition_number,
	pc.leaf_null_bit,
	coalesce(cx.name, c.name) as column_name,
	pc.partition_column_id,
	pc.max_inrow_length,
	pc.max_length,
	pc.key_ordinal,
	pc.leaf_offset,
	pc.is_nullable,
	pc.is_dropped,
	pc.is_uniqueifier,
	pc.is_sparse,
	pc.is_anti_matter
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
left join sys.index_columns ic
	on p.object_id = ic.object_id
	and ic.index_id = p.index_id
	and ic.index_column_id = pc.partition_column_id
left join sys.columns c
	on p.object_id = c.object_id
	and ic.column_id = c.column_id
left join sys.columns cx
	on p.object_id = cx.object_id
	and p.index_id in (0,1)
	and pc.partition_column_id = cx.column_id
where p.object_id = object_id('...')
order by index_id, partition_number;

Lets inspect some simple table structures:

create table users (
	user_id int not null identity(1,1),
	first_name varchar(100) null,
	last_name varchar(100) null,
	birth_date datetime null);

Running our query after, of course, we specify object_id('users'):

We can see that the physical structure is very much as we expected: the physical rowset has 4 physical columns, of the expected types and sizes. One thing to notice is that, although the column order is the one we specified, the columns are layout on disk in a different order: the user_id is stored in row at offset 4, followed by the birth_date at offset 8 and then by the two variable length columns (first_name and last_name) that have negative offsets, an indication that they reside in the variable size portion of the row. This should be of no surprise as we know that the row format places all fixed length columns first in the row, ahead of the variable length columns.

Adding a clustered index

Our table right now is a heap, we should make user_id a primary key, and lets check the table structure afterward:

alter table users add constraint pk_users_user_id primary key (user_id);


As we can see, the table has changed from a heap into a clustered index (index_id changed from 0 to 1) and the user_id column has become part of the key.

Non-Unique clustered indexes

Now lets say that we want a different clustered index, perhaps by birth_date (maybe our application requires frequent range scans on this field). We would change the primary key into a non-clustered primary key (remember, the primary key and the clustered index do not have to be the same key) and add a clustered index by the birth_date column:

alter table users drop constraint pk_users_user_id;
create clustered index cdx_users on users (birth_date);
alter table users add constraint
	pk_users_user_id primary key nonclustered  (user_id);


Several changes right there:

  • A new index has appeared (index_id 2), which was expected, this is the non-clustered index that now enforces the primary key constraint on column user_id.
  • The table has a new physical column, with partition_column_id 0. This new column has no name, because it is not visible in the logical table representation (you cannot select it, nor update it). This is an uniqueifier column (is_uniqueifier is 1) because we did not specify that our clustered index in unique:

    If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.



    Klaus Aschenbrenner has a good series of articles that explain uniqueifier columns in more detail.

  • The non-clustered index that enforces the primary key constraint has 3 physical columns: user_id and two unnamed ones. This is because each row in the non-clustered index contains the corresponding clustered index row key values, in our case the birth_date column and the uniqueifier column.

Online operations and anti-matter columns

Is the uniqueifier column the only hidden column in a table? No. Lets rebuild our non-clustered index, making sure we specify it to be an online operation:

alter index pk_users_user_id on users rebuild with (online=on);


Notice how the non-clustered index now has one more column, a new column that has the is_antimatter value 1. As you probably guess, this is an anti-matter column. For an in-depth explanation of what is the purpose of the anti-matter column I recommend reading Online Indexing Operations in SQL Server 2005:

During the build phase, rows in the new “in-build” index may be in an intermediate state called antimatter. This mechanism allows concurrent DELETE statements to leave a trace for the index builder transaction to avoid inserting deleted rows. At the end of the index build operation all antimatter rows should be cleared.

Note that even after the online operation finishes and the antimatter rows are removed, the antimatter column will be part of the physical rowset structure.

There could exist more hidden columns in the table, for example if we enable change tracking:

alter table users ENABLE CHANGE_TRACKING;


Enabling change tracking on our table has added one more hidden column.

Table structure changes

Next lets look at some table structure modifying operations: adding, dropping and modifying columns. Were going to start anew with a fresh table for our examples:

create table users  (
	user_id int not null identity(1,1) primary key,
	first_name char(100) null,
	last_name char(100) null,
	birth_date datetime null);
go


Next lets modify some columns: we want to make the birth_date not nullable and reduce the length of the first_name to 75:

alter table users alter column birth_date datetime not null;
alter table users alter column first_name char(75);
go


Notice how the two alter operations ended up in adding a new column each, and dropping the old column. But also note how the null bit and the leaf_offset values have stayed the same. This means that the column was added ‘in place’, replacing the dropped column. This is a metadata only operation that did not modify any record in the table, it simply changed how the data in the existing records is interpreted.

But now we figure out the 75 characters length is wrong and we want to change it back to 100. Also, the birth_date column probably doesn’t need hours, so we can change it to a date type:

alter table users alter column birth_date date not null;
alter table users alter column first_name char(100);
go


The birth_date column has changed type and now it requires only 3 bytes, but the change occurred in-place, just as the nullability change we did before: it remains at the same offset in the record and it has the same null bit. However, the first_name column was moved from offset 8 to offset 211, and the null bit was changed from 4 to 5. Because the first_name column has increased in size it cannot occupy the same space as before in the record and the record has effectively increased to accommodate the new first_name column. This happened despite the fact that the first_name column was originally of size 100 so in theory it could reclaim the old space it used in the record, but the is simply a too corner case for the storage engine to consider.

By now we figure that the fixed length 100 for the first_name and last_name columns was a poor choice, so we would like to change them to more appropriate variable length columns:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(100);
go


The type change from fixed length column to variable length column cannot be done in-place, so the first_name and last_name columns get new null bit values.They also have negative leaf_offset values, which is typical for variable length columns as they don’t occupy fixed positions in the record.

Next lets change the length of the variable columns:

alter table users alter column first_name varchar(250);
alter table users alter column last_name varchar(250);
go

For this change the column length was modified without dropping the column and adding a new one. An increase of size for a variable length columns is one of the operations that is really altering the physical column, not dropping the column and adding a new one to replace it. However, a decrease in size, or a nullability change, does again drop and add the column, as we can quickly check now:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(250) not null;
go


Finally, lets say we tried to add two more fixed length columns, but we we undecided on the name and length so we added a couple of columns, then deleted them and added again a new one:

alter table users add mid_name char(50);
alter table users add surname char(25);
alter table users drop column mid_name;
alter table users drop column surname;

alter table users add middle_name char(100);
go


This case is interesting because it shows how adding a new fixed column can reuse the space left in the row by dropped fixed columns, but only if the dropped columns are the last fixed columns. In our table the columns mid_name and surname where originally added at offset 211 and 261 respectively and their length added up to 75 bytes. After we dropped them, the middle_name column we added is placed at offset 211, thus reusing the space formerly occupied by the dropped columns. This happens even though the length of the newly added column is 100 bytes, bigger than the 75 bytes occupied by the dropped columns before.

By now, our 5 column table has actually 15 columns in the physical storage format, 10 dropped columns and 5 usable columns. The table uses 412 bytes of fixed space for the 3 fixed length columns that have a total length of only 112 bytes. The variable length columns that now store the first_name and last_name are stored in the record after these 412 reserved bytes for fixed columns that are now dropped. Since the records always consume all the reserved fixed size, this is quite wasteful. How do we reclaim it? Rebuild the table:

alter table users rebuild;
go


As you can see the rebuild operation got rid off the dropped columns and now the physical storage layout is compact, aligned with the logical layout. So whenever doing changes to a table structure remember that at the storage layer most changes are cumulative, they are most times implemented by dropping a column and adding a new column back, with the new type/length/nullability. Whenever possible the a modified column reuses the space in the record and newly added columns may reuse space previously used by dropped columns.

Partitioned Tables

When partitioning is taken into account another dimension of the physical table structure is revealed. To start, lets consider a typical partitioned table:

create partition function pf (date) as range for values ('20111001');
go

create partition scheme ps as partition pf all to ([PRIMARY]);
go

create table sales (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price money not null)
	on ps(sale_date);
go


As we know, the partitioned tables are, from a physical point of view, a collection of rowsets that belong to the same logical object (the 'table'). Looking under the hood shows that our table has two rowsets, and they have identical column structure. Typically new partitions are added by the ETL process that uploads data into a staging table that gets switched in using a fast partition switch operation:

create table sales_staging (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price numeric(10,2) not null,
	constraint check_partition_range
		check (sale_date = '20111002'))
	on [PRIMARY];

alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20111002');
go

alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go
Msg 4944, Level 16, State 1, Line 2

ALTER TABLE SWITCH statement failed because column 'price' has data type numeric(10,2) in source table 'test.dbo.sales_staging' which is different from its type money in target table 'test.dbo.sales'.

OK, so we made a mistake in the staging table, so lets correct it, then try to switch it in again:

alter table sales_staging alter column price money not null;
alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go


We see that the partition switch operation has switched in the rowset of the staging table into the second partition of the sales table. But since the staging table had an operation that modified a column type, which in effect has dropped the numeric price columns and added a new price column of the appropriate money type, the second rowset of the partitioned table now has 5 columns, including a dropped one. The partition switch operation brings into the partitioned table the staging table as is, dropped columns and all. What that means is that partitions of the partitioned table can have, under the hood, a completely different physical structure from one another. Normally this should be of little concern and just a courisity to woe the newbies at DBA cocktail parties, but this problem has a darker side, known as KB2504090:

This issue occurs because the accessor that SQL Server uses to insert data into different partitions recognizes the metadata changes incorrectly. When data is inserted into the new partition that is created after a column is dropped, the number of the maximum nullable columns in the new partition may be one less than the number of the maximum nullable columns in the old partition.

If you ever find your partitioned tables not to have an uniform internal structure across all partitions, I recommend you rebuild the partition that is different. This simple query can be used to look at the number of physical columns in each partition of a table:

select count(*) as count_columns,
	index_id,
	partition_number
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
where p.object_id = object_id('sales')
group by index_id, partition_number;
go


This query shows that partitions 1 and 3 have 4 physical columns, while partition 2 has 5. We can rebuild partition 2:

alter table sales rebuild partition = 2;
go

With this operation we have rebuild the partition number 2 from scratch and removed all dropped columns in the process.

Note that my query above would only detect differences in the number of physical columns, but two partitions can still have a very different physical layout even if they have the same number of physical columns, eg. one can have physical column number 9 dropped and physical column number 10 used, while the other can have the opposite. Use your judgement when looking at the internal physical column layout to understand if they are truly the same.

T-SQL functions do no imply a certain order of execution

August 10th, 2011

Looking at this question on StackOverflow: Conversion failed when converting from a character string to uniqueidentifier error in SQL Server one can see a reproducible example where a string split UDF works fine in the SELECT statement, but it gives a conversion error in the DELETE statement. Certainly, a bug, right? Actually, not.

The issue at hand is in fact remarkably similar to another common misconception around T-SQL I had to debunk some time ago, see On SQL Server boolean operator short-circuit: that C like operator short-circuit is guaranteed in T-SQL (hint: it isn’t, read the linked article to see a clear counter example).

In the StackOverlow post the misconception is that order of declaration implies order of execution, that the function is evaluated somehow separately from the rest of the query and some sort of temporary result is created that is then used in the overall query execution. This understanding comes naturally to the imperative procedural language mindset of developers trained in C, C++, C# and other similar languages. But in the SQL Server declarative language that is T-SQL, your intuition is actually wrong. To illustrate I will give a simple counter-example, reusing the code from my earlier boolean short-circuit article:


create table eav (
    eav_id int identity(1,1) primary key,
    attribute varchar(50) not null,
    is_numeric bit not null,
    [value] sql_variant null);
create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
begin tran
while @i < 100000
begin
    declare @attribute varchar(50),
        @is_numeric bit,
        @value sql_variant;
    select @attribute = 'A' + cast(cast(rand()*1000 as  int) as varchar(3));
    select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
    if 1=@is_numeric
        select @value = cast(rand() * 100 as int);
    else
        select @value = 'Lorem ipsum';
    insert into eav (attribute, is_numeric, [value])
    values (@attribute, @is_numeric, @value);
    select @i = @i+1;
    if (@i % 1000 = 0)
    begin
		commit;
		raiserror(N'Inserted %d', 0,0,@i);
		begin tran;
    end
end
commit
go

-- insert a 'trap'
insert into eav (attribute, is_numeric, [value])
values ('B1', 0, 'Gotch ya');
go

Now we’re going to define our inline table UDF:


create function udf_get_only_numerics()
returns table
with schemabinding
as return
	select [value], [attribute]
	from dbo.eav
	where is_numeric = 1;
go

And now lets innocently query our UDF:


select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B1'
    and cast([value] as int) > 50
go

Since we’re selecting from the UDF, we’re guaranteed that we’re only going to see values that have the is_numeric column value 1, so the cast([value] as int) must always succeed, right? Wrong, we’ll get a conversion exception:


Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

So what’s going on? Lets insert a known value and inspect the execution plan:


insert into eav (attribute, is_numeric, [value])
values ('B2', 1, 65);
go

select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B2'
    and cast([value] as int) > 50
go

Looking at the plan we see that, just as in my boolean short-circuit example, the presence of an index on attribute that includes the value column but not the is_numeric was too good opportunity to pass for the optimizer. And this plan will evaluate the cast([value]as int) expression before it evaluate the WHERE clause inside the UDF. That’s right, the WHERE clause of the UDF has moved in the execution plan above the WHERE clause of the query using the UDF. The naive assumption that the function definition posses some sort of barrier for execution was proven wrong.

So how does this explains the failure seen in the StackOverflow question? The UDF, which was originally posted on Tally OH! An Improved SQL 8K “CSV Splitter” Function article on sqlservercentral.com, contains WHERE clauses to that in effect restricts the output to only complete tokens. w/o the WHERE clause it would return non-complete token. We can easily test this, lets remove the WHERE clause and use the same input as the test in the StackOverflow question:


CREATE FUNCTION dbo.DelimitedSplit8K_noWhere
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
 WITH E1(N) AS (
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1)))
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (
                 SELECT t.N+1
                   FROM cteTally t
                )
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item  = SUBSTRING(@pString,s.N1,
                ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s;

When we run this the output contains not only the two token in the input, but also every substring of these tokens:


-- Test string that will be split into table in the DelimitedSplit8k function
declare @temp varchar(max) =
     '918E809E-EA7A-44B5-B230-776C42594D91,6F8DBB54-5159-4C22-9B0A-7842464360A5'
select Item from dbo.DelimitedSplit8K_nowhere(@temp, ',');
go
Item
-----------------------------------------
918E809E-EA7A-44B5-B230-776C42594D91
18E809E-EA7A-44B5-B230-776C42594D91
8E809E-EA7A-44B5-B230-776C42594D91
E809E-EA7A-44B5-B230-776C42594D91
809E-EA7A-44B5-B230-776C42594D91
09E-EA7A-44B5-B230-776C42594D91
9E-EA7A-44B5-B230-776C42594D91
E-EA7A-44B5-B230-776C42594D91
-EA7A-44B5-B230-776C42594D91
...

By now the explanation to the issue seen in the StackExchange question should be obvious: just as in my example, the WHERE clause was pulled out from the function and placed in the generated plan above the cast(Item as uniqueidentifier) expression. During execution SQL Server is asked to convert the string ’18E809E-EA7A-44B5-B230-776C42594D91′ to a uniqueidentifier and, understandably, it fails.

Do no make assumptions about order of executions when using inline table UDFs. Q.E.D.

Online non-NULL with values column add in SQL Server 2012

July 13th, 2011

Prior to SQL Server 2012 when you add a new non-NULLable column with default values to an existing table a size-of data operation occurs: every row in the table is updated to add the default value of the new column. For small tables this is insignificant, but for large tables this can be so problematic as to completely prohibit the operation. But starting with SQL Server 2012 the operation is, in most cases, instantaneous: only the table metadata is changed, no rows are being updated.

Lets look at a simple example, we’ll create a table with some rows and then add a non-NULL column with default values. First create and populate the table:

create table test (
	id int not null identity(1,1) primary key,
	someValue int not null);
go

set nocount on;
insert into test (someValue) values (rand()*1000);
go 1000

We can inspect the physical structure of the table’s records using DBCC PAGE. First lets find the page that contains the first record of the table:

select %%physloc%%, * from test where id = 1;

In my case this returned 0xD900000001000000, which means slot 0 on page 0xD9 (aka. 217) of file 1, and my test database has the DB_ID 6. Hence the parameters to DBCC PAGE

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AEBA060

0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564

Note the last LSN that updated the page (30:71:25) and the size of the record in slot 0 (15 bytes). Now lets add a non-NULL column with default values:

alter table test add otherValue int not null default 42 with values;

We can select from the table and see that the table was changed and the rows have value 42 for the newly added column:

select top(2) * from test;

id          someValue   otherValue
----------- ----------- -----------
1           564         42
2           387         42

Yet if we inspect again the page, we can see that is unchanged:

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000E83A060
0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564                     

Slot 0 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42 

The page header is unchanged, the last LSN is still (30:71:25), proof that the page was not modified, and the physical record is unchanged and has the same size as before. Yet DBCC shows a Column 3 and its value 42! If you pay attention you’ll notice that the Column 3 though has an Offset 0×0 and a physical length of 0. Column 3 is somehow materialized out of thin air, as it does not physically exists in the record on this page. The ‘magic’ is that the table metadata has changed and it now contains a column with a ‘default’ value:

select pc.* from sys.system_internals_partitions p
	join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
	where p.object_id = object_id('test');

Notice that sys.system_internals_partition_columns now has two new columns that are SQL Server 2012 specific: has_default and default_value. The column we added to the test table (the third row in the image above) has a default with value 42. This is how SQL Server 2012 knows how to show a value for Column 3 for this record, even though is physically missing on the page. With this ‘magic’ in place the ALTER TABLE will no longer have to update every row in the table and the operation is fast, metadata-only, no matter the number of rows in the table. This new behavior occurs automatically, no special syntax or setting is required, the engine will simply do the right thing. There is no penalty from having a missing value in a row. The ‘missing’ value can be queried, updated, indexed, exactly as if the update during ALTER TABLE really occurred. There is no measurable performance penalty from having a default value.

What happens when we update a row? The ‘default’ value is pushed into the row, even if the column was not modified. Consider this update:

update test set someValue = 565 where id = 1;

Although we did not touch the otherValue column, the row now was modified and it contains the materialized value:

dbcc page(6,1,217,3)

...
m_freeData = 7240                   m_reservedCnt = 0                   m_lsn = (31:271:2)
...
Slot 0 Offset 0x1c35 Length 19

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 19

Memory Dump @0x000000000AB8BC35

0000000000000000:   10001000 01000000 35020000 2a000000 030000††††........5...*......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 565                     

Slot 0 Column 3 Offset 0xc Length 4 Length (physical) 4
otherValue = 42         

KeyHashValue = (8194443284a0)
Slot 1 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AB8A060
0000000000000000:   10000c00 02000000 83010000 020000†††††††††††††........ƒ......

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 2                              

Slot 1 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 387                     

Slot 1 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42

Notice how the physical record has increased in size (19 bytes vs. 15), the record has the value 42 in it (the hex 2a000000) and the Column 3 now has a real offset and physical size. So the update has trully materialized the default value in the row image. I intentionally copied the output of DBCC PAGE for the next slot in the page, to show that the record with id=2 was unaffected, it continues to have a smaller size of 15 bytes and Column 3 has no physical length.

Default value vs. Default constraint

Is worth saying that the new SQL Server 2012 default column value is not the same as the default value constraint. The default value is captured when the ALTER TABLE statement is run and can never change. Only rows existing in the table at the time of running ALTER TABLE statement will have missing ‘default’ values. By contrast the default constraint can be dropped or modified and new rows inserted after the ALTER TABLE will always have a value present in row for the new column. Any REBUILD operation on the table (or on the clustered index) will materialize all the missing values as the rows are being copied from the old hobt to the new hobt. The new hobt columns (sys.system_internals_partition_columns) will loose the has_default and default_value attributes, in effect loosing any trace that this column was added online. A default constraint by contrast will be preserved as a table is rebuilt.

Restrictions

Not all data types and default values can be added online. BLOB values like varchar(max), nvarchar(max), varbinary(max) and XML cannot be added online (and frankly I see no valid data model that has a non-NULL BLOB with a default…). Types that cannot be converted to sql_variant cannot be added online, like hierarchy_id, geometry and geography or user CLR based UDTs. Default expressions that require a different value for each row, like NEWID or NEWSEQUENTIALID cannot be added online (the default expression has to be a runtime constant, not to be confused with a deterministic expression, see Conor vs. Runtime Constant Functions for more details). In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online. And is an Enterprise Edition only feature. For all the cases above the behavior will revert to adding the column ‘offline’, by updating every row in the table during the ALTER TABLE statement, creating a size-of-data update. When such a situation occurs a new XEvent is fired, which contains the reason why a size-of-data update occurred: alter_table_update_data.

FILESTREAM MVC: Download and Upload images from SQL Server

February 6th, 2011

In a previous article I have shown how it is possible to use efficient streaming semantics when Download and Upload images from SQL Server via ASP.Net MVC. In this article I will go over an alternative approach that relies on the FILESTREAM column types introduced in SQL Server 2008.

What is FILESTREAM?

FILESTREAM storage is a new option available in SQL Server 2008 and later that allows for BLOB columns to be stored directly on the file system as individual files. As files, the data is accessible through the Win32 file access API like ReadFile and WriteFile. But at the same time the same data is available through the normal T-SQL operations like SELECT or UPDATE. Not only that, but the data is contained logically in the database so it will be contained in a database backup, it is subject to ordinary transaction commit and rollback behavior, it is searched by SQL Server FullText indexes and it follows the normal SQL Server security access rules: if you are granted SELECT permission on the table, then you can open the file to read. There are some restrictions, eg. a database with FILESTREAM cannot be mirrored. For a full list of restrictions and limitations, see Using FILESTREAM with Other SQL Server Features. Note that SQL Server Express edition does support FILESTREAM storage.

Another attribute of the FILESTREAM storage is the size limitation: normal BLOB column values have a maximum size of 2Gb. FILESTREAM columns are limited only by the volume size limit of the file system. 2Gb may seem like a large value, but consider that a media file like an HD movie stream can easily go up to a size of 5Gb.

Using FILESTREAM

One way to use FILESTREAM columns is to treat them as ordinary BLOB values and manipulate them through T-SQL. The one restriction in place is that the efficient partial update syntax for BLOBs is not supported. That is, one cannot issue UPDATE table SET column.WRITE(...) WHERE ... on a FILESTREAM column. But where FILESTREAM storage begins to shine is when accessed through the system file API. This allows the application to efficiently read, write and seek in a large BLOB value, just as it would in a file. In fact, the application does read, write and seek in a file :) .

Native Win32 applications use a new API function OpenSqlFilestream that opens a HANDLE that can be then used with the file IO API functions. Managed applications use the new SqlFileStream class that exposes a Stream based on the underlying FILESTREAM value. Both the native and the manged API require as input two special values, a PathName and a Transaction Context that must be obtained previously from the SQL Server using T-SQL.

The requirement to provide a Transaction Context when manipulating a FILESTREAM value through the file IO API highlights another aspect of working with this new type: a T-SQL transaction must be started and must be kept open while the file is manipulated, and then it must be committed. If you think about it such a requirement is to be expected, since we said that FILESTREAM columns are subject to normal T-SQL transaction commit and rollback semantics, including when they are manipulated through the Windows file read/write API.

FILESTREAM based images table

In order to have FILESTREAM column, we must have a special filegroup in our database, a FILESTREAM filegroup. You can either add a new filegroup to your existing database, or you can create the database with a FILESTREAM filegroup from scratch. Also the FILESTREAM feature must be enabled on the SQL Server instance. For all the details, see Getting Started with FILESTREAM Storage. For my project, I’m simply going to create a new database with a FILESTREAM filegroup:


create database images
	on (name='images_data', filename='c:\temp\images.mdf')
	, filegroup FS contains FILESTREAM
              (name = 'images_files', filename='c:\temp\images_files')
	log on (name='images_log', filename='c:\temp\images.ldf');
go

The media table used by our MVC project is going to be similar to the one used in the previous article but the content column will have the FILESTREAM attribute added. A table that has FILESTREAM columns is required to have a ROWGUIDCOL column, so we’re going to add one of those as well:


create table media (
	[media_id] int not null identity(1,1),
	[file_name] varchar(256),
	[content_type] varchar(256),
	[content_coding] varchar(256),
	[media_rowguid] uniqueidentifier not null
               ROWGUIDCOL UNIQUE default newsequentialid() ,
	[content] varbinary(max) filestream,
        constraint pk_media_id primary key([media_id]),
	constraint unique_file_name unique ([file_name]));
go

FILESTREAM based IMediaRepository

If you haven’t read the previous article Download and Upload images from SQL Server via ASP.Net MVC yet, now is a good time to do it. I am going to reuse the same code and simply provide a new implementation for the IMediaRepository interface, an implementation that works with FILESTREAM storage:


    /// <summary>
    /// SQL Server FILESTREAM based implementation of the IMediaRepository
    /// </summary>
    public class FileStreamMediaRepository: IMediaRepository
    {
        /// <summary>
        /// Gets an open connection to the SQL Server back end
        /// </summary>
        /// <returns>the SqlConneciton object, ready to use</returns>
        private SqlConnection GetConnection()
        {
            SqlConnectionStringBuilder scsb = new SqlConnectionStringBuilder(
                ConfigurationManager.ConnectionStrings["Images"].ConnectionString);
            scsb.Pooling = true;
            scsb.AsynchronousProcessing = true;
            SqlConnection conn = new SqlConnection(scsb.ConnectionString);
            conn.Open();
            return conn;
        }

        /// <summary>
        /// Gets a file from the SQL repository
        /// </summary>
        /// <param name="fileName">filename to retrieve</param>
        /// <param name="file">Output, the model for the file if found</param>
        /// <returns>True if the file is found, False if not</returns>
        public bool GetFileByName(string fileName, out FileDownloadModel file)
        {
            SqlConnection conn = GetConnection();
            SqlTransaction trn = conn.BeginTransaction();
            try
            {
                SqlCommand cmd = new SqlCommand(
                    @"SELECT file_name,
	                    content_type,
                        content_coding,
                        DATALENGTH (content) as content_length,
	                content.PathName() as path,
                        GET_FILESTREAM_TRANSACTION_CONTEXT ()
                    FROM media
                    WHERE file_name = @fileName;", conn, trn);
                SqlParameter paramFilename = new SqlParameter(
                            @"fileName", SqlDbType.VarChar, 256);
                paramFilename.Value = fileName;
                cmd.Parameters.Add(paramFilename);
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    if (false == reader.Read())
                    {
                        reader.Close();
                        trn.Dispose();
                        conn.Dispose();
                        trn = null;
                        conn = null;
                        file = null;
                        return false;
                    }

                    string contentDisposition = reader.GetString(0);
                    string contentType = reader.GetString(1);
                    string contentCoding = reader.IsDBNull(2) ? null : reader.GetString(2);
                    long contentLength = reader.GetInt64(3);
                    string path = reader.GetString(4);
                    byte[] context = reader.GetSqlBytes(5).Buffer;

                    file = new FileDownloadModel
                    {
                        FileName = contentDisposition,
                        ContentCoding = contentCoding,
                        ContentType = contentType,
                        ContentLength = contentLength,
                        Content = new MvcResultSqlFileStream
                        {
                            SqlStream = new SqlFileStream(path, context, FileAccess.Read),
                            Connection = conn,
                            Transaction = trn
                        }
                    };
                    conn = null; // ownership transfered to the stream
                    trn = null;
                    return true;
                }
            }
            finally
            {
                if (null != trn)
                {
                    trn.Dispose();
                }
                if (null != conn)
                {
                    conn.Dispose();
                }
            }
        }

        /// <summary>
        /// Adds a file to the SQL repository
        /// </summary>
        /// <param name="file">POST-ed file to be added</param>
        /// <param name="fileName">The filename part of the uploaded file path</param>
        public void PostFile(HttpPostedFileBase file, out string fileName)
        {
            fileName = Path.GetFileName(file.FileName);

            using (SqlConnection conn = GetConnection ())
            {
                using (SqlTransaction trn = conn.BeginTransaction ())
                {
                    SqlCommand cmdInsert = new SqlCommand(
@"insert into media
    (file_name, content_type, content_coding, content)
output
	INSERTED.content.PathName(),
    GET_FILESTREAM_TRANSACTION_CONTEXT ()
values
    (@content_disposition, @content_type, @content_coding, 0x)", conn, trn);
                    cmdInsert.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_disposition"].Value = fileName;
                    cmdInsert.Parameters.Add("@content_type", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_type"].Value = file.ContentType;
                    cmdInsert.Parameters.Add("@content_coding", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_coding"].Value = DBNull.Value;

                    string path = null;
                    byte[] context = null;

                    // cmdInsert is an INSERT command that uses the OUTPUT clause
                    // Thus we use the ExecuteReader to get the
                    // result set from the output columns
                    //
                    using (SqlDataReader rdr = cmdInsert.ExecuteReader())
                    {
                        rdr.Read();
                        path = rdr.GetString(0);
                        context = rdr.GetSqlBytes(1).Buffer;
                    }

                    using (SqlFileStream sfs = new SqlFileStream(
                                     path, context, FileAccess.Write))
                    {
                        file.InputStream.CopyTo(sfs);
                    }
                    trn.Commit ();
                }
            }
        }
    }

The implementation is pretty straight forward. The PostFile methods starts a transaction, inserts a row in the images table and obtains the PahtName and FILESTREAM transaction context to the newly inserted row, and then opens a SqlFileStream and copies in the uploaded file directly into the file that backs the FILESTREAM column. The GetFileByName method obtains the PathName and FILESTREAM transaction context of the desired file and then returns a SqlFileStream that accesses directly the file in which the FILESTREAM value is stored.

The only thing requiring explanation is the MvcResultSqlFileStream class. Because the SqlFileStream uses a transaction context, it is required to keep the SqlConnection and the SqlTransaction open until the SqlFileStream finishes accessing the content. In the PostFile method this requirement is achieved easily because the entire usage of the SqlFileStream complete within the stack frame, but the GetFileByName method has to return a Stream that in turn is passed into a FileStreamResult by the MediaController class and there is no control over how the MVC framework will use this returned ActionResult. My solution is to wrap the SqlFileStream in a new Stream derived class (since SqlFileStream is sealed, it cannot be inherited...) and then handle the resource management in this derived class' Dispose:


    /// <summary>
    /// Implementation of System.IO.Stream based on a SqlFileStream
    /// Disposes the connection, transaction and SqlFileStream objects
    /// </summary>
    public class MvcResultSqlFileStream: Stream
    {
        public SqlFileStream SqlStream { get; set; }
        public SqlConnection Connection { get; set; }
        public SqlTransaction Transaction { get; set; }
        public override bool CanRead
        {
            get { return SqlStream.CanRead; }
        }

        public override bool CanSeek
        {
            get { return SqlStream.CanSeek; }
        }

        public override bool CanWrite
        {
            get { return SqlStream.CanWrite; }
        }

        public override void Flush()
        {
            SqlStream.Flush ();
        }

        public override long Length
        {
            get { return SqlStream.Length; }
        }

        public override long Position
        {
            get
            {
                return SqlStream.Position;
            }
            set
            {
                SqlStream.Position = value;
            }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            return SqlStream.Read (buffer, offset, count);
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            return SqlStream.Seek(offset, origin);
        }

        public override void SetLength(long value)
        {
            SqlStream.SetLength(value);
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            SqlStream.Write(buffer, offset, count);
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                SqlStream.Dispose();
                Transaction.Dispose();
                Connection.Dispose();
            }
            base.Dispose(disposing);
        }
    }

This extra class feels rather unnecessary, but unfortunately is required because the MVC framework has no proper handling of resources passed out from the controller. If you ask me ActionResult should implement IDisposable, but then of course nobody asks me ;)

You can download the source for the entire MVC project from here.

Conclusion

Handling media content stored in the database using FILESTREAM storage makes for a nicer developer experience. The FILESTREAM content can be accessed using streaming semantics based on the SqlFileStream class, or using the Win32 file IO API for native applications. By comparison the previous article that relied on straight T-SQL to manipulate the BLOB values had to do some very unintuitive tricks in order to achieve streaming semantics, specially for uploading a file. FILESTREAM storage simplifies development and is supported in every SQL Server edition, including Express.

If you intend to deploy FILESTREAM in your environment I recommend going over some Knowledge Base and white paper articles first:

How to pass a NULL value in a message to a queue in SQL Server

January 15th, 2011

The SEND Transact-SQL verb does not allow to send a NULL message body, attempting to do so will result in error:


Msg 8433, Level 16, State 1, Line 11
The message body may not be NULL.  A zero-length UNICODE or binary string is allowed.

But there are ways to send a NULL message body. One way is to completely omit the message body argument:


SEND ON CONVERSATION @handle MESSAGE TYPE [...];

Another way is to send a 0 length message body, which will be enqueued as a NULL message body in the target queue:


SEND ON CONVERSATION @handle MESSAGE TYPE [...] (0x);
SEND ON CONVERSATION @handle MESSAGE TYPE [...] ('');
SEND ON CONVERSATION @handle MESSAGE TYPE [...] (N'');

All three forms above will enqueue the same message body: NULL. This is true for both binary messages (VALIDATION = NONE) and for XML messages (VALIDATION=WELL_FORMED_XML).

Here is a short test script showing this:


create message type [BINARY] validation = none;
create message type [XML] validation = well_formed_xml;
go

create contract [TEST] (
	[BINARY] sent by initiator,
	[XML] sent by initiator);
go

create queue Sender;
create service Sender on queue Sender;
go

create queue Receiver;
create service Receiver on queue Receiver  ([TEST]);
go

declare @h uniqueidentifier;

begin dialog conversation @h
from service [Sender]
to service N'Receiver', N'current database'
on contract [TEST]
with encryption = off;

send on conversation @h message type [BINARY];
send on conversation @h message type [BINARY] (0x);

send on conversation @h message type [XML];
send on conversation @h message type [XML] ('');
send on conversation @h message type [XML] (N'');

receive * from [Receiver];
go

The received message_body column have a NULL value for all 5 messages sent.

Download and Upload images from SQL Server via ASP.Net MVC

December 28th, 2010

A frequent question that pops up on discussion forums is how to serve an image that is stored in a SQL Server table from an ASP.Net application. Unfortunately the answer is almost always wrong, as the prevalent solution involves copying the entire image file into memory before returning it to the client. This solution works fine when tested with a light load and returning few small images. But in production environment the memory consumption required by all those image files stored as byte arrays in memory causes serious performance degradation. A good solution must use streaming semantics, transferring the data in small chunks from the SQL Server to the HTTP returned result.

The SqlClient components do offer streaming semantics for large result sets, including large BLOB fields, but the client has to specifically ask for it. The ‘secret ingredient’ is the passing in the CommandBehavior.SequentialAccess flag to the SqlCommand.ExecuteReader:

Provides a way for the DataReader to handle rows that contain columns with large binary values. Rather than loading the entire row, SequentialAccess enables the DataReader to load data as a stream. You can then use the GetBytes or GetChars method to specify a byte location to start the read operation, and a limited buffer size for the data being returned.

An ASP.Net MVC virtual Media folder backed by SQL Server

Lets say we want to have a virtual Media folder in an ASP.Net MVC site, serving the files from a SQL Server database. A GET request for an URL like "http://site/Media/IMG0042.JPG" should return the content of the file named IMG0042.JPG from the database. A POST request to the URL "http://site/Media" which contains an embedded file should insert this new file in the database and redirect the response to the newly added file virtual path. This is how our upload HTML form looks like:

<form method="post" action="/Media" enctype="multipart/form-data">
<input type="file" name="file" id="file"/>
<input type="submit" name="Submit" value="Submit"/>
</form>

We’ll start by adding a special route in Global.asax.cs that will act like a virtual folder for file download requests:

            routes.MapRoute(
                "Media",
                "Media/{filename}",
                new { controller = "Media", action = "GetFile" },
                new { filename = @"[^/?*:;{}\\]+" });

Note that the upload will be handled by the default MVC route if we add an Index() method to our controller that will handle the POST.

For our GET requests we need a FileDownloadModel class to represent the requested file properties. for this sample, we don’t need a POST model since we’re only going to have have a single input field, the uploaded file. We’re going to use a Repository interface that declares two methods: GetFileByName returns from the repository a FileDownloadModel given a file name, and PutFile will accept an uploaded file and put it in the repository.

    public class FileDownloadModel
    {
        public string FileName {get; internal set;}
        public string ContentType { get; internal set; }
        public string ContentCoding { get; internal set; }
        public long ContentLength { get; internal set; }
        public Stream Content { get; internal set; }
    }

    public interface IMediaRepository
    {
        bool GetFileByName(
            string fileName,
            out FileDownloadModel file);

        void PostFile(
            HttpPostedFileBase file,
            out string fileName);
    }

With this Repository interface we can actually code our MediaController class:

    public class MediaController : Controller
    {
        public IMediaRepository Repository { get; set; }

        public MediaController()
        {
            Repository = new SqlMediaRepository();
        }

        [HttpPost]
        public ActionResult Index()
        {
            string fileName;
            Repository.PostFile(Request.Files[0], out fileName);
            return new RedirectResult("/Media/" + fileName);
        }

        [HttpGet]
        public ActionResult GetFile(string fileName)
        {
            FileDownloadModel model;
            if (false == Repository.GetFileByName(
                fileName,
                out model))
            {
                return new HttpNotFoundResult
                {
                    StatusDescription = String.Format(
                        "File {0} not found",
                        fileName)
                };
            }

            if (null != model.ContentCoding)
            {
                Response.AddHeader(
                    "Content-Encoding",
                    model.ContentCoding);
            }
            Response.AddHeader(
                "Content-Length",
                model.ContentLength.ToString ());

            Response.BufferOutput = false;

            return new FileStreamResult(
                model.Content,
                model.ContentType);
        }
    }

In have hard coded the Repository implementation to SqlMediaRepository, a class we’ll create shortly. A real project would probably use the Dependency Injection or Inversion of Control patterns, perhaps using Castle Windsor for example. For brevity, I will skip these details, there are plenty of blogs and articles describing how to do it.

Note the use of the FileStreamResult return, which is an MVC supplied action for returning a download of an arbitrary Stream object. Which also brings us to the next point, we need to implement a Stream that read content from a SqlDataReader.

A SqlDataReader based Stream

We now need an implementation of the abstract Stream class that can stream out a BLOB column from a SqlDataReader. Wait, you say, doesn’t SqlBytes already have a Stream property that reads a BLOB from a result as a Stream? Unfortunately this small remark makes this class useless for our purposes:

Getting or setting the Stream property loads all the data into memory. Using it with large value data can cause an OutOfMemoryException.

So we’re left with implementing a Stream based on a SqlDataReader BLOB field, a Stream that returns the content of the BLOB using the proper GetBytes calls and does not load the entire BLOB in memory. Fortunately this is fairly simple as we only need to implement a handful of methods:

    public class SqlReaderStream: Stream
    {
        private SqlDataReader reader;
        private int columnIndex;
        private long position;

        public SqlReaderStream(
            SqlDataReader reader,
            int columnIndex)
        {
            this.reader = reader;
            this.columnIndex = columnIndex;
        }

        public override long Position
        {
            get { return position; }
            set { throw new NotImplementedException(); }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            long bytesRead = reader.GetBytes(columnIndex, position, buffer, offset, count);
            position += bytesRead;
            return (int)bytesRead;
        }

        public override bool CanRead
        {
            get { return true; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return false; }
        }

        public override void Flush()
        {
            throw new NotImplementedException();
        }

        public override long Length
        {
            get { throw new NotImplementedException(); }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing && null != reader)
            {
                reader.Dispose();
                reader = null;
            }
            base.Dispose(disposing);
        }
    }

As you can see, we need only to return the proper responses to CanRead (yes), CanWrite (no) and CanSeek (also no), keep track of our current Position and we need implement Read to fetch more bytes from the reader, using GetReader.

We’re also overriding the Dispose(bool disposing) method. This is because we’re going to have to close the SqlDataReader when the content stream transfer is complete. If this is the first time you see this signature of the Dispose method, then you must read Implementing a Dispose Method

Streaming Upload of BLOB data

Just like retrieving large BLOB data from SQL Server poses challenges to avoid creating full blown in-memory copies of the entire BLOB, similar issues arrive when attempting to insert a BLOB. The best solution is actually quite convoluted. It involves sending the data to the server in chunks, and using the in-place BLOB UPDATE syntax. The MSDN has this to say in the Remarks section:

Use the .WRITE (expression, @Offset, @Length) clause to perform a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types. For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column. For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.

To implement such semantics, we will write a second Stream implementation, this time for uploads:

    public class SqlStreamUpload: Stream
    {
        public SqlCommand InsertCommand { get; set; }
        public SqlCommand UpdateCommand { get; set; }
        public SqlParameter InsertDataParam { get; set; }
        public SqlParameter UpdateDataParam { get; set; }

        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return true; }
        }

        public override void Flush()
        {
        }

        public override long Length
        {
            get { throw new NotImplementedException(); }
        }

        public override long Position
        {
            get; set;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            byte[] data = buffer;
            if (offset != 0 ||
                count != buffer.Length)
            {
                data = new byte[count];
                Array.Copy(buffer, offset, data, 0, count);
            }
            if (0 == Position &&
                null != InsertCommand)
            {
                InsertDataParam.Value = data;
                InsertCommand.ExecuteNonQuery();
            }
            else
            {
                UpdateDataParam.Value = data;
                UpdateCommand.ExecuteNonQuery();
            }
            Position += count;
        }
    }

This Stream implementation uses two SqlCommand objects: an InsertCommand to save the very first chunk, and an UpdateCommand to save the subsequent chunks. Note that the chunk sizes (the optimal 8040 bytes) is nowhere specified, that is easily achieved by wrapping the SqlStreamUpload in a BufferedStream instance.

The MEDIA table

create table media (
	[media_id] int not null identity(1,1),
	[file_name] varchar(256),
	[content_type] varchar(256),
	[content_coding] varchar(256),
	[content] varbinary(max),
        constraint pk_media_id primary key([media_id]),
	constraint unique_file_name unique ([file_name]));

This table contains the downloadable media files. Files are identified by name, so the names have an unique constraint. I’ve added a IDENTITY primary key, because in a CMS these files are often references from other parts of the application and and INT is a shorter reference key than a file name. The content_type field is needed to know the type of file: “image/png”, “image/jpg” etc (see the officially registered IANA Media types. The content_coding field is needed if the files are stored compressed and a Content-Encoding HTTP header with the tag “gzip” or “deflate” has to be added to the response. Note that most image types (JPG, PNG) don’t compress well, as these file formats already include a compression algorithm and when compressed again with the common HTTP transfer algorithms (gzip, deflate, compress) they usually increase in size.

The SqlMediaRepository

The final piece of the puzzle: an implementation of the IMediaRepository interface that uses a SQL Server back end for the files storage:

    public class SqlMediaRepository: IMediaRepository
    {
        private SqlConnection GetConnection()
        {
            SqlConnectionStringBuilder scsb = new SqlConnectionStringBuilder(
                ConfigurationManager.ConnectionStrings["Images"].ConnectionString);
            scsb.Pooling = true;
            SqlConnection conn = new SqlConnection(scsb.ConnectionString);
            conn.Open();
            return conn;
        }

        public void PostFile(
            HttpPostedFileBase file,
            out string fileName)
        {
            fileName = Path.GetFileName(file.FileName);

            using (SqlConnection conn = GetConnection())
            {
                using (SqlTransaction trn = conn.BeginTransaction())
                {
                    SqlCommand cmdInsert = new SqlCommand(
                        @"INSERT INTO media (
                            file_name,
                            content_type,
                            content_coding,
                            content)
                        values (
                            @content_disposition,
                            @content_type,
                            @content_coding,
                            @data);", conn, trn);
                    cmdInsert.Parameters.Add("@data", SqlDbType.VarBinary, -1);
                    cmdInsert.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_disposition"].Value = fileName;
                    cmdInsert.Parameters.Add("@content_type", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_type"].Value = file.ContentType;
                    cmdInsert.Parameters.Add("@content_coding", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_coding"].Value = DBNull.Value;

                    SqlCommand cmdUpdate = new SqlCommand(
                            @"UPDATE media
                            SET content.write (@data, NULL, NULL)
                            WHERE file_name = @content_disposition;", conn, trn);
                    cmdUpdate.Parameters.Add("@data", SqlDbType.VarBinary, -1);
                    cmdUpdate.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdUpdate.Parameters["@content_disposition"].Value = fileName;

                    using (Stream uploadStream = new BufferedStream(
                        new SqlStreamUpload
                        {
                            InsertCommand = cmdInsert,
                            UpdateCommand = cmdUpdate,
                            InsertDataParam = cmdInsert.Parameters["@data"],
                            UpdateDataParam = cmdUpdate.Parameters["@data"]
                        }, 8040))
                    {
                        file.InputStream.CopyTo(uploadStream);
                    }
                    trn.Commit();
                }
            }
        }

        public bool GetFileByName(string fileName, out FileDownloadModel file)
        {
            SqlConnection conn = GetConnection();
            try
            {
                SqlCommand cmd = new SqlCommand(
                    @"SELECT file_name,
	                    content_type,
                        content_coding,
                        DATALENGTH (content) as content_length,
	                    content
                    FROM media
                    WHERE file_name = @fileName;", conn);
                SqlParameter paramFilename = new SqlParameter(
                      @"fileName", SqlDbType.VarChar, 256);
                paramFilename.Value = fileName;
                cmd.Parameters.Add(paramFilename);
                SqlDataReader reader = cmd.ExecuteReader(
                    CommandBehavior.SequentialAccess |
                    CommandBehavior.SingleResult |
                    CommandBehavior.SingleRow |
                    CommandBehavior.CloseConnection);
                if (false == reader.Read())
                {
                    reader.Dispose();
                    conn = null;
                    file = null;
                    return false;
                }

                string contentDisposition = reader.GetString(0);
                string contentType = reader.GetString(1);
                string contentCoding = reader.IsDBNull(2) ? null : reader.GetString(2);
                long contentLength = reader.GetInt64(3);
                Stream content = new SqlReaderStream(reader, 4);

                file = new FileDownloadModel
                {
                    FileName = contentDisposition,
                    ContentCoding = contentCoding,
                    ContentType = contentType,
                    ContentLength = contentLength,
                    Content = content
                };
                conn = null; // ownership transfered to the reader/stream
                return true;
            }
            finally
            {
                if (null != conn)
                {
                    conn.Dispose();
                }
            }
        }
    }

Typically MVC applications tend to use a LINQ based Repository. In this case though I could not leverage LINQ because of the peculiar requirements of implementing efficient streaming for large BLOBs. So this Repository uses plain vanilla SqlClient code.

The GetFileByName method gets the one row from the media table and returns a FileDownloadModel with a SqlReaderStream object that is wrapping the SELECT command result. Note that I could not deploy the typical “using” pattern for the disposable SqlConnection because the connection must remain open until the command result SqlDataReader finishes streaming in the BLOB, and the streaming will be controlled by the MVC framework executing our ActionResult after the GetFileByName is completed. The connection will be closed when the SqlReaderStream is disposed, because of the CommandBehavior.CloseConnection flag. The stream will be disposed by the FileStreamResult.WriteFile method.

The PostFile method creates two SqlCommand statements, one to insert the first chunk of data along with the other relevant columns, and an update command that uses BLOB partial updates to write the subsequent chunks. A SqlStreamUpload object is then using these two SqlCommand to efficiently stream in the uploaded file. The intermediate BufferedStream is used to create upload chunks of the critical size of 8040 bytes (see above). If the content would have to be compressed, this is where it would happen, a GZipStream would be placed in front of the Bufferedstream in order to compress the uploaded file, and an the “@content_coding” parameter would have to be set as to “gzip”.

HTTP caching

I have left out from this implementation appropriate HTTP caching control. HTTP caching is extremely important in high volume traffic sites, there is no better way to optimize an HTTP request processing that to never receive said request at all and have the user-agent cache or some intermediate proxy serve a response instead. Our application would have to add appropriate ETag and Cache-Control HTTP headers and the MediaController would need to have an action for the HEAD requests. The IMediaRepository interface would need a new method to get all the file properties without the actual content. For the moment, I’ll leave these as an exercise for the reader…

To BLOB or Not to BLOB

This article does not try to address the fundamental question whether you should store the images in the database to start with. Arguments can be made both for and against this. Russel Sears, Catherine van Ingen and Jim Gray have published a research paper in 2006 on their performance comparison analysis between storing files in the database and storing them in the file system: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?. They concluded that:

The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.

However their study was not comparing how to serve the files as HTTP responses. The fact that the web server can efficiently serve a file directly from the file system without any code being run in the web application changes the equation quite a bit, and tilts the performance strongly in favor of the file system. But if you already have considered the pros and cons and decided that the advantages of a consistent backup/restore and strong referential integrity warrants database stored BLOBs, then I hope this article highlights an efficient way to return those BLOBs as HTTP responses.

SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes irrelevant of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the applications runs a query, gets back the result and stores it in the cache. Whenever the result is changed because data was updated, the application will be notified and it can invalidate the cached result.

Leveraging Query Notifications from the managed clients is very easy due to the dedicated SqlDependency class that takes care of a lot of the details needed to be set up in place in order to be able to receive these notifications. But the MSDN examples and the general community know how with SqlDepenendency is geared toward straight forward usage, by attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.

LinqToCache project

My solution is available as the LinqToCache project. To cache a LINQ query results and get active SqlDependency notifications when the data was changed, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method AsCached. This method returns an IEnumerable of the query result. First invocation will always hit the database and set up a SqlDependency, subsequent invocations will return the cached result as long as it was not invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries than can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straight forward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. that is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with appropriate schema if necessary).

But there are a couple of conditions that are specially important for us: must not use the TOP expression and must not use … ranking and windowing functions.. These two restrictions mean the popular Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple Linq TO EF query like following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.