Show all index and heap access operators in the plan cache

January 27th, 2012

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select x.value(N'@NodeId',N'int') as NodeId
	, x.value(N'@PhysicalOp', N'sysname') as PhysicalOp
	, x.value(N'@LogicalOp', N'sysname') as LogicalOp
	, ox.value(N'@Database',N'sysname') as [Database]
	, ox.value(N'@Schema',N'sysname') as [Schema]
	, ox.value(N'@Table',N'sysname') as [Table]
	, ox.value(N'@Index',N'sysname') as [Index]
	, ox.value(N'@IndexKind',N'sysname') as [IndexKind]
	, x.value(N'@EstimateRows', N'float') as EstimateRows
	, x.value(N'@EstimateIO', N'float') as EstimateIO
	, x.value(N'@EstimateCPU', N'float') as EstimateCPU
	, x.value(N'@AvgRowSize', N'float') as AvgRowSize
	, x.value(N'@TableCardinality', N'float') as TableCardinality
	, x.value(N'@EstimatedTotalSubtreeCost', N'float') as EstimatedTotalSubtreeCost
	, x.value(N'@Parallel', N'tinyint') as DOP
	, x.value(N'@EstimateRebinds', N'float') as EstimateRebinds
	, x.value(N'@EstimateRewinds', N'float') as EstimateRewinds
	, st.*
	, pl.query_plan
from sys.dm_exec_query_stats as st
cross apply sys.dm_exec_query_plan (st.plan_handle) as pl
cross apply pl.query_plan.nodes('//RelOp[./*/Object/@Database]') as op(x)
cross apply op.x.nodes('./*/Object') as ob(ox)


I recently needed a query to look into the current query plan cache and locate all actual data access operators (index scans, index seeks, table scans). This is the query I used, I decided to place it here if someone else find it useful and, more importantly, so that I can find it again when I needed it…

SQL Server table columns under the hood

October 20th, 2011

You probably can easily answer a question like ‘What columns does this table have?’. Whether you use the SSMS object explorer, or sp_help, or you query sys.column, the answer is fairly easy to find. But what is I ask ‘What are the physical columns of this table?’. Huh? Is there any difference? Lets see.

At the logical layer tables have exactly the structure you declare it in your CREATE TABLE statement, and perhaps modifications from ALTER TABLE statements. This is the layer at which you can look into sys.columns and see the table structure, or look at the table in SSMS object explorer and so on and so forth. But there is also a lower layer, the physical layer of the storage engine where the table might have surprisingly different structure from what you expect.

Inspecting the physical table structure

To view the physical table structure you must use the undocumented system internals views: sys.system_internals_partitions and sys.system_internals_partition_columns:

select p.index_id, p.partition_number,
	pc.leaf_null_bit,
	coalesce(cx.name, c.name) as column_name,
	pc.partition_column_id,
	pc.max_inrow_length,
	pc.max_length,
	pc.key_ordinal,
	pc.leaf_offset,
	pc.is_nullable,
	pc.is_dropped,
	pc.is_uniqueifier,
	pc.is_sparse,
	pc.is_anti_matter
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
left join sys.index_columns ic
	on p.object_id = ic.object_id
	and ic.index_id = p.index_id
	and ic.index_column_id = pc.partition_column_id
left join sys.columns c
	on p.object_id = c.object_id
	and ic.column_id = c.column_id
left join sys.columns cx
	on p.object_id = cx.object_id
	and p.index_id in (0,1)
	and pc.partition_column_id = cx.column_id
where p.object_id = object_id('...')
order by index_id, partition_number;

Lets inspect some simple table structures:

create table users (
	user_id int not null identity(1,1),
	first_name varchar(100) null,
	last_name varchar(100) null,
	birth_date datetime null);

Running our query after, of course, we specify object_id('users'):

We can see that the physical structure is very much as we expected: the physical rowset has 4 physical columns, of the expected types and sizes. One thing to notice is that, although the column order is the one we specified, the columns are layout on disk in a different order: the user_id is stored in row at offset 4, followed by the birth_date at offset 8 and then by the two variable length columns (first_name and last_name) that have negative offsets, an indication that they reside in the variable size portion of the row. This should be of no surprise as we know that the row format places all fixed length columns first in the row, ahead of the variable length columns.

Adding a clustered index

Our table right now is a heap, we should make user_id a primary key, and lets check the table structure afterward:

alter table users add constraint pk_users_user_id primary key (user_id);


As we can see, the table has changed from a heap into a clustered index (index_id changed from 0 to 1) and the user_id column has become part of the key.

Non-Unique clustered indexes

Now lets say that we want a different clustered index, perhaps by birth_date (maybe our application requires frequent range scans on this field). We would change the primary key into a non-clustered primary key (remember, the primary key and the clustered index do not have to be the same key) and add a clustered index by the birth_date column:

alter table users drop constraint pk_users_user_id;
create clustered index cdx_users on users (birth_date);
alter table users add constraint
	pk_users_user_id primary key nonclustered  (user_id);


Several changes right there:

  • A new index has appeared (index_id 2), which was expected, this is the non-clustered index that now enforces the primary key constraint on column user_id.
  • The table has a new physical column, with partition_column_id 0. This new column has no name, because it is not visible in the logical table representation (you cannot select it, nor update it). This is an uniqueifier column (is_uniqueifier is 1) because we did not specify that our clustered index in unique:

    If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.



    Klaus Aschenbrenner has a good series of articles that explain uniqueifier columns in more detail.

  • The non-clustered index that enforces the primary key constraint has 3 physical columns: user_id and two unnamed ones. This is because each row in the non-clustered index contains the corresponding clustered index row key values, in our case the birth_date column and the uniqueifier column.

Online operations and anti-matter columns

Is the uniqueifier column the only hidden column in a table? No. Lets rebuild our non-clustered index, making sure we specify it to be an online operation:

alter index pk_users_user_id on users rebuild with (online=on);


Notice how the non-clustered index now has one more column, a new column that has the is_antimatter value 1. As you probably guess, this is an anti-matter column. For an in-depth explanation of what is the purpose of the anti-matter column I recommend reading Online Indexing Operations in SQL Server 2005:

During the build phase, rows in the new “in-build” index may be in an intermediate state called antimatter. This mechanism allows concurrent DELETE statements to leave a trace for the index builder transaction to avoid inserting deleted rows. At the end of the index build operation all antimatter rows should be cleared.

Note that even after the online operation finishes and the antimatter rows are removed, the antimatter column will be part of the physical rowset structure.

There could exist more hidden columns in the table, for example if we enable change tracking:

alter table users ENABLE CHANGE_TRACKING;


Enabling change tracking on our table has added one more hidden column.

Table structure changes

Next lets look at some table structure modifying operations: adding, dropping and modifying columns. Were going to start anew with a fresh table for our examples:

create table users  (
	user_id int not null identity(1,1) primary key,
	first_name char(100) null,
	last_name char(100) null,
	birth_date datetime null);
go


Next lets modify some columns: we want to make the birth_date not nullable and reduce the length of the first_name to 75:

alter table users alter column birth_date datetime not null;
alter table users alter column first_name char(75);
go


Notice how the two alter operations ended up in adding a new column each, and dropping the old column. But also note how the null bit and the leaf_offset values have stayed the same. This means that the column was added ‘in place’, replacing the dropped column. This is a metadata only operation that did not modify any record in the table, it simply changed how the data in the existing records is interpreted.

But now we figure out the 75 characters length is wrong and we want to change it back to 100. Also, the birth_date column probably doesn’t need hours, so we can change it to a date type:

alter table users alter column birth_date date not null;
alter table users alter column first_name char(100);
go


The birth_date column has changed type and now it requires only 3 bytes, but the change occurred in-place, just as the nullability change we did before: it remains at the same offset in the record and it has the same null bit. However, the first_name column was moved from offset 8 to offset 211, and the null bit was changed from 4 to 5. Because the first_name column has increased in size it cannot occupy the same space as before in the record and the record has effectively increased to accommodate the new first_name column. This happened despite the fact that the first_name column was originally of size 100 so in theory it could reclaim the old space it used in the record, but the is simply a too corner case for the storage engine to consider.

By now we figure that the fixed length 100 for the first_name and last_name columns was a poor choice, so we would like to change them to more appropriate variable length columns:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(100);
go


The type change from fixed length column to variable length column cannot be done in-place, so the first_name and last_name columns get new null bit values.They also have negative leaf_offset values, which is typical for variable length columns as they don’t occupy fixed positions in the record.

Next lets change the length of the variable columns:

alter table users alter column first_name varchar(250);
alter table users alter column last_name varchar(250);
go

For this change the column length was modified without dropping the column and adding a new one. An increase of size for a variable length columns is one of the operations that is really altering the physical column, not dropping the column and adding a new one to replace it. However, a decrease in size, or a nullability change, does again drop and add the column, as we can quickly check now:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(250) not null;
go


Finally, lets say we tried to add two more fixed length columns, but we we undecided on the name and length so we added a couple of columns, then deleted them and added again a new one:

alter table users add mid_name char(50);
alter table users add surname char(25);
alter table users drop column mid_name;
alter table users drop column surname;

alter table users add middle_name char(100);
go


This case is interesting because it shows how adding a new fixed column can reuse the space left in the row by dropped fixed columns, but only if the dropped columns are the last fixed columns. In our table the columns mid_name and surname where originally added at offset 211 and 261 respectively and their length added up to 75 bytes. After we dropped them, the middle_name column we added is placed at offset 211, thus reusing the space formerly occupied by the dropped columns. This happens even though the length of the newly added column is 100 bytes, bigger than the 75 bytes occupied by the dropped columns before.

By now, our 5 column table has actually 15 columns in the physical storage format, 10 dropped columns and 5 usable columns. The table uses 412 bytes of fixed space for the 3 fixed length columns that have a total length of only 112 bytes. The variable length columns that now store the first_name and last_name are stored in the record after these 412 reserved bytes for fixed columns that are now dropped. Since the records always consume all the reserved fixed size, this is quite wasteful. How do we reclaim it? Rebuild the table:

alter table users rebuild;
go


As you can see the rebuild operation got rid off the dropped columns and now the physical storage layout is compact, aligned with the logical layout. So whenever doing changes to a table structure remember that at the storage layer most changes are cumulative, they are most times implemented by dropping a column and adding a new column back, with the new type/length/nullability. Whenever possible the a modified column reuses the space in the record and newly added columns may reuse space previously used by dropped columns.

Partitioned Tables

When partitioning is taken into account another dimension of the physical table structure is revealed. To start, lets consider a typical partitioned table:

create partition function pf (date) as range for values ('20111001');
go

create partition scheme ps as partition pf all to ([PRIMARY]);
go

create table sales (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price money not null)
	on ps(sale_date);
go


As we know, the partitioned tables are, from a physical point of view, a collection of rowsets that belong to the same logical object (the 'table'). Looking under the hood shows that our table has two rowsets, and they have identical column structure. Typically new partitions are added by the ETL process that uploads data into a staging table that gets switched in using a fast partition switch operation:

create table sales_staging (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price numeric(10,2) not null,
	constraint check_partition_range
		check (sale_date = '20111002'))
	on [PRIMARY];

alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20111002');
go

alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go
Msg 4944, Level 16, State 1, Line 2

ALTER TABLE SWITCH statement failed because column 'price' has data type numeric(10,2) in source table 'test.dbo.sales_staging' which is different from its type money in target table 'test.dbo.sales'.

OK, so we made a mistake in the staging table, so lets correct it, then try to switch it in again:

alter table sales_staging alter column price money not null;
alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go


We see that the partition switch operation has switched in the rowset of the staging table into the second partition of the sales table. But since the staging table had an operation that modified a column type, which in effect has dropped the numeric price columns and added a new price column of the appropriate money type, the second rowset of the partitioned table now has 5 columns, including a dropped one. The partition switch operation brings into the partitioned table the staging table as is, dropped columns and all. What that means is that partitions of the partitioned table can have, under the hood, a completely different physical structure from one another. Normally this should be of little concern and just a courisity to woe the newbies at DBA cocktail parties, but this problem has a darker side, known as KB2504090:

This issue occurs because the accessor that SQL Server uses to insert data into different partitions recognizes the metadata changes incorrectly. When data is inserted into the new partition that is created after a column is dropped, the number of the maximum nullable columns in the new partition may be one less than the number of the maximum nullable columns in the old partition.

If you ever find your partitioned tables not to have an uniform internal structure across all partitions, I recommend you rebuild the partition that is different. This simple query can be used to look at the number of physical columns in each partition of a table:

select count(*) as count_columns,
	index_id,
	partition_number
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
where p.object_id = object_id('sales')
group by index_id, partition_number;
go


This query shows that partitions 1 and 3 have 4 physical columns, while partition 2 has 5. We can rebuild partition 2:

alter table sales rebuild partition = 2;
go

With this operation we have rebuild the partition number 2 from scratch and removed all dropped columns in the process.

Note that my query above would only detect differences in the number of physical columns, but two partitions can still have a very different physical layout even if they have the same number of physical columns, eg. one can have physical column number 9 dropped and physical column number 10 used, while the other can have the opposite. Use your judgement when looking at the internal physical column layout to understand if they are truly the same.

T-SQL functions do no imply a certain order of execution

August 10th, 2011

Looking at this question on StackOverflow: Conversion failed when converting from a character string to uniqueidentifier error in SQL Server one can see a reproducible example where a string split UDF works fine in the SELECT statement, but it gives a conversion error in the DELETE statement. Certainly, a bug, right? Actually, not.

The issue at hand is in fact remarkably similar to another common misconception around T-SQL I had to debunk some time ago, see On SQL Server boolean operator short-circuit: that C like operator short-circuit is guaranteed in T-SQL (hint: it isn’t, read the linked article to see a clear counter example).

In the StackOverlow post the misconception is that order of declaration implies order of execution, that the function is evaluated somehow separately from the rest of the query and some sort of temporary result is created that is then used in the overall query execution. This understanding comes naturally to the imperative procedural language mindset of developers trained in C, C++, C# and other similar languages. But in the SQL Server declarative language that is T-SQL, your intuition is actually wrong. To illustrate I will give a simple counter-example, reusing the code from my earlier boolean short-circuit article:


create table eav (
    eav_id int identity(1,1) primary key,
    attribute varchar(50) not null,
    is_numeric bit not null,
    [value] sql_variant null);
create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
begin tran
while @i < 100000
begin
    declare @attribute varchar(50),
        @is_numeric bit,
        @value sql_variant;
    select @attribute = 'A' + cast(cast(rand()*1000 as  int) as varchar(3));
    select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
    if 1=@is_numeric
        select @value = cast(rand() * 100 as int);
    else
        select @value = 'Lorem ipsum';
    insert into eav (attribute, is_numeric, [value])
    values (@attribute, @is_numeric, @value);
    select @i = @i+1;
    if (@i % 1000 = 0)
    begin
		commit;
		raiserror(N'Inserted %d', 0,0,@i);
		begin tran;
    end
end
commit
go

-- insert a 'trap'
insert into eav (attribute, is_numeric, [value])
values ('B1', 0, 'Gotch ya');
go

Now we’re going to define our inline table UDF:


create function udf_get_only_numerics()
returns table
with schemabinding
as return
	select [value], [attribute]
	from dbo.eav
	where is_numeric = 1;
go

And now lets innocently query our UDF:


select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B1'
    and cast([value] as int) > 50
go

Since we’re selecting from the UDF, we’re guaranteed that we’re only going to see values that have the is_numeric column value 1, so the cast([value] as int) must always succeed, right? Wrong, we’ll get a conversion exception:


Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

So what’s going on? Lets insert a known value and inspect the execution plan:


insert into eav (attribute, is_numeric, [value])
values ('B2', 1, 65);
go

select [value]
	from dbo.udf_get_only_numerics ()
    where attribute = 'B2'
    and cast([value] as int) > 50
go

Looking at the plan we see that, just as in my boolean short-circuit example, the presence of an index on attribute that includes the value column but not the is_numeric was too good opportunity to pass for the optimizer. And this plan will evaluate the cast([value]as int) expression before it evaluate the WHERE clause inside the UDF. That’s right, the WHERE clause of the UDF has moved in the execution plan above the WHERE clause of the query using the UDF. The naive assumption that the function definition posses some sort of barrier for execution was proven wrong.

So how does this explains the failure seen in the StackOverflow question? The UDF, which was originally posted on Tally OH! An Improved SQL 8K “CSV Splitter” Function article on sqlservercentral.com, contains WHERE clauses to that in effect restricts the output to only complete tokens. w/o the WHERE clause it would return non-complete token. We can easily test this, lets remove the WHERE clause and use the same input as the test in the StackOverflow question:


CREATE FUNCTION dbo.DelimitedSplit8K_noWhere
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
 WITH E1(N) AS (
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1)))
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (
                 SELECT t.N+1
                   FROM cteTally t
                )
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item  = SUBSTRING(@pString,s.N1,
                ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s;

When we run this the output contains not only the two token in the input, but also every substring of these tokens:


-- Test string that will be split into table in the DelimitedSplit8k function
declare @temp varchar(max) =
     '918E809E-EA7A-44B5-B230-776C42594D91,6F8DBB54-5159-4C22-9B0A-7842464360A5'
select Item from dbo.DelimitedSplit8K_nowhere(@temp, ',');
go
Item
-----------------------------------------
918E809E-EA7A-44B5-B230-776C42594D91
18E809E-EA7A-44B5-B230-776C42594D91
8E809E-EA7A-44B5-B230-776C42594D91
E809E-EA7A-44B5-B230-776C42594D91
809E-EA7A-44B5-B230-776C42594D91
09E-EA7A-44B5-B230-776C42594D91
9E-EA7A-44B5-B230-776C42594D91
E-EA7A-44B5-B230-776C42594D91
-EA7A-44B5-B230-776C42594D91
...

By now the explanation to the issue seen in the StackExchange question should be obvious: just as in my example, the WHERE clause was pulled out from the function and placed in the generated plan above the cast(Item as uniqueidentifier) expression. During execution SQL Server is asked to convert the string ’18E809E-EA7A-44B5-B230-776C42594D91′ to a uniqueidentifier and, understandably, it fails.

Do no make assumptions about order of executions when using inline table UDFs. Q.E.D.

FILESTREAM MVC: Download and Upload images from SQL Server

February 6th, 2011

In a previous article I have shown how it is possible to use efficient streaming semantics when Download and Upload images from SQL Server via ASP.Net MVC. In this article I will go over an alternative approach that relies on the FILESTREAM column types introduced in SQL Server 2008.

What is FILESTREAM?

FILESTREAM storage is a new option available in SQL Server 2008 and later that allows for BLOB columns to be stored directly on the file system as individual files. As files, the data is accessible through the Win32 file access API like ReadFile and WriteFile. But at the same time the same data is available through the normal T-SQL operations like SELECT or UPDATE. Not only that, but the data is contained logically in the database so it will be contained in a database backup, it is subject to ordinary transaction commit and rollback behavior, it is searched by SQL Server FullText indexes and it follows the normal SQL Server security access rules: if you are granted SELECT permission on the table, then you can open the file to read. There are some restrictions, eg. a database with FILESTREAM cannot be mirrored. For a full list of restrictions and limitations, see Using FILESTREAM with Other SQL Server Features. Note that SQL Server Express edition does support FILESTREAM storage.

Another attribute of the FILESTREAM storage is the size limitation: normal BLOB column values have a maximum size of 2Gb. FILESTREAM columns are limited only by the volume size limit of the file system. 2Gb may seem like a large value, but consider that a media file like an HD movie stream can easily go up to a size of 5Gb.

Using FILESTREAM

One way to use FILESTREAM columns is to treat them as ordinary BLOB values and manipulate them through T-SQL. The one restriction in place is that the efficient partial update syntax for BLOBs is not supported. That is, one cannot issue UPDATE table SET column.WRITE(...) WHERE ... on a FILESTREAM column. But where FILESTREAM storage begins to shine is when accessed through the system file API. This allows the application to efficiently read, write and seek in a large BLOB value, just as it would in a file. In fact, the application does read, write and seek in a file :) .

Native Win32 applications use a new API function OpenSqlFilestream that opens a HANDLE that can be then used with the file IO API functions. Managed applications use the new SqlFileStream class that exposes a Stream based on the underlying FILESTREAM value. Both the native and the manged API require as input two special values, a PathName and a Transaction Context that must be obtained previously from the SQL Server using T-SQL.

The requirement to provide a Transaction Context when manipulating a FILESTREAM value through the file IO API highlights another aspect of working with this new type: a T-SQL transaction must be started and must be kept open while the file is manipulated, and then it must be committed. If you think about it such a requirement is to be expected, since we said that FILESTREAM columns are subject to normal T-SQL transaction commit and rollback semantics, including when they are manipulated through the Windows file read/write API.

FILESTREAM based images table

In order to have FILESTREAM column, we must have a special filegroup in our database, a FILESTREAM filegroup. You can either add a new filegroup to your existing database, or you can create the database with a FILESTREAM filegroup from scratch. Also the FILESTREAM feature must be enabled on the SQL Server instance. For all the details, see Getting Started with FILESTREAM Storage. For my project, I’m simply going to create a new database with a FILESTREAM filegroup:


create database images
	on (name='images_data', filename='c:\temp\images.mdf')
	, filegroup FS contains FILESTREAM
              (name = 'images_files', filename='c:\temp\images_files')
	log on (name='images_log', filename='c:\temp\images.ldf');
go

The media table used by our MVC project is going to be similar to the one used in the previous article but the content column will have the FILESTREAM attribute added. A table that has FILESTREAM columns is required to have a ROWGUIDCOL column, so we’re going to add one of those as well:


create table media (
	[media_id] int not null identity(1,1),
	[file_name] varchar(256),
	[content_type] varchar(256),
	[content_coding] varchar(256),
	[media_rowguid] uniqueidentifier not null
               ROWGUIDCOL UNIQUE default newsequentialid() ,
	[content] varbinary(max) filestream,
        constraint pk_media_id primary key([media_id]),
	constraint unique_file_name unique ([file_name]));
go

FILESTREAM based IMediaRepository

If you haven’t read the previous article Download and Upload images from SQL Server via ASP.Net MVC yet, now is a good time to do it. I am going to reuse the same code and simply provide a new implementation for the IMediaRepository interface, an implementation that works with FILESTREAM storage:


    /// <summary>
    /// SQL Server FILESTREAM based implementation of the IMediaRepository
    /// </summary>
    public class FileStreamMediaRepository: IMediaRepository
    {
        /// <summary>
        /// Gets an open connection to the SQL Server back end
        /// </summary>
        /// <returns>the SqlConneciton object, ready to use</returns>
        private SqlConnection GetConnection()
        {
            SqlConnectionStringBuilder scsb = new SqlConnectionStringBuilder(
                ConfigurationManager.ConnectionStrings["Images"].ConnectionString);
            scsb.Pooling = true;
            scsb.AsynchronousProcessing = true;
            SqlConnection conn = new SqlConnection(scsb.ConnectionString);
            conn.Open();
            return conn;
        }

        /// <summary>
        /// Gets a file from the SQL repository
        /// </summary>
        /// <param name="fileName">filename to retrieve</param>
        /// <param name="file">Output, the model for the file if found</param>
        /// <returns>True if the file is found, False if not</returns>
        public bool GetFileByName(string fileName, out FileDownloadModel file)
        {
            SqlConnection conn = GetConnection();
            SqlTransaction trn = conn.BeginTransaction();
            try
            {
                SqlCommand cmd = new SqlCommand(
                    @"SELECT file_name,
	                    content_type,
                        content_coding,
                        DATALENGTH (content) as content_length,
	                content.PathName() as path,
                        GET_FILESTREAM_TRANSACTION_CONTEXT ()
                    FROM media
                    WHERE file_name = @fileName;", conn, trn);
                SqlParameter paramFilename = new SqlParameter(
                            @"fileName", SqlDbType.VarChar, 256);
                paramFilename.Value = fileName;
                cmd.Parameters.Add(paramFilename);
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    if (false == reader.Read())
                    {
                        reader.Close();
                        trn.Dispose();
                        conn.Dispose();
                        trn = null;
                        conn = null;
                        file = null;
                        return false;
                    }

                    string contentDisposition = reader.GetString(0);
                    string contentType = reader.GetString(1);
                    string contentCoding = reader.IsDBNull(2) ? null : reader.GetString(2);
                    long contentLength = reader.GetInt64(3);
                    string path = reader.GetString(4);
                    byte[] context = reader.GetSqlBytes(5).Buffer;

                    file = new FileDownloadModel
                    {
                        FileName = contentDisposition,
                        ContentCoding = contentCoding,
                        ContentType = contentType,
                        ContentLength = contentLength,
                        Content = new MvcResultSqlFileStream
                        {
                            SqlStream = new SqlFileStream(path, context, FileAccess.Read),
                            Connection = conn,
                            Transaction = trn
                        }
                    };
                    conn = null; // ownership transfered to the stream
                    trn = null;
                    return true;
                }
            }
            finally
            {
                if (null != trn)
                {
                    trn.Dispose();
                }
                if (null != conn)
                {
                    conn.Dispose();
                }
            }
        }

        /// <summary>
        /// Adds a file to the SQL repository
        /// </summary>
        /// <param name="file">POST-ed file to be added</param>
        /// <param name="fileName">The filename part of the uploaded file path</param>
        public void PostFile(HttpPostedFileBase file, out string fileName)
        {
            fileName = Path.GetFileName(file.FileName);

            using (SqlConnection conn = GetConnection ())
            {
                using (SqlTransaction trn = conn.BeginTransaction ())
                {
                    SqlCommand cmdInsert = new SqlCommand(
@"insert into media
    (file_name, content_type, content_coding, content)
output
	INSERTED.content.PathName(),
    GET_FILESTREAM_TRANSACTION_CONTEXT ()
values
    (@content_disposition, @content_type, @content_coding, 0x)", conn, trn);
                    cmdInsert.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_disposition"].Value = fileName;
                    cmdInsert.Parameters.Add("@content_type", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_type"].Value = file.ContentType;
                    cmdInsert.Parameters.Add("@content_coding", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_coding"].Value = DBNull.Value;

                    string path = null;
                    byte[] context = null;

                    // cmdInsert is an INSERT command that uses the OUTPUT clause
                    // Thus we use the ExecuteReader to get the
                    // result set from the output columns
                    //
                    using (SqlDataReader rdr = cmdInsert.ExecuteReader())
                    {
                        rdr.Read();
                        path = rdr.GetString(0);
                        context = rdr.GetSqlBytes(1).Buffer;
                    }

                    using (SqlFileStream sfs = new SqlFileStream(
                                     path, context, FileAccess.Write))
                    {
                        file.InputStream.CopyTo(sfs);
                    }
                    trn.Commit ();
                }
            }
        }
    }

The implementation is pretty straight forward. The PostFile methods starts a transaction, inserts a row in the images table and obtains the PahtName and FILESTREAM transaction context to the newly inserted row, and then opens a SqlFileStream and copies in the uploaded file directly into the file that backs the FILESTREAM column. The GetFileByName method obtains the PathName and FILESTREAM transaction context of the desired file and then returns a SqlFileStream that accesses directly the file in which the FILESTREAM value is stored.

The only thing requiring explanation is the MvcResultSqlFileStream class. Because the SqlFileStream uses a transaction context, it is required to keep the SqlConnection and the SqlTransaction open until the SqlFileStream finishes accessing the content. In the PostFile method this requirement is achieved easily because the entire usage of the SqlFileStream complete within the stack frame, but the GetFileByName method has to return a Stream that in turn is passed into a FileStreamResult by the MediaController class and there is no control over how the MVC framework will use this returned ActionResult. My solution is to wrap the SqlFileStream in a new Stream derived class (since SqlFileStream is sealed, it cannot be inherited...) and then handle the resource management in this derived class' Dispose:


    /// <summary>
    /// Implementation of System.IO.Stream based on a SqlFileStream
    /// Disposes the connection, transaction and SqlFileStream objects
    /// </summary>
    public class MvcResultSqlFileStream: Stream
    {
        public SqlFileStream SqlStream { get; set; }
        public SqlConnection Connection { get; set; }
        public SqlTransaction Transaction { get; set; }
        public override bool CanRead
        {
            get { return SqlStream.CanRead; }
        }

        public override bool CanSeek
        {
            get { return SqlStream.CanSeek; }
        }

        public override bool CanWrite
        {
            get { return SqlStream.CanWrite; }
        }

        public override void Flush()
        {
            SqlStream.Flush ();
        }

        public override long Length
        {
            get { return SqlStream.Length; }
        }

        public override long Position
        {
            get
            {
                return SqlStream.Position;
            }
            set
            {
                SqlStream.Position = value;
            }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            return SqlStream.Read (buffer, offset, count);
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            return SqlStream.Seek(offset, origin);
        }

        public override void SetLength(long value)
        {
            SqlStream.SetLength(value);
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            SqlStream.Write(buffer, offset, count);
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                SqlStream.Dispose();
                Transaction.Dispose();
                Connection.Dispose();
            }
            base.Dispose(disposing);
        }
    }

This extra class feels rather unnecessary, but unfortunately is required because the MVC framework has no proper handling of resources passed out from the controller. If you ask me ActionResult should implement IDisposable, but then of course nobody asks me ;)

You can download the source for the entire MVC project from here.

Conclusion

Handling media content stored in the database using FILESTREAM storage makes for a nicer developer experience. The FILESTREAM content can be accessed using streaming semantics based on the SqlFileStream class, or using the Win32 file IO API for native applications. By comparison the previous article that relied on straight T-SQL to manipulate the BLOB values had to do some very unintuitive tricks in order to achieve streaming semantics, specially for uploading a file. FILESTREAM storage simplifies development and is supported in every SQL Server edition, including Express.

If you intend to deploy FILESTREAM in your environment I recommend going over some Knowledge Base and white paper articles first:

How to pass a NULL value in a message to a queue in SQL Server

January 15th, 2011

The SEND Transact-SQL verb does not allow to send a NULL message body, attempting to do so will result in error:


Msg 8433, Level 16, State 1, Line 11
The message body may not be NULL.  A zero-length UNICODE or binary string is allowed.

But there are ways to send a NULL message body. One way is to completely omit the message body argument:


SEND ON CONVERSATION @handle MESSAGE TYPE [...];

Another way is to send a 0 length message body, which will be enqueued as a NULL message body in the target queue:


SEND ON CONVERSATION @handle MESSAGE TYPE [...] (0x);
SEND ON CONVERSATION @handle MESSAGE TYPE [...] ('');
SEND ON CONVERSATION @handle MESSAGE TYPE [...] (N'');

All three forms above will enqueue the same message body: NULL. This is true for both binary messages (VALIDATION = NONE) and for XML messages (VALIDATION=WELL_FORMED_XML).

Here is a short test script showing this:


create message type [BINARY] validation = none;
create message type [XML] validation = well_formed_xml;
go

create contract [TEST] (
	[BINARY] sent by initiator,
	[XML] sent by initiator);
go

create queue Sender;
create service Sender on queue Sender;
go

create queue Receiver;
create service Receiver on queue Receiver  ([TEST]);
go

declare @h uniqueidentifier;

begin dialog conversation @h
from service [Sender]
to service N'Receiver', N'current database'
on contract [TEST]
with encryption = off;

send on conversation @h message type [BINARY];
send on conversation @h message type [BINARY] (0x);

send on conversation @h message type [XML];
send on conversation @h message type [XML] ('');
send on conversation @h message type [XML] (N'');

receive * from [Receiver];
go

The received message_body column have a NULL value for all 5 messages sent.

Download and Upload images from SQL Server via ASP.Net MVC

December 28th, 2010

A frequent question that pops up on discussion forums is how to serve an image that is stored in a SQL Server table from an ASP.Net application. Unfortunately the answer is almost always wrong, as the prevalent solution involves copying the entire image file into memory before returning it to the client. This solution works fine when tested with a light load and returning few small images. But in production environment the memory consumption required by all those image files stored as byte arrays in memory causes serious performance degradation. A good solution must use streaming semantics, transferring the data in small chunks from the SQL Server to the HTTP returned result.

The SqlClient components do offer streaming semantics for large result sets, including large BLOB fields, but the client has to specifically ask for it. The ‘secret ingredient’ is the passing in the CommandBehavior.SequentialAccess flag to the SqlCommand.ExecuteReader:

Provides a way for the DataReader to handle rows that contain columns with large binary values. Rather than loading the entire row, SequentialAccess enables the DataReader to load data as a stream. You can then use the GetBytes or GetChars method to specify a byte location to start the read operation, and a limited buffer size for the data being returned.

An ASP.Net MVC virtual Media folder backed by SQL Server

Lets say we want to have a virtual Media folder in an ASP.Net MVC site, serving the files from a SQL Server database. A GET request for an URL like "http://site/Media/IMG0042.JPG" should return the content of the file named IMG0042.JPG from the database. A POST request to the URL "http://site/Media" which contains an embedded file should insert this new file in the database and redirect the response to the newly added file virtual path. This is how our upload HTML form looks like:

<form method="post" action="/Media" enctype="multipart/form-data">
<input type="file" name="file" id="file"/>
<input type="submit" name="Submit" value="Submit"/>
</form>

We’ll start by adding a special route in Global.asax.cs that will act like a virtual folder for file download requests:

            routes.MapRoute(
                "Media",
                "Media/{filename}",
                new { controller = "Media", action = "GetFile" },
                new { filename = @"[^/?*:;{}\\]+" });

Note that the upload will be handled by the default MVC route if we add an Index() method to our controller that will handle the POST.

For our GET requests we need a FileDownloadModel class to represent the requested file properties. for this sample, we don’t need a POST model since we’re only going to have have a single input field, the uploaded file. We’re going to use a Repository interface that declares two methods: GetFileByName returns from the repository a FileDownloadModel given a file name, and PutFile will accept an uploaded file and put it in the repository.

    public class FileDownloadModel
    {
        public string FileName {get; internal set;}
        public string ContentType { get; internal set; }
        public string ContentCoding { get; internal set; }
        public long ContentLength { get; internal set; }
        public Stream Content { get; internal set; }
    }

    public interface IMediaRepository
    {
        bool GetFileByName(
            string fileName,
            out FileDownloadModel file);

        void PostFile(
            HttpPostedFileBase file,
            out string fileName);
    }

With this Repository interface we can actually code our MediaController class:

    public class MediaController : Controller
    {
        public IMediaRepository Repository { get; set; }

        public MediaController()
        {
            Repository = new SqlMediaRepository();
        }

        [HttpPost]
        public ActionResult Index()
        {
            string fileName;
            Repository.PostFile(Request.Files[0], out fileName);
            return new RedirectResult("/Media/" + fileName);
        }

        [HttpGet]
        public ActionResult GetFile(string fileName)
        {
            FileDownloadModel model;
            if (false == Repository.GetFileByName(
                fileName,
                out model))
            {
                return new HttpNotFoundResult
                {
                    StatusDescription = String.Format(
                        "File {0} not found",
                        fileName)
                };
            }

            if (null != model.ContentCoding)
            {
                Response.AddHeader(
                    "Content-Encoding",
                    model.ContentCoding);
            }
            Response.AddHeader(
                "Content-Length",
                model.ContentLength.ToString ());

            Response.BufferOutput = false;

            return new FileStreamResult(
                model.Content,
                model.ContentType);
        }
    }

In have hard coded the Repository implementation to SqlMediaRepository, a class we’ll create shortly. A real project would probably use the Dependency Injection or Inversion of Control patterns, perhaps using Castle Windsor for example. For brevity, I will skip these details, there are plenty of blogs and articles describing how to do it.

Note the use of the FileStreamResult return, which is an MVC supplied action for returning a download of an arbitrary Stream object. Which also brings us to the next point, we need to implement a Stream that read content from a SqlDataReader.

A SqlDataReader based Stream

We now need an implementation of the abstract Stream class that can stream out a BLOB column from a SqlDataReader. Wait, you say, doesn’t SqlBytes already have a Stream property that reads a BLOB from a result as a Stream? Unfortunately this small remark makes this class useless for our purposes:

Getting or setting the Stream property loads all the data into memory. Using it with large value data can cause an OutOfMemoryException.

So we’re left with implementing a Stream based on a SqlDataReader BLOB field, a Stream that returns the content of the BLOB using the proper GetBytes calls and does not load the entire BLOB in memory. Fortunately this is fairly simple as we only need to implement a handful of methods:

    public class SqlReaderStream: Stream
    {
        private SqlDataReader reader;
        private int columnIndex;
        private long position;

        public SqlReaderStream(
            SqlDataReader reader,
            int columnIndex)
        {
            this.reader = reader;
            this.columnIndex = columnIndex;
        }

        public override long Position
        {
            get { return position; }
            set { throw new NotImplementedException(); }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            long bytesRead = reader.GetBytes(columnIndex, position, buffer, offset, count);
            position += bytesRead;
            return (int)bytesRead;
        }

        public override bool CanRead
        {
            get { return true; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return false; }
        }

        public override void Flush()
        {
            throw new NotImplementedException();
        }

        public override long Length
        {
            get { throw new NotImplementedException(); }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing && null != reader)
            {
                reader.Dispose();
                reader = null;
            }
            base.Dispose(disposing);
        }
    }

As you can see, we need only to return the proper responses to CanRead (yes), CanWrite (no) and CanSeek (also no), keep track of our current Position and we need implement Read to fetch more bytes from the reader, using GetReader.

We’re also overriding the Dispose(bool disposing) method. This is because we’re going to have to close the SqlDataReader when the content stream transfer is complete. If this is the first time you see this signature of the Dispose method, then you must read Implementing a Dispose Method

Streaming Upload of BLOB data

Just like retrieving large BLOB data from SQL Server poses challenges to avoid creating full blown in-memory copies of the entire BLOB, similar issues arrive when attempting to insert a BLOB. The best solution is actually quite convoluted. It involves sending the data to the server in chunks, and using the in-place BLOB UPDATE syntax. The MSDN has this to say in the Remarks section:

Use the .WRITE (expression, @Offset, @Length) clause to perform a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types. For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column. For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.

To implement such semantics, we will write a second Stream implementation, this time for uploads:

    public class SqlStreamUpload: Stream
    {
        public SqlCommand InsertCommand { get; set; }
        public SqlCommand UpdateCommand { get; set; }
        public SqlParameter InsertDataParam { get; set; }
        public SqlParameter UpdateDataParam { get; set; }

        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return true; }
        }

        public override void Flush()
        {
        }

        public override long Length
        {
            get { throw new NotImplementedException(); }
        }

        public override long Position
        {
            get; set;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            byte[] data = buffer;
            if (offset != 0 ||
                count != buffer.Length)
            {
                data = new byte[count];
                Array.Copy(buffer, offset, data, 0, count);
            }
            if (0 == Position &&
                null != InsertCommand)
            {
                InsertDataParam.Value = data;
                InsertCommand.ExecuteNonQuery();
            }
            else
            {
                UpdateDataParam.Value = data;
                UpdateCommand.ExecuteNonQuery();
            }
            Position += count;
        }
    }

This Stream implementation uses two SqlCommand objects: an InsertCommand to save the very first chunk, and an UpdateCommand to save the subsequent chunks. Note that the chunk sizes (the optimal 8040 bytes) is nowhere specified, that is easily achieved by wrapping the SqlStreamUpload in a BufferedStream instance.

The MEDIA table

create table media (
	[media_id] int not null identity(1,1),
	[file_name] varchar(256),
	[content_type] varchar(256),
	[content_coding] varchar(256),
	[content] varbinary(max),
        constraint pk_media_id primary key([media_id]),
	constraint unique_file_name unique ([file_name]));

This table contains the downloadable media files. Files are identified by name, so the names have an unique constraint. I’ve added a IDENTITY primary key, because in a CMS these files are often references from other parts of the application and and INT is a shorter reference key than a file name. The content_type field is needed to know the type of file: “image/png”, “image/jpg” etc (see the officially registered IANA Media types. The content_coding field is needed if the files are stored compressed and a Content-Encoding HTTP header with the tag “gzip” or “deflate” has to be added to the response. Note that most image types (JPG, PNG) don’t compress well, as these file formats already include a compression algorithm and when compressed again with the common HTTP transfer algorithms (gzip, deflate, compress) they usually increase in size.

The SqlMediaRepository

The final piece of the puzzle: an implementation of the IMediaRepository interface that uses a SQL Server back end for the files storage:

    public class SqlMediaRepository: IMediaRepository
    {
        private SqlConnection GetConnection()
        {
            SqlConnectionStringBuilder scsb = new SqlConnectionStringBuilder(
                ConfigurationManager.ConnectionStrings["Images"].ConnectionString);
            scsb.Pooling = true;
            SqlConnection conn = new SqlConnection(scsb.ConnectionString);
            conn.Open();
            return conn;
        }

        public void PostFile(
            HttpPostedFileBase file,
            out string fileName)
        {
            fileName = Path.GetFileName(file.FileName);

            using (SqlConnection conn = GetConnection())
            {
                using (SqlTransaction trn = conn.BeginTransaction())
                {
                    SqlCommand cmdInsert = new SqlCommand(
                        @"INSERT INTO media (
                            file_name,
                            content_type,
                            content_coding,
                            content)
                        values (
                            @content_disposition,
                            @content_type,
                            @content_coding,
                            @data);", conn, trn);
                    cmdInsert.Parameters.Add("@data", SqlDbType.VarBinary, -1);
                    cmdInsert.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_disposition"].Value = fileName;
                    cmdInsert.Parameters.Add("@content_type", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_type"].Value = file.ContentType;
                    cmdInsert.Parameters.Add("@content_coding", SqlDbType.VarChar, 256);
                    cmdInsert.Parameters["@content_coding"].Value = DBNull.Value;

                    SqlCommand cmdUpdate = new SqlCommand(
                            @"UPDATE media
                            SET content.write (@data, NULL, NULL)
                            WHERE file_name = @content_disposition;", conn, trn);
                    cmdUpdate.Parameters.Add("@data", SqlDbType.VarBinary, -1);
                    cmdUpdate.Parameters.Add("@content_disposition", SqlDbType.VarChar, 256);
                    cmdUpdate.Parameters["@content_disposition"].Value = fileName;

                    using (Stream uploadStream = new BufferedStream(
                        new SqlStreamUpload
                        {
                            InsertCommand = cmdInsert,
                            UpdateCommand = cmdUpdate,
                            InsertDataParam = cmdInsert.Parameters["@data"],
                            UpdateDataParam = cmdUpdate.Parameters["@data"]
                        }, 8040))
                    {
                        file.InputStream.CopyTo(uploadStream);
                    }
                    trn.Commit();
                }
            }
        }

        public bool GetFileByName(string fileName, out FileDownloadModel file)
        {
            SqlConnection conn = GetConnection();
            try
            {
                SqlCommand cmd = new SqlCommand(
                    @"SELECT file_name,
	                    content_type,
                        content_coding,
                        DATALENGTH (content) as content_length,
	                    content
                    FROM media
                    WHERE file_name = @fileName;", conn);
                SqlParameter paramFilename = new SqlParameter(
                      @"fileName", SqlDbType.VarChar, 256);
                paramFilename.Value = fileName;
                cmd.Parameters.Add(paramFilename);
                SqlDataReader reader = cmd.ExecuteReader(
                    CommandBehavior.SequentialAccess |
                    CommandBehavior.SingleResult |
                    CommandBehavior.SingleRow |
                    CommandBehavior.CloseConnection);
                if (false == reader.Read())
                {
                    reader.Dispose();
                    conn = null;
                    file = null;
                    return false;
                }

                string contentDisposition = reader.GetString(0);
                string contentType = reader.GetString(1);
                string contentCoding = reader.IsDBNull(2) ? null : reader.GetString(2);
                long contentLength = reader.GetInt64(3);
                Stream content = new SqlReaderStream(reader, 4);

                file = new FileDownloadModel
                {
                    FileName = contentDisposition,
                    ContentCoding = contentCoding,
                    ContentType = contentType,
                    ContentLength = contentLength,
                    Content = content
                };
                conn = null; // ownership transfered to the reader/stream
                return true;
            }
            finally
            {
                if (null != conn)
                {
                    conn.Dispose();
                }
            }
        }
    }

Typically MVC applications tend to use a LINQ based Repository. In this case though I could not leverage LINQ because of the peculiar requirements of implementing efficient streaming for large BLOBs. So this Repository uses plain vanilla SqlClient code.

The GetFileByName method gets the one row from the media table and returns a FileDownloadModel with a SqlReaderStream object that is wrapping the SELECT command result. Note that I could not deploy the typical “using” pattern for the disposable SqlConnection because the connection must remain open until the command result SqlDataReader finishes streaming in the BLOB, and the streaming will be controlled by the MVC framework executing our ActionResult after the GetFileByName is completed. The connection will be closed when the SqlReaderStream is disposed, because of the CommandBehavior.CloseConnection flag. The stream will be disposed by the FileStreamResult.WriteFile method.

The PostFile method creates two SqlCommand statements, one to insert the first chunk of data along with the other relevant columns, and an update command that uses BLOB partial updates to write the subsequent chunks. A SqlStreamUpload object is then using these two SqlCommand to efficiently stream in the uploaded file. The intermediate BufferedStream is used to create upload chunks of the critical size of 8040 bytes (see above). If the content would have to be compressed, this is where it would happen, a GZipStream would be placed in front of the Bufferedstream in order to compress the uploaded file, and an the “@content_coding” parameter would have to be set as to “gzip”.

HTTP caching

I have left out from this implementation appropriate HTTP caching control. HTTP caching is extremely important in high volume traffic sites, there is no better way to optimize an HTTP request processing that to never receive said request at all and have the user-agent cache or some intermediate proxy serve a response instead. Our application would have to add appropriate ETag and Cache-Control HTTP headers and the MediaController would need to have an action for the HEAD requests. The IMediaRepository interface would need a new method to get all the file properties without the actual content. For the moment, I’ll leave these as an exercise for the reader…

To BLOB or Not to BLOB

This article does not try to address the fundamental question whether you should store the images in the database to start with. Arguments can be made both for and against this. Russel Sears, Catherine van Ingen and Jim Gray have published a research paper in 2006 on their performance comparison analysis between storing files in the database and storing them in the file system: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?. They concluded that:

The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.

However their study was not comparing how to serve the files as HTTP responses. The fact that the web server can efficiently serve a file directly from the file system without any code being run in the web application changes the equation quite a bit, and tilts the performance strongly in favor of the file system. But if you already have considered the pros and cons and decided that the advantages of a consistent backup/restore and strong referential integrity warrants database stored BLOBs, then I hope this article highlights an efficient way to return those BLOBs as HTTP responses.

SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes irrelevant of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the applications runs a query, gets back the result and stores it in the cache. Whenever the result is changed because data was updated, the application will be notified and it can invalidate the cached result.

Leveraging Query Notifications from the managed clients is very easy due to the dedicated SqlDependency class that takes care of a lot of the details needed to be set up in place in order to be able to receive these notifications. But the MSDN examples and the general community know how with SqlDepenendency is geared toward straight forward usage, by attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.

LinqToCache project

My solution is available as the LinqToCache project. To cache a LINQ query results and get active SqlDependency notifications when the data was changed, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method AsCached. This method returns an IEnumerable of the query result. First invocation will always hit the database and set up a SqlDependency, subsequent invocations will return the cached result as long as it was not invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries than can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straight forward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. that is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with appropriate schema if necessary).

But there are a couple of conditions that are specially important for us: must not use the TOP expression and must not use … ranking and windowing functions.. These two restrictions mean the popular Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple Linq TO EF query like following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.

Using tables as Queues

March 26th, 2010

A very common question asked on all programming forums is how to implement queues based on database tables. This is not a trivial question actually. Implementing a queue backed by a table is notoriously difficult, error prone and susceptible to deadlocks. Because queues are usually needed as a link between various processing stages in a workflow they operate in highly concurrent environments where multiple processes enqueue rows into the table while multiple processes attempt to dequeue these rows. This concurrency creates correctness, scalability and performance challenges.

But since SQL Server 2005 introduced the OUTPUT clause, using tables as queues is no longer a hard problem. This fact is called out in the OUTPUT Clause topic in BOL:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table… Other semantics may also be implemented, such as using a table to implement a stack.

The reason why OUTPUT clause is critical is that it offers an atomic destructive read operation that allows us to remove the dequeued row and return it to the caller, in one single statement.

Heap Queues

The simplest queue is a heap: producers can equeue into the heap and consumers can dequeue, but order of operations is not important: the consumers can dequeue any row, as long as it is unlocked.

create table HeapQueue (
  Payload varbinary(max));
go

create procedure usp_enqueueHeap
  @payload varbinary(max)
as
  set nocount on;
  insert into HeapQueue (Payload) values (@Payload);
go

create procedure usp_dequeueHeap 
as
  set nocount on;
  delete top(1) from HeapQueue with (rowlock, readpast)
      output deleted.payload;      
go

A heap queue can satisfy most producer-consumer patterns. It scales well and is very simple to implement and understand. Notice the (rowlock, readpast) hints on the delete operation, they allow for concurrent consumers to dequeue rows from the table without blocking each other. A heap queue makes no order guarantees.

FIFO Queues

When he queueing and dequeuing operations have to support a certain order two changes have to be made:

  • The table must be organized as a clustered index ordered by a key that preserves the desired dequeue order.
  • The dequeue operation must contain an ORDER BY clause to guarantee the order.

create table FifoQueue (
  Id bigint not null identity(1,1),
  Payload varbinary(max));
go

create clustered index cdxFifoQueue on FifoQueue (Id);
go

create procedure usp_enqueueFifo
  @payload varbinary(max)
as
  set nocount on;
  insert into FifoQueue (Payload) values (@Payload);
go

create procedure usp_dequeueFifo
as
  set nocount on;
  with cte as (
    select top(1) Payload
      from FifoQueue with (rowlock, readpast)
    order by Id)
  delete from cte
    output deleted.Payload;
go

By adding the IDENTITY column to our queue and making it the clustered index, we can dequeue in the order inserted. The enqueue operation is identical with our Heap Queue, but the dequeue is slightly changed, as the requirement to dequeue in the order inserted means that we have to specify an ORDER BY. Since the DELETE statement does not support ORDER BY, we use a Common Table Expression to select the row to be dequeued, then delete this row and return the payload in the OUTPUT clause. Isn’t this the same as doing a SELECT followed by a DELETE, and hence exposed to the traditional correctness problems with table backed queues? Technically, it is. But this is a SELECT followed by a DELETE that actually works for table based queues. Let me explain.

Because the query is actually an DELETE of a CTE, the query execution will occur as a DELETE, not as an SELECT followed by a DELETE, and also not as a SELECT executed in the context of the DELETE. The crucial part is that the SELECT part will aquire LCK_M_U update locks on the rows scanned. LCK_M_U is compatible with LCK_M_S shared locks, but is incompatible with another LCK_M_U. So two concurrent dequeue threads will not try to dequeue the same row. One will grab the first row free, the other thread will grab the next row.

Is also worth looking at how a compact plan the usp_dequeueFifo has:

usp_dequeueFifo execution plan

usp_dequeueFifo execution plan

Compare this with the alternative of using a subquery to locate the row to be deleted:

Subquery deque plan

Subquery deque plan


delete top(1) from FifoQueue
output deleted.Payload
where Id = (
select top(1) Id
  from FifoQueue with (rowlock, updlock, readpast)
order by Id)

Strict Ordering and Concurrency

Strict FIFO ordering in a database world would have to take into account transactions. If transaction T1 dequeues row A, transaction T2 dequeues the next row B and then T1 rolls back and T2 commits, the row B was processed out of order. So any dequeue operation would have to wait for the previous dequeue to committ before proceeding. While this is correct, is also highly inefficient, as it means that all transactions must serialize access to the queue. Many applications accept the processing to occur out of order for the sake of achieving a reasonable scalability and performance.

If strict FIFO order is required then you have to remove the readpast hint from the usp_dequeueFifo procedure. When this is done, only one transaction can dequeue rows from the queue at a time. All other transaction will have to wait until the first one commits. This is not an implementation artifact, it is a fundamental requirement derived from the ACID properties of transactions.

If a lax FIFO order is acceptable, then the readpast hint will ensure that multiple transactions can dequeue rows concurrently. However, the strict FIFO order cannot be guaranteed in this case.

LIFO Stacks

A stack backed by a queue is also possible. The implementation and table structure is almost identical with the FIFO queue, with one difference: the ORDER BY clause has a DESC.

with cte as (
    select top(1) Payload
      from FifoQueue with (rowlock, readpast)
    order by Id DESC)
  delete from cte
    output deleted.Payload;

Because all operations (enqueue and dequeue) occur on the same rows, stacks implemented as tables tend to create a very hot spot on the page which currently contains these rows. Because all row insert, delete and update operations need to lock the page latch exclusively and stacks operate on the rows grouped at one end of the table, the result is high page latch contention on this page. Queues have the same problem, but to a lesser extent as the operations are spread in inserts at one end of the table and deletes at the other end, so the same number of operations is split into two hot spots instead of a single one like in stacks case.

Pending Queues

Another category of queues are pending queues. Items are inserted with a due date, and the dequeue operation returns rows that are due at dequeue time. This type of queues is common in scheduling systems.

create table PendingQueue (
  DueTime datetime not null,
  Payload varbinary(max));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
  @dueTime datetime,
  @payload varbinary(max)
as
  set nocount on;
  insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go

create procedure usp_dequeuePending
as
  set nocount on;
  declare @now datetime;
  set @now = getutcdate();
  with cte as (
    select top(1) 
      Payload
    from PendingQueue with (rowlock, readpast)
    where DueTime < @now
    order by DueTime)
  delete from cte
    output deleted.Payload;
go

I choose to use UTC times for my example, and I highly recommend you do the same for your applications. Not only this eliminates the problem of having to deal with timezones, but also your pending operations will behave correctly on the two times each year when summer time enters into effect or when it ends.

Why not use built-in Queues?

SQL Server has Queues, right? After all, what else are statements like CREATE QUEUE and DROP QUEUE refer to? Well, not really. SQL Server includes Service Broker, it’s true, and Service Broker uses these message stores that, for lack of a better term, where called Queues. But even during product development serious consideration was given to whether a different term should be used instead of ‘queue’ (eg. use ‘message store’). In the end the decision was made to use the term ‘queue’ given the industry familiarity with the term. But make no mistake, Service Broker ‘queues’ are not a generic queue storage for anyone to use. They are intended solely as a store of messages for Service Broker. If you try to use them as a generic queue, several shortcomings will become immediately apparent:

  • Difficulty to enqueue. With Service Broker Queues you need to begin a conversation and send a message on it in order to enqueue something into a queue. What is a conversation you ask? My point exactly: the semantics exposed by Service Broker are those needed for it’s purpose, namely reliable messaging. You do not need to learn about services, contracts, message types, routes and remote service bindings just to enqueue a row into a queue.
  • Fixed structure. Service Broker Queues have a specific table structure that cannot be altered in any fashion. You cannot add, alter or drop columns, you cannot change the indexes, you cannot change the clustered index. The Service Broker queues schema is designed for the RECEIVE verb and for the conversation group locking semantics of Service Broker, but that schema may not be what is optimal for your case.
  • Lack of maintenance options. I blogged about this issue in my article Dealing with Large Queues. With Service Broker Queues you cannot use any of the table maintenance DDL, like rebuilding or reorganizing an index, you cannot use DMVs like sys.dm_db_index_physical_stats nor can you change the various table options via sp_tableoptions.

However Service Broker has one ace up its selves: Activation. Queue processing is often associated with event driven programming and the possibility to launch a procedure to handle incoming rows as they are enqueued in is always required with queues. Triggers don’t work as processing has to occur after the enqueue is committed. And scheduled SQL Agent jobs don’t adapt to the variable rates and spikes queue experience: if they are too aggressive they’ll burn CPU, but if they are too passive the latency increases even under no load. Is hard enough to tune it for a sweet spot under a constant load, but add a variable load with spikes and the task becomes impossible. Unfortunately there is no substitute for Activation, you have to handle the processing as a separate tasks that polls the queue for new rows. The only way to leverage activation, with it’s sweet mix of non-polling and self load balancing, is to use Service Broker Queues.

Performance comparison of varchar(max) vs. varchar(N)

March 22nd, 2010

The question of comparing the MAX types (VARCHAR, NVARCHAR, VARBINARY) with their non-max counterparts is often asked, but the answer usually gravitate around the storage differences. But I’d like to address the point that these types have inherent, intrinsic performance differences that are not driven by different storage characteristics. In other words, simply comparing and manipulating variables and columns in T-SQL can yield different performance when VARCHAR(MAX) is used vs. VARCHAR(N).

Assignment

First comparing simple assignment, assign a value to a VARBINARY(8000) variable in a tight loop:

declare @x varchar(8000);
declare @startTime datetime;
declare @i int;

set @i = 0;
set @startTime = getutcdate();

while @i < 1000000
begin
  set @x = 'abc';
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

This script runs on my test server in 4.9 seconds on average. Simply changing the variable declaration to VARBINARY(MAX) changes the run time to an average of of 9.2 seconds.

Comparison

Next I measured the performance of a simple comparison:

declare @x varchar(8000);
declare @startTime datetime;
declare @i int;

set @i = 0;
set @startTime = getutcdate();
set @x = 'abc';


while @i < 1000000
begin
  declare @y bit;
  set @y = case when @x = 'abc' then 1 else 0 end;
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

Average run time for VARCHAR(8000): 5.8 seconds. For VARCHAR(MAX): 6.5 seconds.

Data Access

The next text does a simple string comparison of a VARCHAR(MAX) field vs. a VARCHAR(8000) field. The actual value stored is identical in both cases, a simple 3 letter string ‘abc’. The data access does not retrieve the string value, it simply compares its value against a WHERE clause predicate:

create table test (
  id int identity(1,1) primary key, 
  x varchar(max));
go

insert into test(x) values ('abc');
go

declare @x varchar(max);
declare @startTime datetime;
declare @i int;

set nocount on;
set @i = 0;
set @startTime = getutcdate();

while @i < 100000
begin
  declare @id int;
  select @id = id 
    from test 
    where id = 1
    and x = 'abc';
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

Loop execution time for VARCHAR(8000): 3.1 seconds. For VARCHAR(MAX): 3.6 seconds.

Furthermore, if we change the WHERE predicate to compare against a max type variable instead of the literal ‘abc‘ the loop time increases to 3.9 seconds.

Conclusion

The code path that handles the MAX types (varchar, nvarchar and varbinary) is different from the code path that handles their equivalent non-max length types. The non-max types can internally be represented as an ordinary pointer-and-length structure. But the max types cannot be stored internally as a contiguous memory area, since they can possibly grow up to 2Gb. So they have to be represented by a streaming interface, similar to COM’s IStream. This carries over to every operation that involves the max types, including simple assignment and comparison, since these operations are more complicated over a streaming interface. The biggest impact is visible in the code that allocates and assign max-type variables (my first test), but the impact is visible on every operation.

The impact is reasonable on most operations with the max type operations being about 10% slower. So is this something to worry about, or even worth blogging about, after all? Actually, in a recent project of mine I had to change a column type from varchar(max) to varchar(5000) to alleviate the impact of max-types performance. The code happened to be on a very very performance critical path and testing clearly showed the impact was not only measurable, was quite actually quite serious. The processing throughput increased by about 25% just from this minor change in my case.

Although much discussion exists around the storage of max types vs. the storage of non-max types and the similarities and differences between them (eg. in-row vs. out-of-row vs. row-overflow storage), there is also a more fundamental aspect that differentiate these types: SQL Server internal code differences. These differences manifest themselves even when the storage of the max type is optimized in-row, for small and very small values of the BLOB field.

select count(*);

October 26th, 2009

Quick trivia: what is the result of running SELECT COUNT(*);?

That’s right, no FROM clause, just COUNT(*). The answer may be a little bit surprising, is 1. When you query SELECT 1; the result is, as expected, 1. And SELECT 2; will return 2. So SELECT COUNT(2); returns, as expected, 1, after all it counts how many rows are in the result set. But SELECT COUNT(*); has a certain smell of voo-doo to it. Ok, is the * project operator, but project from… what exactly? It feels eerie, like a count is materialized out of the blue.

How about SELECT COUNT(*) [MyTable]. Well, that’s actually just a shortcut for SELECT COUNT(*) AS [MyTable], so it still returns 1 but in a column named MyTable. Now you understand why my heart missed a bit when I checked how I initialized a replication subscription and I forgot to type in FROM