SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes irrelevant of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the applications runs a query, gets back the result and stores it in the cache. Whenever the result is changed because data was updated, the application will be notified and it can invalidate the cached result.

Leveraging Query Notifications from the managed clients is very easy due to the dedicated SqlDependency class that takes care of a lot of the details needed to be set up in place in order to be able to receive these notifications. But the MSDN examples and the general community know how with SqlDepenendency is geared toward straight forward usage, by attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.

LinqToCache project

My solution is available as the LinqToCache project. To cache a LINQ query results and get active SqlDependency notifications when the data was changed, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method AsCached. This method returns an IEnumerable of the query result. First invocation will always hit the database and set up a SqlDependency, subsequent invocations will return the cached result as long as it was not invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries than can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straight forward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. that is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with appropriate schema if necessary).

But there are a couple of conditions that are specially important for us: must not use the TOP expression and must not use … ranking and windowing functions.. These two restrictions mean the popular Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple Linq TO EF query like following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.

How to change database mirroring encryption with minimal downtime

April 23rd, 2010

SQL Server Database Mirroring uses an encrypted connection to ship the data between the principal and the mirror. By default RC4 encryption is enabled and used. The endpoint can be configured to use AES encryption instead, or no encryption. The overhead of using RC4 encryption is quite small, the overhead of using AES encryption is slightly bigger, but not significant. Under well understood conditions, like inside a secured data center, encryption can be safely turned off for a 5-10% increase in speed in mirroring traffic. Note that even with encryption turned off, the traffic is still cryptographically signed (HMAC). Traffic signing cannot be turned off.

To change the encryption used by an endpoint, one has to run the ALTER ENDPOINT … FOR DATABASE_MIRRORING (ENCRYPTION = {DISABLED|SUPPORTED|REQUIRED}). Two endpoints must have compatible encryption settings to be able to communicate. The following table shows the compatibility matrix of encryption settings:

DISABLED SUPPORTED REQUIRED
DISABLED CLEAR CLEAR -
SUPPORTED CLEAR ENCRYPTED ENCRYPTED
REQUIRED - ENCRYPTED ENCRYPTED

The default setting for an endpoint is ENCRYPTION = REQUIRED, which enforces encryption and refuses to connect to an endpoint that has disabled encryption.

Changing encryption settings on an existing endpoint

If you have a running mirroring session and want to change the settings to squeeze the extra 5-10% you can expect from removing RC4 encryption, then chances are you deployed the endpoint with the default encryption settings, namely REQUIRED. If you don’t know the current endpoint settings you can always check the sys.database_mirroring_endpoints metadata catalog. When encryption is REQUIRED the encryption_algorithm column is one of 1,2,5 or 6. When encryption is SUPPORTED the encryption_algorithm column is one of 3,4,7 or 8. When is DISABLED the encryption_algorithm is 0 and the is_encryption_Enabled column changes to 0. To force the traffic to be unencrypted at least one endpoint has to have ENCRYPTION = DISABLED.

When you run the ALTER ENDPOINT statement and change the encryption settings the endpoint is going to be stopped and restarted during the ALTER statement. All existing connections will be disconnected. A database mirroring session may immediately re-connect and not react to this short disruption in any fashion visible to the user.

The safest way to change an existing mirroring session that uses encryption to no longer encrypt the traffic, when there is no witness, would be like this:

  1. Change the mirror endpoint to SUPPORTED
  2. Change the principal endpoint to DISABLED
  3. Change the mirror endpoint to DISABLED
  4. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

If a the mirroring session involves a witness, then it too must have the endpoint set to a compatible encryption setting:

  1. Change the witness endpoint to SUPPORTED
  2. Change the mirror endpoint to SUPPORTED
  3. Change the principal endpoint to DISABLED
  4. Change the witness endpoint to DISABLED
  5. Change the mirror endpoint to DISABLED
  6. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

Note that if automatic failover is enabled then at the moment the principal endpoint is changed, it is possible for automatic failover to occur, given that for a brief moment the mirror and the witness will have a quorum.

How to troubleshoot if something goes wrong

Attach Profiler to all instances involved in the mirroring session and open a trace that listens for the Audit Database Mirroring Login Event Class (on SQL 2005 use the Audit Broker Login Event Class event instead, it will trace the DBM sessions). If you did a mistake during the ALTER ENDPOINT changes and ended up with incompatible settings, there will be an event generated visible in Profiler. The event Text will contain an error message explaining why the endpoints cannot connect.

Using tables as Queues

March 26th, 2010

A very common question asked on all programming forums is how to implement queues based on database tables. This is not a trivial question actually. Implementing a queue backed by a table is notoriously difficult, error prone and susceptible to deadlocks. Because queues are usually needed as a link between various processing stages in a workflow they operate in highly concurrent environments where multiple processes enqueue rows into the table while multiple processes attempt to dequeue these rows. This concurrency creates correctness, scalability and performance challenges.

But since SQL Server 2005 introduced the OUTPUT clause, using tables as queues is no longer a hard problem. This fact is called out in the OUTPUT Clause topic in BOL:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table… Other semantics may also be implemented, such as using a table to implement a stack.

The reason why OUTPUT clause is critical is that it offers an atomic destructive read operation that allows us to remove the dequeued row and return it to the caller, in one single statement.

Heap Queues

The simplest queue is a heap: producers can equeue into the heap and consumers can dequeue, but order of operations is not important: the consumers can dequeue any row, as long as it is unlocked.

create table HeapQueue (
  Payload varbinary(max));
go

create procedure usp_enqueueHeap
  @payload varbinary(max)
as
  set nocount on;
  insert into HeapQueue (Payload) values (@Payload);
go

create procedure usp_dequeueHeap 
as
  set nocount on;
  delete top(1) from HeapQueue with (rowlock, readpast)
      output deleted.payload;      
go

A heap queue can satisfy most producer-consumer patterns. It scales well and is very simple to implement and understand. Notice the (rowlock, readpast) hints on the delete operation, they allow for concurrent consumers to dequeue rows from the table without blocking each other. A heap queue makes no order guarantees.

FIFO Queues

When he queueing and dequeuing operations have to support a certain order two changes have to be made:

  • The table must be organized as a clustered index ordered by a key that preserves the desired dequeue order.
  • The dequeue operation must contain an ORDER BY clause to guarantee the order.

create table FifoQueue (
  Id bigint not null identity(1,1),
  Payload varbinary(max));
go

create clustered index cdxFifoQueue on FifoQueue (Id);
go

create procedure usp_enqueueFifo
  @payload varbinary(max)
as
  set nocount on;
  insert into FifoQueue (Payload) values (@Payload);
go

create procedure usp_dequeueFifo
as
  set nocount on;
  with cte as (
    select top(1) Payload
      from FifoQueue with (rowlock, readpast)
    order by Id)
  delete from cte
    output deleted.Payload;
go

By adding the IDENTITY column to our queue and making it the clustered index, we can dequeue in the order inserted. The enqueue operation is identical with our Heap Queue, but the dequeue is slightly changed, as the requirement to dequeue in the order inserted means that we have to specify an ORDER BY. Since the DELETE statement does not support ORDER BY, we use a Common Table Expression to select the row to be dequeued, then delete this row and return the payload in the OUTPUT clause. Isn’t this the same as doing a SELECT followed by a DELETE, and hence exposed to the traditional correctness problems with table backed queues? Technically, it is. But this is a SELECT followed by a DELETE that actually works for table based queues. Let me explain.

Because the query is actually an DELETE of a CTE, the query execution will occur as a DELETE, not as an SELECT followed by a DELETE, and also not as a SELECT executed in the context of the DELETE. The crucial part is that the SELECT part will aquire LCK_M_U update locks on the rows scanned. LCK_M_U is compatible with LCK_M_S shared locks, but is incompatible with another LCK_M_U. So two concurrent dequeue threads will not try to dequeue the same row. One will grab the first row free, the other thread will grab the next row.

Is also worth looking at how a compact plan the usp_dequeueFifo has:

usp_dequeueFifo execution plan

usp_dequeueFifo execution plan

Compare this with the alternative of using a subquery to locate the row to be deleted:

Subquery deque plan

Subquery deque plan


delete top(1) from FifoQueue
output deleted.Payload
where Id = (
select top(1) Id
  from FifoQueue with (rowlock, updlock, readpast)
order by Id)

Strict Ordering and Concurrency

Strict FIFO ordering in a database world would have to take into account transactions. If transaction T1 dequeues row A, transaction T2 dequeues the next row B and then T1 rolls back and T2 commits, the row B was processed out of order. So any dequeue operation would have to wait for the previous dequeue to committ before proceeding. While this is correct, is also highly inefficient, as it means that all transactions must serialize access to the queue. Many applications accept the processing to occur out of order for the sake of achieving a reasonable scalability and performance.

If strict FIFO order is required then you have to remove the readpast hint from the usp_dequeueFifo procedure. When this is done, only one transaction can dequeue rows from the queue at a time. All other transaction will have to wait until the first one commits. This is not an implementation artifact, it is a fundamental requirement derived from the ACID properties of transactions.

If a lax FIFO order is acceptable, then the readpast hint will ensure that multiple transactions can dequeue rows concurrently. However, the strict FIFO order cannot be guaranteed in this case.

LIFO Stacks

A stack backed by a queue is also possible. The implementation and table structure is almost identical with the FIFO queue, with one difference: the ORDER BY clause has a DESC.

with cte as (
    select top(1) Payload
      from FifoQueue with (rowlock, readpast)
    order by Id DESC)
  delete from cte
    output deleted.Payload;

Because all operations (enqueue and dequeue) occur on the same rows, stacks implemented as tables tend to create a very hot spot on the page which currently contains these rows. Because all row insert, delete and update operations need to lock the page latch exclusively and stacks operate on the rows grouped at one end of the table, the result is high page latch contention on this page. Queues have the same problem, but to a lesser extent as the operations are spread in inserts at one end of the table and deletes at the other end, so the same number of operations is split into two hot spots instead of a single one like in stacks case.

Pending Queues

Another category of queues are pending queues. Items are inserted with a due date, and the dequeue operation returns rows that are due at dequeue time. This type of queues is common in scheduling systems.

create table PendingQueue (
  DueTime datetime not null,
  Payload varbinary(max));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
  @dueTime datetime,
  @payload varbinary(max)
as
  set nocount on;
  insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go

create procedure usp_dequeuePending
as
  set nocount on;
  declare @now datetime;
  set @now = getutcdate();
  with cte as (
    select top(1) 
      Payload
    from PendingQueue with (rowlock, readpast)
    where DueTime < @now
    order by DueTime)
  delete from cte
    output deleted.Payload;
go

I choose to use UTC times for my example, and I highly recommend you do the same for your applications. Not only this eliminates the problem of having to deal with timezones, but also your pending operations will behave correctly on the two times each year when summer time enters into effect or when it ends.

Why not use built-in Queues?

SQL Server has Queues, right? After all, what else are statements like CREATE QUEUE and DROP QUEUE refer to? Well, not really. SQL Server includes Service Broker, it’s true, and Service Broker uses these message stores that, for lack of a better term, where called Queues. But even during product development serious consideration was given to whether a different term should be used instead of ‘queue’ (eg. use ‘message store’). In the end the decision was made to use the term ‘queue’ given the industry familiarity with the term. But make no mistake, Service Broker ‘queues’ are not a generic queue storage for anyone to use. They are intended solely as a store of messages for Service Broker. If you try to use them as a generic queue, several shortcomings will become immediately apparent:

  • Difficulty to enqueue. With Service Broker Queues you need to begin a conversation and send a message on it in order to enqueue something into a queue. What is a conversation you ask? My point exactly: the semantics exposed by Service Broker are those needed for it’s purpose, namely reliable messaging. You do not need to learn about services, contracts, message types, routes and remote service bindings just to enqueue a row into a queue.
  • Fixed structure. Service Broker Queues have a specific table structure that cannot be altered in any fashion. You cannot add, alter or drop columns, you cannot change the indexes, you cannot change the clustered index. The Service Broker queues schema is designed for the RECEIVE verb and for the conversation group locking semantics of Service Broker, but that schema may not be what is optimal for your case.
  • Lack of maintenance options. I blogged about this issue in my article Dealing with Large Queues. With Service Broker Queues you cannot use any of the table maintenance DDL, like rebuilding or reorganizing an index, you cannot use DMVs like sys.dm_db_index_physical_stats nor can you change the various table options via sp_tableoptions.

However Service Broker has one ace up its selves: Activation. Queue processing is often associated with event driven programming and the possibility to launch a procedure to handle incoming rows as they are enqueued in is always required with queues. Triggers don’t work as processing has to occur after the enqueue is committed. And scheduled SQL Agent jobs don’t adapt to the variable rates and spikes queue experience: if they are too aggressive they’ll burn CPU, but if they are too passive the latency increases even under no load. Is hard enough to tune it for a sweet spot under a constant load, but add a variable load with spikes and the task becomes impossible. Unfortunately there is no substitute for Activation, you have to handle the processing as a separate tasks that polls the queue for new rows. The only way to leverage activation, with it’s sweet mix of non-polling and self load balancing, is to use Service Broker Queues.

Performance comparison of varchar(max) vs. varchar(N)

March 22nd, 2010

The question of comparing the MAX types (VARCHAR, NVARCHAR, VARBINARY) with their non-max counterparts is often asked, but the answer usually gravitate around the storage differences. But I’d like to address the point that these types have inherent, intrinsic performance differences that are not driven by different storage characteristics. In other words, simply comparing and manipulating variables and columns in T-SQL can yield different performance when VARCHAR(MAX) is used vs. VARCHAR(N).

Assignment

First comparing simple assignment, assign a value to a VARBINARY(8000) variable in a tight loop:

declare @x varchar(8000);
declare @startTime datetime;
declare @i int;

set @i = 0;
set @startTime = getutcdate();

while @i < 1000000
begin
  set @x = 'abc';
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

This script runs on my test server in 4.9 seconds on average. Simply changing the variable declaration to VARBINARY(MAX) changes the run time to an average of of 9.2 seconds.

Comparison

Next I measured the performance of a simple comparison:

declare @x varchar(8000);
declare @startTime datetime;
declare @i int;

set @i = 0;
set @startTime = getutcdate();
set @x = 'abc';


while @i < 1000000
begin
  declare @y bit;
  set @y = case when @x = 'abc' then 1 else 0 end;
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

Average run time for VARCHAR(8000): 5.8 seconds. For VARCHAR(MAX): 6.5 seconds.

Data Access

The next text does a simple string comparison of a VARCHAR(MAX) field vs. a VARCHAR(8000) field. The actual value stored is identical in both cases, a simple 3 letter string ‘abc’. The data access does not retrieve the string value, it simply compares its value against a WHERE clause predicate:

create table test (
  id int identity(1,1) primary key, 
  x varchar(max));
go

insert into test(x) values ('abc');
go

declare @x varchar(max);
declare @startTime datetime;
declare @i int;

set nocount on;
set @i = 0;
set @startTime = getutcdate();

while @i < 100000
begin
  declare @id int;
  select @id = id 
    from test 
    where id = 1
    and x = 'abc';
  set @i = @i + 1;
end

select datediff(ms, @startTime, getutcdate());

Loop execution time for VARCHAR(8000): 3.1 seconds. For VARCHAR(MAX): 3.6 seconds.

Furthermore, if we change the WHERE predicate to compare against a max type variable instead of the literal ‘abc‘ the loop time increases to 3.9 seconds.

Conclusion

The code path that handles the MAX types (varchar, nvarchar and varbinary) is different from the code path that handles their equivalent non-max length types. The non-max types can internally be represented as an ordinary pointer-and-length structure. But the max types cannot be stored internally as a contiguous memory area, since they can possibly grow up to 2Gb. So they have to be represented by a streaming interface, similar to COM’s IStream. This carries over to every operation that involves the max types, including simple assignment and comparison, since these operations are more complicated over a streaming interface. The biggest impact is visible in the code that allocates and assign max-type variables (my first test), but the impact is visible on every operation.

The impact is reasonable on most operations with the max type operations being about 10% slower. So is this something to worry about, or even worth blogging about, after all? Actually, in a recent project of mine I had to change a column type from varchar(max) to varchar(5000) to alleviate the impact of max-types performance. The code happened to be on a very very performance critical path and testing clearly showed the impact was not only measurable, was quite actually quite serious. The processing throughput increased by about 25% just from this minor change in my case.

Although much discussion exists around the storage of max types vs. the storage of non-max types and the similarities and differences between them (eg. in-row vs. out-of-row vs. row-overflow storage), there is also a more fundamental aspect that differentiate these types: SQL Server internal code differences. These differences manifest themselves even when the storage of the max type is optimized in-row, for small and very small values of the BLOB field.

Dealing with Large Queues

March 9th, 2010

On a project I’m currently involved with we have to handle a constant influx of audit messages for processing. The messages come from about 50 SQL Express instances located in data centers around the globe, delivered via Service Broker into a processing queue hosted on a mirrored database where an activated procedure shreds the message payload into relational tables. These tables are in turn replicated with transactional replication into a data warehouse database. after that the messages are deleted from the processing servers, as replication is set up not to replicate deletes. The system must handle a constant average rate of about 200 messages per second, 24×7, with spikes going up to 2000-3000 messages per second over periods of minutes to an hour.

When dealing with these relatively high volumes, it is inevitable that queues will grow during the 2000 msgs/sec spikes and drain back to empty when the incoming rate stabilizes again at the normal 200 msgs/sec rate. Service Broker does an excellent job at handling these non-connectivity periods, retains the audit messages and quickly delivers them when connectivity is restored.

What I noticed though is that sometimes the processing of the received messages could hit a threshold from where it could not recover. The queue processing would slow down to a rate that was bellow the incoming rate, and from that point forward the queue could just grow. I want to detail a bit the reason why this can happen and what I did to alleviate the problem.

Index Fragmentation

Every DBA knows about fragmentation. All database developers also understand fragmentation and how to avoid it. So we can skip ahead and … wait. Actually, what is index fragmentation? Lets go back to the whitepaper Microsoft SQL Server 2000 Index Defragmentation Best Practices. Even though the whitepaper is for SQL 2000, it was recently updated on March 2009 and is the most detailed whitepaper dealing with index fragmentation released by Microsoft I know of:

Fragmentation
Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file.

Say you have an index with 9 rows, with the keys A, B, C, D, E, F, G, H and J. For our example, each page can fit 3 rows, and the database pages are in order in the database file: P1, P2 and P3. If the rows are (A,B,C) on P1, then (E,F,G) on P2 and (G, H, J) on P3 then the index is unfragmented. But if the row are (E,F,G) on P1 then (G,H,J) on P2 and (A,B,C) on P3 the page P3 is first in the key order, but last in the physical file order.

Index fragmentation affects the read performance when fetching pages into the buffer pool. This is because the Read-Ahead Manager issues read-ahead requests in contiguous fragments. When the index is fragmented the read-ahead manager will issue a large number of small read-aheads. When the index is contiguous the Read-Ahead Manager will issue a small number of large read-aheads. On traditional disk-head and spindle drives, the large number of small read requests caused by fragmentation results in drastic read throughput reduction. A large number of IO requests carries also a bigger user-to-kernel context switch baggage. Again, this is explained in the whitepaper I linked above:

To understand why fragmentation had such an effect on the DSS workload performance, it is important to understand how fragmentation affects the SQL Server read-ahead manager. For queries that scan one or more indexes, the SQL Server read-ahead manager is responsible for scanning ahead through the index pages and bringing additional data pages into the SQL Server data cache. The read-ahead manager dynamically adjusts the size of reads it performs based on the physical ordering of the underlying pages. When there is low fragmentation, the read-ahead manager can read larger blocks of data at a time, more efficiently using the I/O subsystem. As the data becomes fragmented, the read-ahead manager must read smaller blocks of data. The amount of read-aheads that can be issued is independent of the physical ordering of the data; however, smaller read requests take more CPU resources per block, resulting in less overall disk throughput.

What causes index fragmentation?

Fragmentation occurs either when the order of physical operations does not match the logical order of rows (eg. insert rows into a table in reverse order of the clustered index key) or when frequent update operations occur after the index is constructed: rows are deleted and new rows are inserted. One case is particularly aggravating: the insert page split, because the page split not only increases fragmentation, it also reduces the page fill factor of the index, further damaging performance.

Are Queues Fragmented?

Service Broker queues are backed by internal tables, and these tables have a clustered index on (status, conversation_group_id, priority, conversation_handle, queueing_order). Messages are constantly enqueued and dequeued into and from this internal table. The queueing operations are nothing else than inserts and deletes, and according to what we just discussed about how fragmentation occurs, they should get high index fragmentation.

However, there is one crucial difference between how data tables are used, in comparison with queues: Queues drain, meaning they are always near 0 records count. When they grow during spikes, the processing must be able to drain them back to 0, even as new messages are enqueued at normal rates. A table with no rows has no pages, so there is no fragmentation. So the expected behavior of queues is to hover around a low record count, where fragmentation has no performance impact, grow on spikes, get fragmented, but drain back to 0 and thus repairing themselves ‘on the fly’.

Or at least that’s how the theory goes…

Ghost Cleanup

SQL Server DELETE operations do no remove rows from indexes. Instead, the rows are simply marked as ‘ghosted’ and left in the page. It is the job of a dedicated task, namely the Ghost Cleanup task, to reclaim the space these rows occupy. More importantly, it is the Ghost Cleanup task’s job to deallocate any page that no longer contains any record and release these pages back to the database, as free pages. This is a performance improvement because DELETE operations can complete faster, and it also improves the performance of rollbacks. If you want to read more about how Ghost Cleanup operates, the best resource is Paul Randal’s article Inside the Storage Engine: Ghost cleanup in depth.

As I said, dequeue operations are in fact DELETE from the internal tables that back the Service Broker queues. When the RECEIVE statement is run, in fact an DELETE with OUTPUT occurs on this internal table. And as such, all the returned messages are in fact ghosted records left in the internal tables. The Ghost Cleanup has to come about and collect them, to reclaim the space and eventually free the space.

The Ghost Cleanup is calibrated to operate on a normal data table environment. It wakes every 5 seconds, reclaims ghosted records in up to 10 pages, then goes to sleep again. In addition, SELECT statements that encounter ghosted records during index scans place the page into a list so that the Ghost Cleanup collects it on its next pass. From SQL Server testing and customer feedback, this calibration balances the need to cleanup pages with the overhead of running the ghost cleanup just fine for a normal data table.

The problem with Service Broker queues is that they are not normal data tables: they are queues! Every single row is inserted and then deleted as fast as possible. No record is ever read twice. No record stays around. The queues are constantly growing, and constantly shrinking. On machines that are entirely dedicated to Service Broker processing it is not unusual to have all CPU and all I/O resources of the server dedicated to enqueuing and dequeuing messages into one single queue. In other words INSERT then quickly DELETE one row at a time, from all CPU cores, as fast as the I/O subsystem permits it. The Ghost Cleanup better keep up, as every enqueued message is deleted, and every deleted row is a ghosted record to be cleanup up. All of the sudden 10 pages every 5 seconds seems a bit short changed, when the rate of newly created ghosted records is 200 per second!

Crossing the Threshold

In my project we had an incident that causes a massive queue growth, to about 19 million messages. We expected the system to start draining as it usually did before, but it never did. It kept growing at a rate about 3 million a day, indicating that the processing could not keep up with the incoming rate of 200 msgs/sec. The processing was running as fast as possible, on a highly optimized procedure using the fastest set oriented message processing, similar to what I recommend in Writing Service Broker Procedures. After trying to speed up the IO system, moving the drives to fastest LUNs available in the attached SAN, the system could still no keep up. The disk metrics showed a lot of read requests of 8192. bytes. On a SQL Server disk a large number of read requests of 8k size are a tell-tale of fragmentation: no multi-page read-ahead occurs, indicating that the average contiguous fragment length is 1 page. Under normal circumstances one would check sys.dm_db_index_physical_stats for fragmentation but there is a small gotcha: this DMV does not show stats for queues!

A second observation occurred sometimes later: even after the queue drained, the allocated space was not reclaimed by the ghost cleanup. In fact I’ve seen a queue having 0 rows, but over 1 million pages allocated. The Skipped Ghosted Records/sec performance counter was showing over 200000 ghosted records skipped per second. It seems that the Ghost Cleanup was just unable to keep up with the nearly 200Gb size empty queue. Even running DBCC FORCEGHOSTCLEANUP could not improve the situation.

Queue Maintenance

When a DBA is faced with a fragmented index, it has a simple avenue: rebuild or reorganize the index. ALTER INDEX … REORGANIZE or ALTER INDEX … REBUILD, and maybe do it online to avoid system downtime. In fact there are quite a few scripts provided by the community for just such a task, like Michelle Ufford’s Index Defrag Script, and good DBAs always have their own, customized, flavor of index maintenance script in their tool belt.

But what about queues? There is no ALTER QUEUE … REBUILD nor ALTER QUEUE … REORGANIZE. How about the good ole’ (and now deprecated) DBCC DBREINDEX and DBCC INDEXDEFRAG? Nope, they don’t works on queues. But queues are backed by hidden tables, right? You can always find out the hidden table that backs a queue:

select it.name as internal_table_name, q.name as queue_name
from sys.internal_tables it
join sys.service_queues q on it.parent_object_id = q.object_id

This query returns the name of the internal table that backs each Service Broker queue in the database. Can we do our maintenance operations on them? ALTER INDEX ALL ON queue_messages_1003150619 REORGANIZE. Nope, no luck. DBCC DBREINDEX(‘queue_messages_1003150619′)? But if you check the schema on which the internal table resides, its actually sys. Could DBCC DBREINDEX(‘sys.queue_messages_1003150619′) work? Nope.

At this point I must use the deus ex machina: I know some internals of SQL Server from my Service Broker FTE days. One such information is this: statements run from the Dedicated Administrative Connection can have different binding rules. Is this the answer? YES:

Service Broker queues can be reindexed by running DBCC DBREINDEX on the internal table that backs the queue from the DAC connection. The internal table name must be prefixed with the sys schema name.

Fortunately this solution solved our problems. We’re running a job that, from a DAC connection, reindexes the internal table that backs the queue, even though the queue is empty. This operation reclaims the space consumed by millions of ghosted records back the the database:

dbcc dbreindex('sys.queue_messages_1003150619')

Conclusion

Just like tables, queues may require maintenance operations. But unlike tables, the DBA has no DDL at its disposal to do the job, except the DBCC DBREINDEX on the internal table, run from the DAC connection, which is a hack: a deprecated command, the DAC requirement, the internal table name digg from metadata… Hopefully, the problem will be addressed eventually and ALTER QUEUE … REINDEX and ALTER QUEUE … REORGANIZE will make it to the product.

Until then, should you be reindexing your queues every night? No. The conditions I encountered were caused by a very particular balance of server IO capacity, message incoming rate and triggered by a particularly high spike. But still, its good to know: if you ever do need to rebuild a queue because of high fragmentation or because of ghost cleanup … modesty… you can: find the name of the internal table behind the queue, open a DAC connection and run DBCC DBREINDEX.

I had this post on in standby for some time now, but since the SQLTuesday#4 topic is IO, IO, It’s Off To Disk We Go! I took the opportunity to finish it in time for the roundup.

DbccCommand Enumeration

February 22nd, 2010

The SMO libraries contain an enum for the DBCC commands: DbccCommand. But there is little explanation in the MSDN documentation what those commands are. I had a recent discussion with a developer who, understandably, expected a little cleaner explanation for the meaning of the mysterious enum values. So I decided to compile a table of links to the corresponding SQL Server DBCC command documentation. For well known undocumented commands, I’ve linked popular community articles. Some undocumented DBCC commands are very dangerous to use, obsolete or some are only for internal use, or simply I have no idea what they do, and I did not add any explanation for them.

Command Documentation Comments
ActiveCursors
AddExtendedProc
AddInstance
AuditEvent
AutoPilot Used by the Tunning Advisor
Buffer Display pages from the buffer pool.
Bytes Reads the SQL Server process memory.
CacheProfile
CacheStats
CallFullText
CheckAlloc DBCC CHECKALLOC Checks the consistency of disk space allocation structures for a specified database.
CheckCatalog DBCC CHECKCATALOG Checks for catalog consistency within the specified database.
CheckConstraints DBCC CHECKCONSTRAINTS Checks the integrity of a specified constraint or all constraints on a specified table in the current database.
CheckDB DBCC CHECKDB Checks the logical and physical integrity of all the objects in the specified database.
CheckFileGroup DBCC CHECKFILEGROUP Checks the allocation and structural integrity of all tables and indexed views in the specified filegroup of the current database.
CheckIdent DBCC CHECKIDENT Checks the current identity value for the specified table and, if it is needed, changes the identity value.
CheckPrimaryFile
CheckTable DBCC CHECKTABLE Checks the integrity of all the pages and structures that make up the table or indexed view.
CleanTable DBCC CLEANTABLE Reclaims space from dropped variable-length columns in tables or indexed views.
ClearSpaceCaches
CollectStats
ConcurrencyViolation Discontinued in SQL 2008
CursorStats
DBRecover Use RESTORE DATABASE … WITH RECOVERY; instead.
DBReindex DBCC DBREINDEX Rebuilds one or more indexes for a table in the specified database.
DBReindexAll
DBRepair
DebugBreak Breaks the SQL Server process into debugger. Afaik this is inactive in the released bits.
DeleteInstance
DetachDB
DBCC dllname (FREE) Unloads the specified extended stored procedure DLL from memory.
DropCleanBuffers DBCC DROPCLEANBUFFERS Removes all clean buffers from the buffer pool.
DropExtendedProc
DumpConfig
DumpDBInfo Displays DBINFO structure for the specified database.
DumpDBTable Displays the contents of the DBTABLE structure
DumpLock
DumpLog
DumpPage
DumpResource
DumpTrigger DBCC DUMPTRIGGER Creates a process dump when an exception is raised
ErrorLog Recycles the errorlog. Use sp_cycle_errorlog instead.
ExtentInfo
FileHeader
FixAllocation
Flush
FlushProcInDB
ForceGhostCleanup Runs the ghost cleanup.
Free
FreeProcCache DBCC FREEPROCCACHE Removes all elements from the plan cache, removes a specific plan from the plan cache by specifying a plan handle or SQL handle, or removes all cache entries associated with a specified resource pool.
FreeSessionCache DBCC FREESESSIONCACHE Flushes the distributed query connection cache used by distributed queries against an instance of Microsoft SQL Server.
FreeSystemCache DBCC FREESYSTEMCACHE Releases all unused cache entries from all caches.
FreezeIO This is the ‘freeze’ phase of the SQL Writer VSS based database snapshot backup, as documented in A Guide for SQL Server Backup Application Vendors
Help DBCC HELP Returns syntax information for the specified DBCC command.
IcecapQuery Icecap is the name of one of the code profilers used inside Microsoft.
IncrementInstance
Ind Shows all pages in use by indexes of the specified table.
IndexDefrag DBCC INDEXDEFRAG Defragments indexes of the specified table or view.
InputBuffer DBCC INPUTBUFFER Displays the last statement sent from a client to an instance of Microsoft SQL Server.
InvalidateTextptr
InvalidateTextptrObjid
Latch
LogInfo
MapAllocUnit
MemObjList Use sys.dm_os_memory_objects instead.
MemoryMap
MemoryStatus DBCC MEMORYSTATUS Displays a snapshot of the current memory status of SQL Server.
Metadata
MovePage Do not use it.
NoTextptr
OpenTran DBCC OPENTRAN Displays information about the oldest active transaction and the oldest distributed and nondistributed replicated transactions, if any, within the specified database.
OptimizerWhatIf
OutputBuffer DBCC OUTPUTBUFFER Returns the current output buffer in hexadecimal and ASCII format for the specified session_id.
DBCC PAGE Prints out the contents of a SQL Server page. Although not in the SMO enum MSDN documentation, I could not let this one out of any DBCC write up…
PinTable
PerfMonStats
PersistStackHash
ProcCache DBCC PROCCACHE Displays information in a table format about the procedure cache.
PrtiPage Displays the page number pointed to by each row on the specified index page.
ReadPage
RenameColumn
RuleOff
RuleOn
SeMetadata
SetCpuWeight
SetInstance
SetIOWeight
ShowStatistics DBCC SHOW_STATISTICS Displays current query optimization statistics for a table or indexed view.
ShowContig DBCC SHOWCONTIG Displays fragmentation information for the data and indexes of the specified table or view.
ShowDBAffinity
ShowFileStats
ShowOffRules
ShowOnRules
ShowTableAffinity
ShowText
ShowWeights
ShrinkDatabase DBCC SHRINKDATABASE Shrinks the size of the data and log files in the specified database.
ShrinkFile DBCC SHRINKFILE Shrinks the size of the specified data or log file for the current database or empties a file by moving the data from the specified file to other files in the same filegroup, allowing the file to be removed from the database.
SqlMgrStats
SqlPerf DBCC SQLPERF Provides transaction log space usage statistics for all databases.
StackDump Creates a SQL Server process mini-dump
Tec
ThawIO This is the ‘thaw’ phase of the SQL Writer VSS based database snapshot backup, as documented in A Guide for SQL Server Backup Application Vendors
TraceOff DBCC TRACEOFF Disables the specified trace flags.
TraceOn DBCC TRACEON Enables the specified trace flags.
TraceStatus DBCC TRACESTATUS Displays the status of trace flags.
UnpinTable
UpdateUsage DBCC UPDATEUSAGE Reports and corrects pages and row count inaccuracies in the catalog views.
UsePlan
UserOptions DBCC USEROPTIONS Returns the SET options active (set) for the current connection.
WritePage Do not use it.

On SQL Server boolean operator short-circuit

September 13th, 2009

Recently I had several discussions all circling around the short-circuit of boolean expressions in Transact-SQL queries. Many developers that come from an imperative language background like C are relying on boolean short-circuit to occur when SQL queries are executed. Often they take this expectation to extreme and the correctness of the result is actually relying on the short-circuit to occur:

select 'Will not divide by zero!' where 1=or 1/0=0

In the SQL snippet above the expression on the right side of the OR operator would cause a division by zero if ever evaluated. Yet the query executes fine and the successful result is seen as proof that operator short-circuit does happen! Well, is that all there is? Of course not. An universal quantification cannot be demonstrated with an example. But it can be proven false with one single counter example!

Luckily I have two aces on my sleeve: for one I know how the Query Optimizer works. Second, I’ve stayed close enough to Microsoft CSS front lines for 6 months to see actual cases pouring in from developers bitten by the short-circuit assumption. Here is my counter-example case:

create table eav (
    eav_id int identity(1,1) primary key,
    attribute varchar(50) not null,
    is_numeric bit not null,
    [value] sql_variant null);
create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
while @i < 100000
begin
    declare @attribute varchar(50),
        @is_numeric bit,
        @value sql_variant;
    select @attribute = 'A' + cast(cast(rand()*1000 as  int) as varchar(3));
    select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
    if 1=@is_numeric
        select @value = cast(rand() * 100 as int);
    else
        select @value = 'Lorem ipsum';
    insert into eav (attribute, is_numeric, [value])
    values (@attribute, @is_numeric, @value);
    select @i = @i+1;
end
go

-- insert a 'trap'
insert into eav (attribute, is_numeric, [value])
values ('B1', 0, 'Gotch ya');
GO

-- select the 'trap' value
select [value] from eav where 
    attribute = 'B1' 
    and is_numeric = 1 
    and cast([value] as int) > 50
go

Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

This happens on SQL Server 2005 SP2. Clearly, the conversion does occur even though the value is marked as ‘not numeric’. Whats going on here? To better understand, lets insert a known value that can be converted and then run the same query again and look at the execution plan:

insert into eav (attribute, is_numeric, [value])
values ('B2', 1, 65);
go

select [value] from eav where 
    attribute = 'B2' 
    and is_numeric = 1 
    and cast([value] as int) > 50
go


boolean short-circuit counter example query plan

boolean short-circuit counter example query plan

Looking at the plan we can see how the query is actually evaluated: seek on the non-clustered index for the attribute ‘B2′, project the ‘value’, filter for the value predicate ‘cast([value] as int)>50′ then perform a nested join to look up the ‘is_boolean’ in the clustered index! So the right side of the AND operator is evaluated first. Q.E.D.

Is this a bug? Of course not. SQL is a declarative language, the query optimizer is free to choose any execution path that provide the requested result. Boolean operator short-circuit is NOT GUARANTEED. My query has set up a trap for the query optimizer, by providing a tempting execution path using the non-clustered index. For my example to work I had to set up a large table and enough distinct values of ‘attribute’ so that the optimizer would see the non-clustered index access followed by bookmark look up as a better plan than a clustered scan. And it is, by all means a better plan. But then I placed my trap: by adding the ‘value’ as an included column in the non-clustered index, I give the optimizer a too sweet to resists opportunity to evaluate the filter predicate on the ‘value’ column before it evaluates the filter predicate on the ‘is_numeric’ column, thus forcing the break on the short-circuit assumption.

Passing Parameters to a Background Procedure

August 18th, 2009

I have posted previously an example how to invoke a procedure asynchronously using service Broker activation. Several readers have inquired how to extend this mechanism to add parameters to the background launched procedure.

Passing parameters to a single well know procedure is easy: the parameters are be added to the message body and the activated procedure looks them up in the received XML, passing them to the called procedure. But is significantly more complex to create a generic mechanism that can pass parameters to any procedure. The problem is the type system, because the parameters have unknown types and the activated procedure has to pass proper typed parameters to the invoked procedure.

A generic solution should accept a variety of parameter types and should deal with the peculiarities of Transact-SQL parameters passing, namely the named parameters capabilities. Also the invocation wrapper usp_AsyncExecInvoke should directly accept the parameters for the desired background procedure. After considering several alternatives, I settled on the following approach:

  • Rely on the generic sql_variant data type available in SQL Server. The invocation wrapper usp_AsyncExecInvoke accept all the parameters as sql_variant.
  • Pass the background procedure parameter names explicitly to the invocation wrapper usp_AsyncExecInvoke.
  • Always used named parameters in the background procedure invocation.
  • Build a dynamic SQL batch to invoke the background procedure that deals with the required parameter type conversions.

Accepting Parameters

The invocation wrapper usp_AsyncExecInvoke has changed the signature to accept a variable number of parameters:

create procedure [usp_AsyncExecInvoke]
    @procedureName sysname
    , @p1 sql_variant = NULL, @n1 sysname = NULL
    , @p2 sql_variant = NULL, @n2 sysname = NULL
    , @p3 sql_variant = NULL, @n3 sysname = NULL
    , @p4 sql_variant = NULL, @n4 sysname = NULL
    , @p5 sql_variant = NULL, @n5 sysname = NULL
    , @token uniqueidentifier output

The new parameters are used to collect the desired parameter values and names to be passed to the background procedure. So sonsider you would like to make the following traditional, synchronous, call:

exec usp_withParam @id = 1.0
	, @name = N'Foo'
	, @bytes = 0xBAADF00D;

The equivalent asynchronous invocation would need to pass the parameter values (1.0, 'Foo' and 0xbaadf00d) as @p1, @p2 and @p3 respectively and the parameter names (@id, @name and @bytes) as @n1, @n2 and @n3:

exec usp_AsyncExecInvoke @procedureName = N'usp_withParam'
 , @p1 = 1.0, @n1 = N'@id'
 , @p2 = N'Foo', @n2='@name'
 , @p3 = 0xBAADF00D, @n3 = '@bytes'
 , @token = @token output;

The invocation wrapper will pack the parameter names, values and type information into the message body. The parameter type information is deduced at run time from the actual parameters, by using the SQL_VARIANT_PROPERTY system function. This is the XML message body corresponding to the invocation above (note how the varbinary parameter was encoded using base64 encoding):

<procedure>
  <name>usp_withParam</name>
  <parameters>
    <parameter Name="@id" BaseType="numeric" Precision="2" Scale="1" MaxLength="5">1.0</parameter>
    <parameter Name="@name" BaseType="nvarchar" Precision="0" Scale="0" MaxLength="6">Foo</parameter>
    <parameter Name="@bytes" BaseType="varbinary" 
          Precision="0" Scale="0" MaxLength="4">uq3wDQ==</parameter>
  </parameters>
</procedure>

Extracting the Parameters

The activated procedure job is now significantly more complex because of the task of extracting the parameter names, types and value and constructing a proper Transact-SQL statement to invoke the original procedure. A new helper procedure was added, usp_procedureInvokeHelper, that constructs a dynamic SQL batch from a invokation message body. Here is the dynamic SQL that results from our sample invocation:

declare @pt1 numeric(2,1)
declare @pt2 nvarchar(6)
declare @pt3 varbinary(4)
select @pt1=@x.value(N'(//procedure/parameters/parameter)[1]', N'numeric(2,1)');
select @pt2=@x.value(N'(//procedure/parameters/parameter)[2]', N'nvarchar(6)');
select @pt3=@x.value(N'(//procedure/parameters/parameter)[3]', N'varbinary(4)');
exec [usp_withParam]  @id=@pt1,@name=@pt2,@bytes=@pt3

Limitations

Because the invocation wrapper uses sql_variant typed parameters, all sql_variant restrictions apply to this sample. Most notably the wrapper will not accept XML type parameters nor large data types like varchar(max) or varbinary(max). It is possible to work around these limitations but it would complicate my sample beyond the point of being usefull as an example how to pass the parameters.

Because the error handling of the activated procedure will rollback in case of error, passing in a message that results in a procedure invocation error will cause the activation to rollback repeatedly and the poison message detection mechanism will kick in, deactivating the queue. A trivial example is if we omit the required @id parameter for our sample invocation.

The code

create table [AsyncExecResults] (
 [token] uniqueidentifier primary key
 , [submit_time] datetime not null
 , [start_time] datetime null
 , [finish_time] datetime null
 , [error_number] int null
 , [error_message] nvarchar(2048) null);
go

create queue [AsyncExecQueue];
go

create service [AsyncExecService] on queue [AsyncExecQueue] ([DEFAULT]);
GO

-- Dynamic SQL helper procedure
-- Extracts the parameters from the message body
-- Creates the invocation Transact-SQL batch
-- Invokes the dynmic SQL batch
create procedure [usp_procedureInvokeHelper] (@x xml)
as
begin
    set nocount on;

    declare @stmt nvarchar(max)
        , @stmtDeclarations nvarchar(max)
        , @stmtValues nvarchar(max)
        , @i int
        , @countParams int
        , @namedParams nvarchar(max)
        , @paramName sysname
        , @paramType sysname
        , @paramPrecision int
        , @paramScale int
        , @paramLength int
        , @paramTypeFull nvarchar(300)
        , @comma nchar(1)

    select @i = 0
        , @stmtDeclarations = N''
        , @stmtValues = N''
        , @namedParams = N''
        , @comma = N''

    declare crsParam cursor forward_only static read_only for
        select x.value(N'@Name', N'sysname')
            , x.value(N'@BaseType', N'sysname')
            , x.value(N'@Precision', N'int')
            , x.value(N'@Scale', N'int')
            , x.value(N'@MaxLength', N'int')
        from @x.nodes(N'//procedure/parameters/parameter') t(x);
    open crsParam;

    fetch next from crsParam into @paramName
        , @paramType
        , @paramPrecision
        , @paramScale
        , @paramLength;
    while (@@fetch_status = 0)
    begin
        select @i = @i + 1;

        select @paramTypeFull = @paramType +
            case
            when @paramType in (N'varchar'
                , N'nvarchar'
                , N'varbinary'
                , N'char'
                , N'nchar'
                , N'binary') then
                N'(' + cast(@paramLength as nvarchar(5)) + N')'
            when @paramType in (N'numeric') then
                N'(' + cast(@paramPrecision as nvarchar(10)) + N',' +
                cast(@paramScale as nvarchar(10))+ N')'
            else N''
            end;

        -- Some basic sanity check on the input XML
        if (@paramName is NULL
            or @paramType is NULL
            or @paramTypeFull is NULL
            or charindex(N'''', @paramName) > 0
            or charindex(N'''', @paramTypeFull) > 0)
            raiserror(N'Incorrect parameter attributes %i: %s:%s %i:%i:%i'
                , 16, 10, @i, @paramName, @paramType
                , @paramPrecision, @paramScale, @paramLength);

        select @stmtDeclarations = @stmtDeclarations + N'
declare @pt' + cast(@i as varchar(3)) + N' ' + @paramTypeFull
            , @stmtValues = @stmtValues + N'
select @pt' + cast(@i as varchar(3)) + N'=@x.value(
    N''(//procedure/parameters/parameter)[' + cast(@i as varchar(3)) 
                + N']'', N''' + @paramTypeFull + ''');'
            , @namedParams = @namedParams + @comma + @paramName
                + N'=@pt' + cast(@i as varchar(3));

        select @comma = N',';

        fetch next from crsParam into @paramName
            , @paramType
            , @paramPrecision
            , @paramScale
            , @paramLength;
    end

    close crsParam;
    deallocate crsParam;        

    select @stmt = @stmtDeclarations + @stmtValues + N'
exec ' + quotename(@x.value(N'(//procedure/name)[1]', N'sysname'));

    if (@namedParams != N'')
        select @stmt = @stmt + N' ' + @namedParams;

    exec sp_executesql @stmt, N'@x xml', @x;
end
go

create procedure usp_AsyncExecActivated
as
begin
    set nocount on;
    declare @h uniqueidentifier
        , @messageTypeName sysname
        , @messageBody varbinary(max)
        , @xmlBody xml
        , @startTime datetime
        , @finishTime datetime
        , @execErrorNumber int
        , @execErrorMessage nvarchar(2048)
        , @xactState smallint
        , @token uniqueidentifier;

    begin transaction;
    begin try;
        receive top(1)
            @h = [conversation_handle]
            , @messageTypeName = [message_type_name]
            , @messageBody = [message_body]
            from [AsyncExecQueue];
        if (@h is not null)
        begin
            if (@messageTypeName = N'DEFAULT')
            begin
                -- The DEFAULT message type is a procedure invocation.
                --
                select @xmlBody = CAST(@messageBody as xml);

                save transaction usp_AsyncExec_procedure;
                select @startTime = GETUTCDATE();
                begin try
                    exec [usp_procedureInvokeHelper] @xmlBody;
                end try
                begin catch
                -- This catch block tries to deal with failures of the procedure execution
                -- If possible it rolls back to the savepoint created earlier, allowing
                -- the activated procedure to continue. If the executed procedure
                -- raises an error with severity 16 or higher, it will doom the transaction
                -- and thus rollback the RECEIVE. Such case will be a poison message,
                -- resulting in the queue disabling.
                --
                select @execErrorNumber = ERROR_NUMBER(),
                    @execErrorMessage = ERROR_MESSAGE(),
                    @xactState = XACT_STATE();
                if (@xactState = -1)
                begin
                    rollback;
                    raiserror(N'Unrecoverable error in procedure: %i: %s', 16, 10,
                        @execErrorNumber, @execErrorMessage);
                end
                else if (@xactState = 1)
                begin
                    rollback transaction usp_AsyncExec_procedure;
                end
                end catch

                select @finishTime = GETUTCDATE();
                select @token = [conversation_id]
                    from sys.conversation_endpoints
                    where [conversation_handle] = @h;
                if (@token is null)
                begin
                    raiserror(N'Internal consistency error: conversation not found', 16, 20);
                end
                update [AsyncExecResults] set
                    [start_time] = @starttime
                    , [finish_time] = @finishTime
                    , [error_number] = @execErrorNumber
                    , [error_message] = @execErrorMessage
                    where [token] = @token;
                if (0 = @@ROWCOUNT)
                begin
                    raiserror(N'Internal consistency error: token not found', 16, 30);
                end
                end conversation @h;
            end
            else if (@messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
            begin
                end conversation @h;
            end
            else if (@messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
            begin
                declare @errorNumber int
                    , @errorMessage nvarchar(4000);
                select @xmlBody = CAST(@messageBody as xml);
                with xmlnamespaces (DEFAULT N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
                select @errorNumber = @xmlBody.value ('(/Error/Code)[1]', 'INT'),
                    @errorMessage = @xmlBody.value ('(/Error/Description)[1]', 'NVARCHAR(4000)');
                -- Update the request with the received error
                select @token = [conversation_id] 
                    from sys.conversation_endpoints 
                    where [conversation_handle] = @h;
                update [AsyncExecResults] set
                    [error_number] = @errorNumber
                    , [error_message] = @errorMessage
                    where [token] = @token;
                end conversation @h;
             end
           else
           begin
                raiserror(N'Received unexpected message type: %s', 16, 50, @messageTypeName);
           end
        end
        commit;
    end try
    begin catch
        declare @error int
         , @message nvarchar(2048);
        select @error = ERROR_NUMBER()
            , @message = ERROR_MESSAGE()
            , @xactState = XACT_STATE();
        if (@xactState <> 0)
        begin
         rollback;
        end;
        raiserror(N'Error: %i, %s', 1, 60,  @error, @message) with log;
    end catch
end
go

alter queue [AsyncExecQueue]
    with activation (
    procedure_name = [usp_AsyncExecActivated]
    , max_queue_readers = 1
    , execute as owner
    , status = on);
go

-- Helper function to create the XML element 
-- for a passed in parameter
create function [dbo].[fn_DescribeSqlVariant] (
 @p sql_variant
 , @n sysname)
returns xml
with schemabinding
as
begin
 return (
 select @n as [@Name]
  , sql_variant_property(@p, 'BaseType') as [@BaseType]
  , sql_variant_property(@p, 'Precision') as [@Precision]
  , sql_variant_property(@p, 'Scale') as [@Scale]
  , sql_variant_property(@p, 'MaxLength') as [@MaxLength]
  , @p
  for xml path('parameter'), type)
end
GO

-- Invocation wrapper. Accepts arbitrary
-- named parameetrs to be passed to the
-- background procedure
create procedure [usp_AsyncExecInvoke]
    @procedureName sysname
    , @p1 sql_variant = NULL, @n1 sysname = NULL
    , @p2 sql_variant = NULL, @n2 sysname = NULL
    , @p3 sql_variant = NULL, @n3 sysname = NULL
    , @p4 sql_variant = NULL, @n4 sysname = NULL
    , @p5 sql_variant = NULL, @n5 sysname = NULL
    , @token uniqueidentifier output
as
begin
    declare @h uniqueidentifier
     , @xmlBody xml
        , @trancount int;
    set nocount on;

 set @trancount = @@trancount;
    if @trancount = 0
        begin transaction
    else
        save transaction usp_AsyncExecInvoke;
    begin try
        begin dialog conversation @h
            from service [AsyncExecService]
            to service N'AsyncExecService', 'current database'
            with encryption = off;
        select @token = [conversation_id]
            from sys.conversation_endpoints
            where [conversation_handle] = @h;

        select @xmlBody = (
            select @procedureName as [name]
            , (select * from (
                select [dbo].[fn_DescribeSqlVariant] (@p1, @n1) AS [*] 
                    WHERE @p1 IS NOT NULL
                union all select [dbo].[fn_DescribeSqlVariant] (@p2, @n2) AS [*] 
                    WHERE @p2 IS NOT NULL
                union all select [dbo].[fn_DescribeSqlVariant] (@p3, @n3) AS [*] 
                    WHERE @p3 IS NOT NULL
                union all select [dbo].[fn_DescribeSqlVariant] (@p4, @n4) AS [*] 
                    WHERE @p4 IS NOT NULL
                union all select [dbo].[fn_DescribeSqlVariant] (@p5, @n5) AS [*] 
                    WHERE @p5 IS NOT NULL
                ) as p for xml path(''), type
            ) as [parameters]
            for xml path('procedure'), type);
        send on conversation @h (@xmlBody);
        insert into [AsyncExecResults]
            ([token], [submit_time])
            values
            (@token, getutcdate());
    if @trancount = 0
        commit;
    end try
    begin catch
        declare @error int
            , @message nvarchar(2048)
            , @xactState smallint;
        select @error = ERROR_NUMBER()
            , @message = ERROR_MESSAGE()
            , @xactState = XACT_STATE();
        if @xactState = -1
            rollback;
        if @xactState = 1 and @trancount = 0
            rollback
        if @xactState = 1 and @trancount > 0
            rollback transaction usp_my_procedure_name;

        raiserror(N'Error: %i, %s', 16, 1, @error, @message);
    end catch
end
go

-- Sample invocation example
-- The usp_withParam will insert 
-- all the received parameters into this table
-- 
create table [withParam] (
    id numeric(4,1) NULL
    , name varchar(150) NULL
    , date datetime  NULL
    , value int NULL
    , bytes varbinary(max) NULL);
go

create procedure usp_withParam
 @id numeric(4,1)
 , @name varchar(150)
 , @date datetime = NULL
 , @value int = 0
 , @bytes varbinary(max) = NULL
as
begin
    insert into [withParam] (
        id
        , name
        , date
        , value
        , bytes)
     select @id as [id]
      , @name as [name]
      , @date as [date]
      , @value as [value]
      , @bytes as [bytes]
end
go

declare @token uniqueidentifier;

exec usp_AsyncExecInvoke @procedureName = N'usp_withParam'
 , @p1 = 1.0, @n1 = N'@id'
 , @p2 = N'Foo', @n2='@name'
 , @p3 = 0xBAADF00D, @n3 = '@bytes'
 , @token = @token output;

waitfor delay '00:00:05';

select * from AsyncExecResults;
select * from withParam;

go

Asynchronous procedure execution

August 5th, 2009

Update: a version of this sample that accepts parameters is available in the post Passing Parameters to a Background Procedure

Recently an user on StackOverflow raised the question Execute a stored procedure from a windows form asynchronously and then disconnect?. This is a known problem, how to invoke a long running procedure on SQL Server without constraining the client to wait for the procedure execution to terminate. Most times I’ve seen this question raised in the context of web applications when waiting for a result means delaying the response to the client browser. On Web apps the time constraint is even more drastic, the developer often desires to launch the procedure and immediately return the page even when the execution lasts only few seconds. The application will retrieve the execution result later, usually via an Ajax call driven by the returned page script.

Frankly I was a bit surprised to see that the responses gravitated either around the SqlClient asynchronous methods (BeginExecute…) or around having a dedicated process with the sole pupose of maintaining the client connection alive for the duration of the long running procedure.

This problem is perfectly addressed by Service Broker Activation. Since I wanted to preserve the solution for further reference, I decided to put it in as a blog entry, with additional comments. For many of you Service Broker aficionados that read my blog regularly, this article is not innovative as is mostly a rehash of well known techniques I’ve been talking about on forums for many years now.

I’m going to use a table to store the result of the procedure execution. In this version I’ll keep things simple by not allowing for any parameters to be passed to the procedure, nor collecting any execution result set data. So the table will only contain the procedure start time, the execution finish time and any error that occurred during the procedure execution:

create table [AsyncExecResults] (
	[token] uniqueidentifier primary key
	, [submit_time] datetime not null
	, [start_time] datetime null
	, [finish_time] datetime null
	, [error_number]	int null
	, [error_message] nvarchar(2048) null);
go

Next we’re going to create the service and queue we need. I will use one single service for both roles (initiator and target) and I won’t create an explicit contract, relying instead on the predefined DEFAULT contract:

create queue [AsyncExecQueue];
go

create service [AsyncExecService] on queue [AsyncExecQueue] ([DEFAULT]);
GO

Next is the core of our asynchronous execution: the activated procedure. The procedure has to dequeue the message that specifies the user procedure, run the procedure and write the result in the results table. I will also deploy the error handling template I elaborated on my previous article Exception handling and nested transactions:

create procedure usp_AsyncExecActivated
as
begin
    set nocount on;
    declare @h uniqueidentifier
        , @messageTypeName sysname
        , @messageBody varbinary(max)
        , @xmlBody xml
        , @procedureName sysname
        , @startTime datetime
        , @finishTime datetime
        , @execErrorNumber int
        , @execErrorMessage nvarchar(2048)
        , @xactState smallint
        , @token uniqueidentifier;

    begin transaction;
    begin try;
        receive top(1) 
            @h = [conversation_handle]
            , @messageTypeName = [message_type_name]
            , @messageBody = [message_body]
            from [AsyncExecQueue];
        if (@h is not null)
        begin
            if (@messageTypeName = N'DEFAULT')
            begin
                -- The DEFAULT message type is a procedure invocation.
                -- Extract the name of the procedure from the message body.
                --
                select @xmlBody = CAST(@messageBody as xml);
                select @procedureName = @xmlBody.value(
                    '(//procedure/name)[1]'
                    , 'sysname');

                save transaction usp_AsyncExec_procedure;
                select @startTime = GETUTCDATE();
                begin try
                    exec @procedureName;
                end try
                begin catch
                -- This catch block tries to deal with failures of the procedure execution
                -- If possible it rolls back to the savepoint created earlier, allowing
                -- the activated procedure to continue. If the executed procedure 
                -- raises an error with severity 16 or higher, it will doom the transaction
                -- and thus rollback the RECEIVE. Such case will be a poison message,
                -- resulting in the queue disabling.
                --
                select @execErrorNumber = ERROR_NUMBER(),
                    @execErrorMessage = ERROR_MESSAGE(),
                    @xactState = XACT_STATE();
                if (@xactState = -1)
                begin
                    rollback;
                    raiserror(N'Unrecoverable error in procedure %s: %i: %s', 16, 10,
                        @procedureName, @execErrorNumber, @execErrorMessage);
                end
                else if (@xactState = 1)
                begin
                    rollback transaction usp_AsyncExec_procedure;
                end
                end catch

                select @finishTime = GETUTCDATE();
                select @token = [conversation_id] 
                    from sys.conversation_endpoints 
                    where [conversation_handle] = @h;
                if (@token is null)
                begin
                    raiserror(N'Internal consistency error: conversation not found', 16, 20);
                end
                update [AsyncExecResults] set
                    [start_time] = @starttime
                    , [finish_time] = @finishTime
                    , [error_number] = @execErrorNumber
                    , [error_message] = @execErrorMessage
                    where [token] = @token;
                if (0 = @@ROWCOUNT)
                begin
                    raiserror(N'Internal consistency error: token not found', 16, 30);
                end
                end conversation @h;
            end 
            else if (@messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
            begin
                end conversation @h;
            end
            else if (@messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
            begin
                declare @errorNumber int
                    , @errorMessage nvarchar(4000);
                select @xmlBody = CAST(@messageBody as xml);
                with xmlnamespaces (DEFAULT N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
                select @errorNumber = @xmlBody.value ('(/Error/Code)[1]', 'INT'),
                    @errorMessage = @xmlBody.value ('(/Error/Description)[1]', 'NVARCHAR(4000)');
                -- Update the request with the received error
                select @token = [conversation_id] 
                    from sys.conversation_endpoints 
                    where [conversation_handle] = @h;
                update [AsyncExecResults] set
                    [error_number] = @errorNumber
                    , [error_message] = @errorMessage
                    where [token] = @token;
                end conversation @h;
             end
           else
           begin
                raiserror(N'Received unexpected message type: %s', 16, 50, @messageTypeName);
           end
        end
        commit;
    end try
    begin catch
        declare @error int
            , @message nvarchar(2048);
        select @error = ERROR_NUMBER()
            , @message = ERROR_MESSAGE()
            , @xactState = XACT_STATE();
        if (@xactState <> 0)
        begin
            rollback;
        end;
        raiserror(N'Error: %i, %s', 1, 60,  @error, @message) with log;
    end catch
end
go

To make the procedure activated we need to attach it to our service queue. This will ensure this procedure is run whenever a message arrives to our [AsyncExecService]:

alter queue [AsyncExecQueue]
    with activation (
    procedure_name = [usp_AsyncExecActivated]
    , max_queue_readers = 1
    , execute as owner
    , status = on);
go

And finaly the last piece of the puzzle: the procedure that submits the message to invoke the desired asyncronous executed procedure. This procedure resturns an output parameter ‘token’ than can be used to lookup the asynchronous execution result.

create procedure [usp_AsyncExecInvoke]
    @procedureName sysname
    , @token uniqueidentifier output
as
begin
    declare @h uniqueidentifier
	    , @xmlBody xml
        , @trancount int;
    set nocount on;

	set @trancount = @@trancount;
    if @trancount = 0
        begin transaction
    else
        save transaction usp_AsyncExecInvoke;
    begin try
        begin dialog conversation @h
            from service [AsyncExecService]
            to service N'AsyncExecService', 'current database'
            with encryption = off;
        select @token = [conversation_id]
            from sys.conversation_endpoints
            where [conversation_handle] = @h;
        select @xmlBody = (
            select @procedureName as [name]
            for xml path('procedure'), type);
        send on conversation @h (@xmlBody);
        insert into [AsyncExecResults]
            ([token], [submit_time])
            values
            (@token, getutcdate());
    if @trancount = 0
        commit;
    end try
    begin catch
        declare @error int
            , @message nvarchar(2048)
            , @xactState smallint;
        select @error = ERROR_NUMBER()
            , @message = ERROR_MESSAGE()
            , @xactState = XACT_STATE();
        if @xactState = -1
            rollback;
        if @xactState = 1 and @trancount = 0
            rollback
        if @xactState = 1 and @trancount > 0
            rollback transaction usp_my_procedure_name;

        raiserror(N'Error: %i, %s', 16, 1, @error, @message);
    end catch
end
go

To test our asynchronous execution infrastructure we create a test procedure and invoke it asynchronously. I will create two test procedures, one that simply waits for 5 seconds to simulate a ‘long’ running procedure and one that produces intentionally a primary key violation, to simulate a fault in the asynchronously executed procedure:

create procedure [usp_MyLongRunningProcedure]
as
begin
    waitfor delay '00:00:05';
end
go

create procedure [usp_MyFaultyProcedure]
as
begin
    set nocount on;
    declare @t table (id int primary key);
    insert into @t (id) values (1);
    insert into @t (id) values (1);
end
go


declare @token uniqueidentifier;
exec usp_AsyncExecInvoke N'usp_MyLongRunningProcedure', @token output;
select * from [AsyncExecResults] where [token] = @token;
go

declare @token uniqueidentifier;
exec usp_AsyncExecInvoke N'usp_MyFaultyProcedure', @token output;
select * from [AsyncExecResults] where [token] = @token;
go

waitfor delay '00:00:10';
select * from [AsyncExecResults];
go

Activation Context

If you check the start time of the second asynchronosuly executed procedure you will notice that it started right after the first one finished. This is because we declare a max_queue_readers value of 1 when we set up activation on the queue. This restricts that at most one activated procedure to run at any time, effectively serializing all the asynchronously executed procedures. Whether this is desired or not depends a lot on the actual usage scenario. The limit can be increased as necessary.

If you start playing around with this method of invoking procedures asynchronously you will notice that sometimes the asynchrnously executed procedure is misteriously denied access to other databases or to server scoped objects. When the same procedure is run manually from a query window in SSMS, it executes fine. This is caused by the EXECUTE AS context under which activation occurs. the details are explained in MSDN’s Extending Database Impersonation by Using EXECUTE AS and myself I had covered this subject repeatedly in this blog. The best solution is to simply turn the trustworthy bit on on the database where the activated procedure runs. When this is not desired, or not allowed by your hosting environment, the solution is to code sign the activated procedure: Signing an activated procedure.

Using Service Broker Activation to invoke procedures asynchronously may look daunting at beginning. It sure is significantly more complex than just calling BeginExecuteNonQuery. But what needs to be understood is that this is a reliable way to invoke the procedure. The client is free to disconnect as soon as it commited the call to usp_AsyncExecInvoke. The procedure invoked will run, even if the server is stopped and restarted, even if a mirroring or clustering failover occurs. The server may even crash and be completely rebuilt. As soon as the database is back online, that queue will activate and invoke the asynchronous execution. Such level of reliability is difficult, if not impossible, to guarantee by using a client process.

Exception handling and nested transactions

June 11th, 2009

I wanted to use a template for writing procedures that behave as intuitively as possible in regard to nested transactions. My goals were:

  • The procedure template should wrap all the work done in the procedure in a transaction.
  • The procedures should be able to call each other and the calee should nest its transaction inside the outer caller transaction.
  • The procedure should only rollback its own work in case of exception, if possible.
  • The caller should be able to resume and continue even if the calee rolled back its work.

My solution is to use a either a transactions or a savepoint, depending on the value of @@TRANCOUNT at procedure start. The procedures start a new transaction if no transaction is pending. Otherwise they simply create a savepoint. On exit the procedure commits the transaction they started (if they started one), otherwise they simply exit. On exception, if the transaction is not doomed, the procedure either rolls back (if it started the transaction), or rolls back to the savepoint it created (if calee already provided a transaction).

Because of the use of savepoints this template does not work in all situations, since there are cases like distributed transactions, that cannot be mixed with savepoints. But for the average run of the mill procedures, this template has saved me very well.

create procedure [usp_my_procedure_name]
as
begin
	set nocount on;
	declare @trancount int;
	set @trancount = @@trancount;
	begin try
		if @trancount = 0
			begin transaction
		else
			save transaction usp_my_procedure_name;

		-- Do the actual work here
	
lbexit:
		if @trancount = 0	
			commit;
	end try
	begin catch
		declare @error int, @message varchar(4000), @xstate int;
		select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE(), @xstate = XACT_STATE();
		if @xstate = -1
			rollback;
		if @xstate = 1 and @trancount = 0
			rollback
		if @xstate = 1 and @trancount > 0
			rollback transaction usp_my_procedure_name;

		raiserror ('usp_my_procedure_name: %d: %s', 16, 1, @error, @message) ;
	end catch	
end