select count(*);

October 26th, 2009

Quick trivia: what is the result of running SELECT COUNT(*);?

That’s right: no FROM clause, just COUNT(*). The answer may be a little surprising: it is 1. When you query SELECT 1; the result is, as expected, 1. And SELECT 2; will return 2. So SELECT COUNT(2); returns, as expected, 1; after all, it counts how many rows are in the result set. But SELECT COUNT(*); has a certain smell of voodoo to it. OK, * is the projection operator, but project from… what exactly? It feels eerie, like a count materialized out of the blue.

How about SELECT COUNT(*) [MyTable]? Well, that’s actually just a shortcut for SELECT COUNT(*) AS [MyTable], so it still returns 1, only in a column named MyTable. Now you understand why my heart missed a beat when I checked how I had initialized a replication subscription and realized I forgot to type in FROM.
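
To see the whole progression in one batch (results noted in the comments):

select 1;                   -- returns 1
select 2;                   -- returns 2
select count(2);            -- returns 1: one row in the result set
select count(*);            -- returns 1 as well, no FROM clause needed
select count(*) [MyTable];  -- still 1, in a column named MyTable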

Asynchronous T-SQL at SQL Saturday #26

September 28th, 2009

The Seattle area PASS group is organizing SQL Saturday #26 in Redmond on October 2nd. There are enough sessions to fill 3 tracks for a full day and all of them look quite interesting. The full schedule is available at http://www.sqlsaturday.com/schedule.aspx. The event is free and you get to hear presentations by such popular SQL personalities as Kalen Delaney!

In the 10:15 slot yours truly will be talking about Asynchronous T-SQL processing. See you this Saturday at the new Commons at the MS campus in Redmond.

On SQL Server boolean operator short-circuit

September 13th, 2009

Recently I had several discussions all circling around the short-circuit of boolean expressions in Transact-SQL queries. Many developers that come from an imperative language background, like C, rely on boolean short-circuit occurring when SQL queries are executed. Often they take this expectation to the extreme, and the correctness of the result actually relies on the short-circuit occurring:

select 'Will not divide by zero!' where 1=1 or 1/0=0

In the SQL snippet above, the expression on the right side of the OR operator would cause a division by zero if it were ever evaluated. Yet the query executes fine, and the successful result is seen as proof that operator short-circuit does happen! Well, is that all there is? Of course not. A universal quantification cannot be demonstrated with an example. But it can be proven false with one single counter-example!

Luckily I have two aces up my sleeve: for one, I know how the Query Optimizer works. Second, I’ve stayed close enough to the Microsoft CSS front lines for 6 months to see actual cases pouring in from developers bitten by the short-circuit assumption. Here is my counter-example:


create table eav (
	eav_id int identity(1,1) primary key,
	attribute varchar(50) not null,
	is_numeric bit not null,
	[value] sql_variant null);

create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on;
declare @i int;
select @i = 0;
while @i < 100000
begin
	declare @attribute varchar(50),
		@is_numeric bit,
		@value sql_variant;
	select @attribute = 'A' + cast(cast(rand()*1000 as int) as varchar(3));
	select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
	if 1=@is_numeric
		select @value = cast(rand() * 100 as int);
	else
		select @value = 'Lorem ipsum';
	insert into eav (attribute, is_numeric, [value])
		values (@attribute, @is_numeric, @value);
	select @i = @i+1;
end
go

-- insert a 'trap' 
insert into eav (attribute, is_numeric, [value]) values ('B1', 0, 'Gotch ya'); 
go
 
-- select the 'trap' value 
select [value] from eav
	where attribute = 'B1'
	and is_numeric = 1
	and cast([value] as int) > 50;
go

Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

This happens on SQL Server 2005 SP2. Clearly, the conversion does occur even though the value is marked as ‘not numeric’. What’s going on here? To better understand, let’s insert a known value that can be converted, then run the same query again and look at the execution plan:


insert into eav (attribute, is_numeric, [value]) values ('B2', 1, 65); 
go 

select [value] from eav
	where attribute = 'B2'
	and is_numeric = 1
	and cast([value] as int) > 50;
go

[Figure: boolean short-circuit counter-example query plan]

Looking at the plan we can see how the query is actually evaluated: seek the non-clustered index for the attribute ‘B2’, project the ‘value’ column, apply the filter predicate ‘cast([value] as int)>50’, then perform a nested loops join to look up ‘is_numeric’ in the clustered index! So the right side of the AND operator is evaluated first. Q.E.D.

Is this a bug? Of course not. SQL is a declarative language; the query optimizer is free to choose any execution path that produces the requested result. Boolean operator short-circuit is NOT GUARANTEED. My query has set up a trap for the query optimizer by providing a tempting execution path using the non-clustered index. For my example to work I had to set up a large table with enough distinct values of ‘attribute’ so that the optimizer would see the non-clustered index access followed by a bookmark lookup as a better plan than a clustered scan. And it is, by all means, a better plan. But then I placed my trap: by adding ‘value’ as an included column in the non-clustered index, I gave the optimizer a too-sweet-to-resist opportunity to evaluate the filter predicate on the ‘value’ column before the filter predicate on the ‘is_numeric’ column, thus breaking the short-circuit assumption.
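
If correctness depends on evaluation order, the usual defensive rewrite (my addition, not part of the original example) is to fold the guard into a CASE expression, since CASE evaluates its WHEN condition before the corresponding THEN result (with some documented caveats around aggregates):

-- the cast now only happens for rows already known to be numeric;
-- for non-numeric rows CASE yields NULL and the row is filtered out
select [value] from eav
	where attribute = 'B1'
	and case when is_numeric = 1
	         then cast([value] as int)
	    end > 50;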

Passing Parameters to a Background Procedure

August 18th, 2009

Code on GitHub: rusanu/async_tsql

I have previously posted an example of how to invoke a procedure asynchronously using Service Broker activation. Several readers have inquired how to extend this mechanism to pass parameters to the background procedure.

Passing parameters to a single well-known procedure is easy: the parameters are added to the message body, and the activated procedure looks them up in the received XML, passing them to the called procedure. But it is significantly more complex to create a generic mechanism that can pass parameters to any procedure. The problem is the type system, because the parameters have unknown types and the activated procedure has to pass properly typed parameters to the invoked procedure.
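
For the easy, well-known procedure case, a minimal sketch of the idea (the message shape and usp_MyBackgroundProc are illustrative, not the actual code from the post):

-- caller side: embed the parameters in the XML message body
declare @body xml;
select @body = (select 42 as x, N'abc' as y for xml path('params'), type);
-- ... SEND ON CONVERSATION @handle (@body); ...

-- activated procedure side: extract properly typed values and invoke
declare @x int, @y nvarchar(10);
select @x = @body.value('(/params/x)[1]', 'int'),
	@y = @body.value('(/params/y)[1]', 'nvarchar(10)');
exec usp_MyBackgroundProc @x = @x, @y = @y;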

A generic solution should accept a variety of parameter types and should deal with the peculiarities of Transact-SQL parameter passing, namely the named parameters capability. Also, the invocation wrapper usp_AsyncExecInvoke should directly accept the parameters for the desired background procedure. After considering several alternatives, I settled on the following approach:

Read the rest of this entry »

Asynchronous procedure execution

August 5th, 2009

Code on GitHub: rusanu/async_tsql

Update: a version of this sample that accepts parameters is available in the post Passing Parameters to a Background Procedure

Recently a user on StackOverflow raised the question Execute a stored procedure from a windows form asynchronously and then disconnect?. This is a known problem: how to invoke a long-running procedure on SQL Server without constraining the client to wait for the procedure execution to terminate. Most times I’ve seen this question raised in the context of web applications, where waiting for a result means delaying the response to the client browser. In Web apps the time constraint is even more drastic; the developer often desires to launch the procedure and immediately return the page, even when the execution lasts only a few seconds. The application will retrieve the execution result later, usually via an Ajax call driven by the returned page script.
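
The core of the solution presented in the full post is Service Broker activation. A bare-bones sketch of the pattern (object names are illustrative and error handling is omitted; the real code in the post and on GitHub is more thorough):

create queue AsyncExecQueue;
create service AsyncExecService on queue AsyncExecQueue ([DEFAULT]);
go

-- activated procedure: dequeue a procedure name and execute it
create procedure usp_AsyncExecActivated
as
begin
	set nocount on;
	declare @h uniqueidentifier, @messageType sysname, @proc nvarchar(256);
	receive top(1)
		@h = conversation_handle,
		@messageType = message_type_name,
		@proc = cast(message_body as nvarchar(256))
	from AsyncExecQueue;
	if @messageType = N'DEFAULT'
	begin
		exec (@proc);  -- run the requested procedure
		end conversation @h;
	end
	else if @messageType = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
		end conversation @h;
end
go

alter queue AsyncExecQueue with activation (
	status = on,
	procedure_name = usp_AsyncExecActivated,
	max_queue_readers = 1,
	execute as owner);
go

-- invoker: enqueue the request and return to the caller immediately
create procedure usp_AsyncExecInvoke @procName sysname
as
begin
	declare @h uniqueidentifier;
	begin dialog conversation @h
		from service AsyncExecService
		to service N'AsyncExecService', N'current database'
		with encryption = off;
	send on conversation @h (cast(@procName as varbinary(max)));
end
go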

Read the rest of this entry »

MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data

July 26th, 2009

I just found out that Microsoft has published a case study about the way MySpace is using Service Broker as the core message delivery system for their Service Dispatcher. We’re talking about 440 SQL Server instances and over 1000 databases. Quoting from the case study:

Service Broker has enabled MySpace to perform foreign key management across its 440 database servers, activating and deactivating accounts for its millions of users, with one-touch asynchronous efficiency. MySpace also uses Service Broker administratively to distribute new stored procedures and other updates across all 440 database servers through the Service Dispatcher infrastructure.

That is pretty impressive. I have known about the MySpace SSB adoption since the days when I was with the Service Broker team. You probably all know the mantra I repeat all the time: “don’t use fire and forget, it is a bad message exchange pattern and there are scenarios when the database may be taken offline”? Guess how I found out about those ‘scenarios’… Anyway, I’m really glad that they also made public some performance numbers. Until now I could only quote the 5000 messages per second I can push in my own test environment. Well, looks like MySpace has some beefier hardware:

Stelzmuller: “When we went to the lab we brought our own workloads to ensure the quality of the testing. We needed to see if Service Broker could handle loads of 4,000 messages per second. Our testing found it could handle more than 18,000 messages a second.”

Fix slow application startup due to code sign validation

July 24th, 2009

Sometimes you are faced with applications that seem to take ages to start up. Usually they freeze for about 30-40 seconds and then all of a sudden they come to life. This happens for both native and managed applications, and it sometimes manifests as an IIS/ASP/ASP.Net AppPool starting up slowly on the first request. The very first thing I always suspect is code signing verification. When a signed module is checked, the certificate verification engine may consider that the Certificate Revocation List (CRL) it possesses is obsolete and attempt to download a new one. For this it connects to the internet. The problem occurs when the connectivity is either slow or blocked for some reason. By default the verification engine will time out after 15 seconds and resume with the old, obsolete CRL it has. The timeout can occur several times, adding up to start-up times of even minutes. This occurs completely outside the control of the application being started; its modules are not even properly wired up in memory yet, so there is no question of application code running.

The information on this subject is scarce, to say the least. Luckily there is a TechNet article that describes not only the process occurring, but also the controlling parameters: Certificate Revocation and Status Checking. To fix the problem on computers with poor internet connectivity, registry settings have to be modified in the HKLM\SOFTWARE\Microsoft\Cryptography\OID\EncodingType 0\CertDllCreateCertificateChainEngine\Config key:

ChainUrlRetrievalTimeoutMilliseconds
This is the timeout for each individual CRL retrieval call. If it is 0 or not present, the default value of 15 seconds is used. Change this timeout to a reasonable value like 200 milliseconds.
ChainRevAccumulativeUrlRetrievalTimeoutMilliseconds
This is the aggregate CRL retrieval timeout. If set to 0 or not present, the default value of 20 seconds is used. Change this timeout to a value like 500 milliseconds.

With these two changes the code signing verification engine will time out the CRL refresh operation in 500 milliseconds. If the connectivity to the certificate authority site is bad, this will dramatically improve the start-up times of code-signed applications.
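
For instance, as a .reg file (my sketch: the REG_DWORD value type follows the TechNet article's description of these values, so verify on a test machine first):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography\OID\EncodingType 0\CertDllCreateCertificateChainEngine\Config]
; per-call CRL retrieval timeout: 200 ms (0xc8)
"ChainUrlRetrievalTimeoutMilliseconds"=dword:000000c8
; aggregate CRL retrieval timeout: 500 ms (0x1f4)
"ChainRevAccumulativeUrlRetrievalTimeoutMilliseconds"=dword:000001f4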

Inspiration is perishable

July 8th, 2009

I have been following the 37signals blog ever since I kinda randomly stumbled upon their Getting Real book. If you have never heard of them, definitely check out the book; it is a very common-sense approach to managing product development in the age of the Internets. In this post http://www.37signals.com/svn/posts/1798-jasons-talk-at-big-omaha-2009 I was really touched by one remark: inspiration is perishable. The ideas you have can linger in your head for a long time, but the inspiration for them fades quickly. So don’t postpone: by the time you get to it you’ll only deliver a pale image of the original idea. Do it while you’re pumped up and thrilled by it.

I reckon I’m a procrastinator de luxe, but I have to agree. I know the difference between working at 2 am and not feeling a bit tired because I’m excited about my work, on one hand, and the damp feeling of exhaustion that drags me to watch some stupid TV show at 6 pm because I’m bored with the current project, on the other.

Exception handling and nested transactions

June 11th, 2009

I wanted to use a template for writing procedures that behave as intuitively as possible in regard to nested transactions. My goals were:

  • The procedure template should wrap all the work done in the procedure in a transaction.
  • The procedures should be able to call each other, and the callee should nest its transaction inside the outer caller’s transaction.
  • The procedure should only roll back its own work in case of exception, if possible.
  • The caller should be able to resume and continue even if the callee rolled back its work.

My solution is to use either a transaction or a savepoint, depending on the value of @@TRANCOUNT at procedure start. The procedure starts a new transaction if no transaction is pending; otherwise it simply creates a savepoint. On exit the procedure commits the transaction it started (if it started one), otherwise it simply exits. On exception, if the transaction is not doomed, the procedure either rolls back (if it started the transaction) or rolls back to the savepoint it created (if the caller already provided a transaction).
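
A bare-bones sketch of such a template (the full post refines this; shown here under the assumption of SQL Server 2005+ TRY/CATCH and XACT_STATE semantics):

create procedure usp_my_procedure
as
begin
	set nocount on;
	declare @trancount int;
	set @trancount = @@trancount;
	begin try
		if @trancount = 0
			begin transaction;  -- no pending transaction: start our own
		else
			save transaction usp_my_procedure;  -- nest inside the caller's

		-- ... the actual work of the procedure goes here ...

		if @trancount = 0
			commit;  -- commit only the transaction we started
	end try
	begin catch
		declare @error int, @message nvarchar(4000), @xstate int;
		select @error = error_number(), @message = error_message(),
			@xstate = xact_state();
		if @xstate = -1
			rollback;  -- doomed transaction: all we can do is roll back
		else if @xstate = 1 and @trancount = 0
			rollback;  -- we own the transaction: roll back our work
		else if @xstate = 1 and @trancount > 0
			rollback transaction usp_my_procedure;  -- only to our savepoint

		raiserror(N'usp_my_procedure: %d: %s', 16, 1, @error, @message);
	end catch
end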

Read the rest of this entry »

%%lockres%% collision probability magic marker: 16,777,215

May 29th, 2009

@jrowlandjones blogged about a dubious deadlock case. I recommend this article, as it is correct and presents a somewhat esoteric case of deadlock: the resource hash collision. The lock manager in SQL Server doesn’t know what it locks; it just locks ‘resources’ (basically strings). It is the job of higher-level components, like the access methods of the storage engine, to present the ‘resource’ to the lock manager and ask for the desired lock. When locking rows in a heap or a b-tree, the storage engine will synthesize a ‘resource’ from the record identifier. Since these resources have a limited length, the storage engine has to reduce the effective length of a key to the maximum length it is allowed to present to the lock manager, and that means that the record’s key will be reduced to 6 bytes. This is achieved by hashing the key into a 6-byte hash value. Nothing spectacular here.
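
You can peek at this hash yourself through the undocumented %%lockres%% virtual column; a sketch against a hypothetical MyTable (undocumented feature, so use for exploration only):

-- show each row's 6-byte lock resource hash
select %%lockres%% as lock_resource, *
from MyTable;

-- find distinct rows that collide on the same hash
select %%lockres%% as lock_resource, count(*) as rows_sharing_hash
from MyTable
group by %%lockres%%
having count(*) > 1;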

But if you have a table with a key of 50 bytes in length and it’s reduced to 6 bytes, you may hit a collision. So how likely is this to happen?

On 6 bytes there are 281,474,976,710,656 distinct possible values. That’s a pretty big number, right? Not that big, actually. If we meet at a party and I say ‘I bet somebody in the room shares your birthday’, would you bet against me? You probably should 🙂 What if I change my question to ‘I bet there are two people in this room that share a birthday!’? Now I will probably take your money. This is the ‘birthday attack’ of cryptography fame, and basically it says that you get a 50% collision probability at half the hash length. So the SQL %%lockres%% hash will produce two records with the same hash, with 50% probability, out of a table, any table, of only 16,777,215 records. That suddenly doesn’t look like a cosmic constant, does it? And keep in mind that this is the absolute maximum, for a key with a perfectly random distribution. In reality keys are not that random after all.

Take James’s example: a datetime (8 bytes), a country code (1 byte), a group code (2 bytes) and a random id (4 bytes). Of these 15 bytes quite a few are actually constant: e.g. every date between 2000 and 2010 has the first 4 bytes identical (0x00) and the 5th byte has only two possible values (0x08 or 0x09). If from the other codes (country, group) we use only 50% of the possible values, then in effect we use, generously, just 10 of the 15 bytes of the key. This means the table hits a 50% collision probability at only about 11 million records. Considering that he was doing a ‘paltry’ 900 million record upload, no wonder he got collisions…
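
For reference, my back-of-the-envelope version of the birthday bound behind these numbers (not from the original post):

% probability of at least one collision among n keys hashed into N = 2^{48} buckets
p(n) \approx 1 - e^{-n^2 / (2N)}

% solving p(n) = 1/2 for n:
n_{50\%} = \sqrt{2 \ln 2 \cdot N} \approx 1.18 \times 2^{24} \approx 19.8 \text{ million}

% the customary half-the-bits rule of thumb rounds this down to
n \approx 2^{48/2} = 2^{24} = 16{,}777{,}216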