%%lockres%% collision probability magic marker: 16,777,215

May 29th, 2009

@jrowlandjones blogged about a dubious deadlock case. I recommend this article as is correct and presents a somewhat esoteric case of deadlock: the resource hash collision. The lock manager in SQL Server doesn’t know what it locks, it just locks ‘resources’ (basically strings). It is the job of higher level components like the the access methods of the storage engine to present the ‘resource’ to the lock manager and ask for the desired lock. When locking rows in a heap or a b-tree the storage engine will synthesize a ‘resource’ from the record identifier. Since these resources have a limited length, the storage engine has to reduce the effective length of a key to the maximum length is allowed to present to the lock manager, and that means that the record’s key will be reduced to 6 bytes. This is achieved by hashing the key into a 6 byte hash value. Nothing spectacular here.

But if you have a table with a key of length 50 bytes and its reduced to 6 bytes, you may hit a collision. So how likely is this to happen?

On 6 bytes there are 281,474,976,710,656 distinct possible values. Its a pretty big number? Not that big actually. If we meet at a party and I say ‘I bet somebody in the room shares your birthday’ would you bet against me? You probably should :) What if I change my question to ‘I bet there are two people in this room that share the birthday!’? Now I will probably take your money. This is called a ‘meet-in-the-middle‘ attack in cryptography and basically it says that you get a 50% collision probability at half the hash length. So the SQL %%lockres%% hash will produce two records with same hash, with a 50% probability, out of the table, any table, of only 16,777,215 record. That suddenly doesn’t look like a cosmic constant, does it? And keep in mind that this is the absolutely maximum value, where the key has a perfectly random distribution. In reality keys are not that random after all. Take Jame’s example: a datetime (8 bytes), a country code (1 byte), a group code (2 bytes) and a random id (4 bytes). From these 15 bytes quite a few are actually constant: eg. every date between 2000 and 2010 has the first 4 bytes identical (0×00) and the 5th byte only has two possible values (0×08 or 0×09). If from the other codes (country, group) we use only 50% of the possible values, then in effect we use, generously, just 10 bytes of the 15 bytes of the key. This means the table has a 50% collision probability at only about 11 million records. considering that he was doing a ‘paltry’ 900 million records upload, no wonder he got collisions…

Have you met  ? Say hello to my BOM

May 21st, 2009

I recently had to look at a problem where a programmer writing a tool to parse some XML produced by a service written by me was complaining that my service serves invalid XML. All my documents seemed mangled and started with . I couldn’t help but smile. So what is this mysterious sequence ? Well, lets check the ISO-8859-1 character encoding, commonly known as ‘Latin alphabet’ and often confused with the Windows code page 1252: ï is the character corresponding to 0xEF (decimal 239), » is 0xBB (decimal 187) and ¿ is 0xBF (decimal 191). So the mysterious sequence is 0xEFBBBF. Does it look familiar now? It should, this is the UTF-8 Byte Order Mark. Moral: if you consume and parse XML, make sure you consume it as XML, not as text. All XML libraries I know of correctly understand and parse the BOM. The only problems I’ve seen are from hand written ‘parsers’ that treat XML as a string (and most often fail to accommodate namespaces too…).

stackoverflow.com: how to execute well on a good idea

May 18th, 2009

Some time ago I started noticing on my google searches a newcomer: stackoverflow.com. At first I dismissed this as yet another SEO hack to divert traffic to some re-syndicated content of the old user groups and forums. But I was wrong. Turns out stackoverflow.com is an enterprise backed by well known industry names like Joel Spolsky of Joel on Software fame. Apparently I’ve been living in a cave, since this new site is quite popular and even doing a Stack Overflow DevDays tour!

Now being that I’m a forums and newsgroup addict with a long history of MSDN abuse, I had to join this one and start showing off my amazing knowledge and wit. OK, you can all stop laughing now. I am a noob, so what? I still love to answer questions ;) and not actually knowing the answer has never a stop for me. Imagination is more important than knowledge!

I spend now about 4 days on stackoverflow.com and I must say that I’m impressed. First of all, they do offer innovation in how the content is gathered and presented. The hierarchical forums model was obviously obsolete and it was showing its age: new users have a hard time figuring out which forum is the right one, monitoring the questions is difficult as the volume increases, there are often questions that obviously span multiple topics and picking the one forum for it severely restricts the exposure of the said question, and implicitly the quality of responses. Instead stackoverflow.com goes for a tags based model. When you ask a question you choose tags relevant for it and you can mix and match tags as diverse as linq, objective-c and php in one single question.

Now tags based contents isn’t exactly new, but the way is executed on stackoverflow.com takes it to a new level. Of course they offer tags based browsing of topics. But they also keep track of the tags you must often interact with (ask or answer). You can browse tags within current tags to get the questions that cover multiple tags of your interest. And the tags system is completely open, anyone can create new tags and they even have awards for successful tags.

The next innovation idea I like is how they try to blend the line between wiki and forums. Topics that prove to be popular and have a good answer can be promoted to wiki entries. This makes the entry serve the same reference role the ‘sticky’ posts serve in forums, but with better functionality. Also rooted in wikis (and craigslist too) is the idea of member provided social policing for the content: answers get voted up or down by community members. Not only that, but questions also get to be voted up or down, which is something I have not seen elsewhere. And ultimately questions can be closed, responses deleted. How is this different from the forum administrators? These people are not administrators, are just ordinary community members. You gain reputation, you earn privileges.

The reputation system is not new, by now almost any community forum has a reputation points system in place. But with stackoverflow.com they added also a system of badges that I feel comes straight from the video games world of achievements and vanity awards : you get bronze, silver or gold badges for achieving tasks in the stackoverflow.com ecosystem. You get your Teacher badge for answering a question and receiving an Up vote, you get the Student badge for answering a question that receives an up vote, or even a gold badge of Great Answer if it gets voted up 100 times. Now these are, of course, vanity awards. We all know though how efficient they are in keeping users hooked in! Me, I’m eager to get my Critic badge…

The only serious thing missing is the RSS syndication of views, but I hear is in the plans.

But most impressive is the quality of execution on these good ideas. The site is fast and responsive. It provides suggestion to similar questions as you type yours. It provides fast navigation to your questions and answers. Visual notifications for changes since your previous check. Suggestions for related topics (ie. common tags). Is true that I don’t know how many users it carries. Judging from the ~30K Teacher badges, I’d guess some 50k users registered and active, as a conservative estimate.

Is also nice to see such an effort started from a grassroots movement, and not from the political sponsorship of an industry player. Today’s developer has to deal in the course of a single day with an NSConnection question, related to an issue of Appache .htaccess mod_rewrite and PHP cookie handling and resulting in a SQL Server access problem. A site like the Social on MSDN would not happily sponsor and encourage such questions, nor would it nurture and grow the community leaders that can answer such end-to-end and cross platform questions.

Read/Write deadlock

May 16th, 2009

How does a simple SELECT deadlock with an UPDATE? Surprisingly, they can deadlock even on well tuned systems that does not do spurious table scans. The answer is very simple: when the read and the write use two distinct access paths to reach the same key and they use them in reverse order. Lets consider a simple example: we have a table with a clustered index and an non-clustered index. The reader (T1) seeks a key in the non-clustered index and then needs to look up the clustered index to retrieve an additional column required by the SELECT projection list. The writer (T2) is updating the clustered index and then needs to perform an index maintenance operation on the non-clustered index. So T1 holds an S lock for the key K on the non-clustered index and wants an S lock on the same key K on the clustered index. T2 has an X lock on the key K on the clustered index and wants an X lock on same key K on the non-clustered index. Deadlock, T1 will be chosen as a victim. So you see, there are no complex queries involved, no suboptimal scan operations, no lock escalation nor page locks involved. Simple, correctly written queries may deadlock if doing read/write operations on the same key on a table with two indexes. Lets show this in an example:

use tempdb;
go

create table TestDeadlock (TestId int identity(1,1),
		constraint cdxTestDealockTestId primary key clustered (TestId),
	ColA varchar(10) not null,
	ColB varchar(10) not null);
go

create index idxTestDeadlockColA on TestDeadlock(ColA);
go

Now lets populate the table with some random data:

set nocount on;
insert into TestDeadlock (ColA, ColB) values ('A', 'B');

declare @i int;
select @i = 0;
while @i < 10000
begin
	insert into TestDeadlock (ColA, ColB) values (rand()*99999, rand()*99999);
	select @i = @i+1;
end
go

Now open two query windows, and on the first one start this loop (the Reader):

set nocount on;
while (1=1)
begin
	declare @B varchar(10);
	select @B = ColB from TestDeadlock where ColA='A';
end

On the second one, start this loop (the Writer):

set nocount on;
while (1=1)
begin
	update TestDeadlock set ColA='A' where TestId=1;
	update TestDeadlock set ColA='B' where TestId=1;
end

Now switch back to the first query window and there you have it:

Msg 1205, Level 13, State 51, Line 6
Transaction (Process ID 52) was deadlocked on lock resources with another process
and has been chosen as the deadlock victim. Rerun the transaction.

Looking at the deadlock graph will show exactly the situation I described:

SPID 52 has an KEY S lock on idxTestDeadlockColA and wants an KEY S lock on cdxTestDeadlockTestId, SPID 53 has the KEY X lock on cdxTestDeadlockTestId and wants a KEY X lock on dxTestDeadlockColA. SPID 52 is chosen as victim since it performed less writes than SPID 53.

Why is this kind of deadlock not more prevalent? The key ingredient is that both T1 and T2 have to go after the same ‘key’. On a real life situation, this is just a matter of probabilities. If the write application pattern is to write new keys and to look up old ones, it is unlikely to happen. If the number of keys is very large then the probability of overlap is low (of course it increases around ‘hot spots’, for example a hot post on your website). In my recent experience on of the main culprits for these deadlocks are the hit counters prevalent in many content management systems (increment number of hits a page or image was shown).

Unfortunately things are not always so easy to understand and to explain. The same problem can manifest itself in the disguise of a join. The key learning to take home is this: neither the reads nor the writes do not acquire locks in an atomic fashion on all indexes of a table (that would be impossible, of course!). Whenever two transactions acquire these locks in a different order from one another, they open themselves to a deadlock.

The proper fix varies, of course. Things to consider are:

  • eliminate unnecessary columns from reader’s projection so he does not have to look up the clustered index
  • add required columns as contained columns to the non-clustered index to make the index covering, again so that the reader does not have look up the clustered index
  • avoid updates that have to maintain the non-clustered index
  • One final note: why did I add 10000 dummy record in my example? Because otherwise the optimizer would see that the clustered index has only one page and would choose a plan that scans this index instead of seeking the non-clustered index.

Version Control and your Database

May 15th, 2009

I am still amazed when I walk into a development shop and I ask for their application database script and they offer to extract one for me. Really, your only definition of the database is the database itself? Now you wouldn’t keep your libraries as object code only and reverse engineer them every time you want to make a change, would you?

Now, all sarcasm aside, why is so hard to keep a database definition as source and keep it under version control? The reason is not that people are dumb, these are bright developers and they would do the right thing if it would fit into their natural work flow. The problem is that the tool set at their disposal as developers (usually the Visual Studio suite) is far far behind the capabilities of the database administration tool set (the SSMS). But the later is focused for the needs of administrators and the natural flow of actions is to visually modify some schema properties (add tables, define indexes etc) in a dialog and then click the ‘Do it!’ button. This hides actual scripts going on behind the scenes and does not lend itself naturally to the normal code/build/run/test/commit cycle of the developer desk.

In my development persona I have become acutely aware of the pitfalls of not having sources for my database objects and catalog reference data and not having the benefits of source control versioning. My answer to this problem was to rely on an extended database property (usually named after the application, eg. “MyApplication DB Version”) to keep track of the current deployed application schema. Every database schema modification is a version change that is achieved by running a specific script, including the initial deployment from v. NULL to v. 1.0 . First thing I check the current version:

SELECT [value] 
    from ::fn_listextendedproperty (
        'MyApplication DB Version', 
        default, default, default, default, default, default);

The application contains a static array of versions and scripts. After it retrieves the current version, it iterates the array until it finds the retrieved version and then runs the script. This script will upgrade the database to the next version, and all the scripts end by setting the new version:

EXEC sp_updateextendedproperty 
	@name = N'MyApplication DB Version', @value = '1.2';
GO

The application then iterates again until the final version is reached.

The main advantage of this approach is that my deployment script is easy to create and test, and is not peppered with those nasty ‘IF EXISTS(…)’ conditional statements. This scripts become immutable once a version is rolled out. If schema changes then the version is rolled forward and a new script is added with the necessary steps to alter the database to the new schema. The new script is easy to test and hammer into correctness. Start from the current version backup, run the script, fix any problems, then roll back again to the current version backup, run it again until all problems are fixed.

Note that there is no script to deploy directly the current version. A new deployment will deploy by running all the scripts in a row, from v. 0 to v. 1, then v. 1.1, v. 1.2 and so on and so forth. This means that objects in the schema will go trough all their lifetime variations: tables will be created with missing columns then later the column will be added at the v. 1.1 script, stored procedures will go through several incarnations as they are ALTERed into the current final form, reference data will be added then updated or deleted as the application has evolved. This makes for a strange experience for someone monitoring the application deployment, but it certainly pays off from a source maintainability point of view. Also your application can be deployed safely on top of any previous version, as probably your clients will not be all at the last version.

The one major disadvantage of my approach is the lack of a current reference definition of the schema in source. The schema is the result of all the transformations from v. 0 to current version and one cannot simply go into the schema script and see the current definition of any object.

Visual Studio Database Edition

Last Wednesday I was at the Pacific Northwest PASS meeting on the MS campus and Barclay Hill presented the newest additions to the VS Database Edition line. I was really impressed. The new functionality supports definition of database schema stored in Transact-SQL scripts (your familiar CREATE statements) and thus fully integrates into source version control, be it via VSTS, AnkhSVN or any SCM of your choice. VS can nou produce a new type of file during the build of your project, a .dbschema file. This is a deployment file containing the definition of your project schema. Coupled with the VSDBCMD command line tool one can take this .dbschema to the customer site, run it and the tool will analyze the current deployed schema, compare it with the .dbschema definition, produce a delta and apply this delta transformation to the database. The tool supports every object in SQL 2008, both at database and instance level. It does not though support replication nor SQL Agent objects, and I think that is fine since they are really tied to deployment site specifics, not to development time.

The fact that the project schema definitions are pure T-SQL and the source control integration means you can setup MSBUILD to produce continuous integration builds and drop a build with the current trunk or branch, including the current .dbschema file. You can then take this .dbschema from the drop location and have it tested at your customer site. Or you can have the build process deploy the project, implicitly running the .dbschema and apply it to your development test database. You can even go fancy and automate the deployment into the staging servers for integration testing. And since MSBUILD supports adding BVTs (build validation tests) to your build drop process, you can also automate that part. I would shy though from going the whole nine yards and have the build process apply the new build on production server, for obvious reasons :) . But perhaps this is not so outrageous, specially for shops that develop around one and only one deployment, as is usually the case with web sites. Careful branch management can stage the deployment from development to test, integration and finally production.

Also a good news coming is that the next Visual Studio Database Edition is actually going to be integrated in the Developer Edition. This is really great, since a lot of shops really need this functionality but would not be willing to pay extra bucks for a separate Database Edition.

If you like to play with this awesome new features I recommend you download the latest Visual Studio Team Systems 2008 Database Edition GDR. Gert has a couple of useful links to the download location, manual, some setup instructions: http://blogs.msdn.com/gertd/archive/2008/10/27/the-gdr-rc-is-here.aspx

Birds of a Feather Tweet Together

May 10th, 2009

@rusanu: I am on Twitter now. Yes, I am another victim of the totally connected social dragon. Since we’re on the subject here is my LinkedIn profile: http://www.linkedin.com/in/remusrusanu.