Have you met ï»¿? Say hello to my BOM

May 21st, 2009

I recently had to look at a problem where a programmer writing a tool to parse some XML produced by a service of mine was complaining that my service serves invalid XML. All my documents seemed mangled and started with ï»¿. I couldn’t help but smile. So what is this mysterious ï»¿ sequence? Well, let’s check the ISO-8859-1 character encoding, commonly known as ‘Latin alphabet’ and often confused with the Windows code page 1252: ï is the character corresponding to 0xEF (decimal 239), » is 0xBB (decimal 187) and ¿ is 0xBF (decimal 191). So the mysterious sequence is 0xEF 0xBB 0xBF. Does it look familiar now? It should: this is the UTF-8 Byte Order Mark. Moral: if you consume and parse XML, make sure you consume it as XML, not as text. All XML libraries I know of correctly understand and parse the BOM. The only problems I’ve seen are from hand-written ‘parsers’ that treat XML as a string (and most often fail to accommodate namespaces too…).
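The whole round trip — the BOM bytes, the Latin-1 misreading, and a parser doing the right thing — fits in a few lines of Python (the document content here is made up for illustration):

```python
import xml.etree.ElementTree as ET

# A UTF-8 document that starts with the Byte Order Mark 0xEF 0xBB 0xBF.
doc = b"\xef\xbb\xbf<?xml version='1.0' encoding='utf-8'?><root>hi</root>"

# Misread as ISO-8859-1 text, the BOM bytes show up as the characters 'ï»¿'.
mangled = doc[:3].decode("iso-8859-1")
print(mangled)  # ï»¿

# Fed to a real XML parser as bytes, the BOM is consumed transparently.
root = ET.fromstring(doc)
print(root.tag, root.text)  # root hi
```

The parser never even surfaces the BOM; only the string-based ‘parser’ sees the three stray characters.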

stackoverflow.com: how to execute well on a good idea

May 18th, 2009

Some time ago I started noticing a newcomer in my Google searches: stackoverflow.com. At first I dismissed it as yet another SEO hack to divert traffic to re-syndicated content from the old user groups and forums. But I was wrong. Turns out stackoverflow.com is an enterprise backed by well-known industry names like Joel Spolsky of Joel on Software fame. Apparently I’ve been living in a cave, since this new site is quite popular and is even doing a Stack Overflow DevDays tour!

Now, being that I’m a forums and newsgroup addict with a long history of MSDN abuse, I had to join this one and start showing off my amazing knowledge and wit. OK, you can all stop laughing now. I am a noob, so what? I still love to answer questions 😉 and not actually knowing the answer has never been a stop for me. Imagination is more important than knowledge!

I have now spent about 4 days on stackoverflow.com and I must say that I’m impressed. First of all, they do offer innovation in how the content is gathered and presented. The hierarchical forums model was obviously obsolete and showing its age: new users have a hard time figuring out which forum is the right one, monitoring the questions becomes difficult as the volume increases, and there are often questions that obviously span multiple topics, so picking one forum for them severely restricts the exposure of said question, and implicitly the quality of the responses. Instead stackoverflow.com goes for a tag-based model. When you ask a question you choose tags relevant to it, and you can mix and match tags as diverse as linq, objective-c and php in one single question.

Now tag-based content isn’t exactly new, but the way it is executed on stackoverflow.com takes it to a new level. Of course they offer tag-based browsing of topics. But they also keep track of the tags you most often interact with (ask or answer). You can browse tags within current tags to get the questions that cover multiple tags of your interest. And the tag system is completely open: anyone can create new tags, and there are even awards for successful tags.

The next innovative idea I like is how they blur the line between wiki and forums. Topics that prove to be popular and have a good answer can be promoted to wiki entries. This makes the entry serve the same reference role ‘sticky’ posts serve in forums, but with better functionality. Also rooted in wikis (and craigslist too) is the idea of member-provided social policing of the content: answers get voted up or down by community members. Not only that, but questions also get voted up or down, which is something I have not seen elsewhere. And ultimately questions can be closed and responses deleted. How is this different from forum administrators? These people are not administrators, they are just ordinary community members. You gain reputation, you earn privileges.

The reputation system is not new; by now almost any community forum has a reputation points system in place. But stackoverflow.com also added a system of badges that I feel comes straight from the video game world of achievements and vanity awards: you get bronze, silver or gold badges for achieving tasks in the stackoverflow.com ecosystem. You get your Teacher badge for answering a question and receiving an up vote, you get the Student badge for asking a question that receives an up vote, or even the gold Great Answer badge if your answer gets voted up 100 times. Now these are, of course, vanity awards. We all know though how efficient they are in keeping users hooked! Me, I’m eager to get my Critic badge…

The only serious thing missing is RSS syndication of views, but I hear it is in the plans.

But most impressive is the quality of execution on these good ideas. The site is fast and responsive. It suggests similar questions as you type yours. It provides fast navigation to your questions and answers, visual notifications for changes since your previous check, and suggestions for related topics (i.e. common tags). It is true that I don’t know how many users it carries; judging from the ~30K Teacher badges, I’d guess some 50K registered and active users, as a conservative estimate.

It is also nice to see such an effort started from a grassroots movement, and not from the political sponsorship of an industry player. Today’s developer has to deal in the course of a single day with an NSConnection question, related to an issue of Apache .htaccess mod_rewrite and PHP cookie handling and resulting in a SQL Server access problem. A site like the Social on MSDN would not happily sponsor and encourage such questions, nor would it nurture and grow the community leaders that can answer such end-to-end and cross-platform questions.

Read/Write deadlock

May 16th, 2009

How does a simple SELECT deadlock with an UPDATE? Surprisingly, they can deadlock even on well-tuned systems that do not do spurious table scans. The answer is very simple: when the read and the write use two distinct access paths to reach the same key, and they use them in reverse order. Let’s consider a simple example: we have a table with a clustered index and a non-clustered index. The reader (T1) seeks a key in the non-clustered index and then needs to look up the clustered index to retrieve an additional column required by the SELECT projection list. The writer (T2) is updating the clustered index and then needs to perform an index maintenance operation on the non-clustered index. So T1 holds an S lock on the key K in the non-clustered index and wants an S lock on the same key K in the clustered index. T2 holds an X lock on the key K in the clustered index and wants an X lock on the same key K in the non-clustered index. Deadlock; T1 will be chosen as the victim. So you see, there are no complex queries involved, no suboptimal scan operations, no lock escalation nor page locks involved. Simple, correctly written queries may deadlock when doing read/write operations on the same key in a table with two indexes. Let’s show this in an example:
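The reverse-order acquisition at the heart of this deadlock is not specific to SQL Server. A hypothetical sketch with two plain locks shows the same cycle — this is an analogy, not a SQL Server repro, and all names are illustrative:

```python
import threading

# Two locks stand in for the same key K in the clustered and the
# non-clustered index.
clustered = threading.Lock()
nonclustered = threading.Lock()
victims = []

ready = threading.Event()   # set once the reader holds its first lock
go = threading.Event()      # set once the writer holds its first lock

def reader():   # T1: non-clustered index seek, then clustered index lookup
    with nonclustered:
        ready.set()
        go.wait()
        # The clustered key is held by the writer: the cycle is closed.
        if clustered.acquire(timeout=0.5):
            clustered.release()
        else:
            victims.append("T1")  # T1 gives up, like the chosen victim

def writer():   # T2: clustered index update, then non-clustered maintenance
    with clustered:
        go.set()
        ready.wait()
        # Blocked until T1 gives up and releases the non-clustered key.
        if nonclustered.acquire(timeout=5):
            nonclustered.release()

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=writer)
t1.start(); t2.start()
t1.join(); t2.join()
print(victims)  # the reader times out: T1 is the victim
```

Neither thread does anything wrong in isolation; only the opposite acquisition order creates the cycle, exactly as with the two index access paths.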

Read the rest of this entry »

Version Control and your Database

May 15th, 2009

I am still amazed when I walk into a development shop and I ask for their application database script and they offer to extract one for me. Really, your only definition of the database is the database itself? Now you wouldn’t keep your libraries as object code only and reverse engineer them every time you want to make a change, would you?

Now, all sarcasm aside, why is it so hard to keep a database definition as source and keep it under version control? The reason is not that people are dumb; these are bright developers and they would do the right thing if it fit into their natural work flow. The problem is that the tool set at their disposal as developers (usually the Visual Studio suite) is far, far behind the capabilities of the database administration tool set (SSMS). But the latter is focused on the needs of administrators, and its natural flow of actions is to visually modify some schema properties (add tables, define indexes etc.) in a dialog and then click the ‘Do it!’ button. This hides the actual scripts running behind the scenes and does not lend itself naturally to the normal code/build/run/test/commit cycle of the developer desk.

Read the rest of this entry »

A fix for error Cannot find the remote service SqlQueryNotificationService-GUID

April 18th, 2009

Sometimes your ERRORLOG is peppered with messages complaining about the service SqlQueryNotificationService-<guid> not existing, or about query notification dialogs being closed because they received an error message with the text Remote service has been dropped. I have blogged about this problem before: http://rusanu.com/2007/11/10/when-it-rains-it-pours/. Unfortunately this problem was not under your control, either as an administrator or as a developer. It is caused by the way the SqlDependency component of ADO.Net deploys the temporary service, queue and procedure needed for its functioning. The problem could be caused by your application calling SqlDependency.Stop inadvertently, but also by simple timing problems: http://rusanu.com/2008/01/04/sqldependencyonchange-callback-timing/.

Good news: Microsoft has shipped a fix for this issue: http://support.microsoft.com/kb/958006. According to the knowledge base article you need to install the following Cumulative Update depending on your current version of SQL Server deployed:

  • For SQL Server 2005 SP2 you need CU 10.
  • For SQL Server 2005 SP3 you need CU 1.
  • For SQL Server 2008 you need CU 2.

If you have SQL Server 2008 SP1 deployed you do not need to install any fix because the issue is fixed in SP1 for 2008.

Using XSLT to generate Performance Counters code

April 11th, 2009

Whenever I’m faced with a project in which I have to create a lot of tedious and repeating code I turn to the power of XML and XSLT. Rather than copy/paste the same code over and over again, just to end up with a refactoring and maintenance nightmare, I create an XML definition file and an XSLT transformation. I am then free to add new elements to the XML definition or to change the way the final code is generated from the XSLT transformation. This can be fully integrated with Visual Studio so that the code generation happens at project build time and the environment shows the generated code as a dependency of the XML definition file.
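To give a feel for the approach, here is the shape of such a definition-driven generator. The sketch uses Python’s standard library standing in for a real XSLT toolchain, and the counter definition and emitted code are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A made-up definition file; in the real setup this lives in its own XML
# file and the transformation is an XSLT stylesheet run at build time.
definition = """
<counters category="MyService">
  <counter name="RequestsPerSec" type="RateOfCountsPerSecond32"/>
  <counter name="ActiveSessions" type="NumberOfItems32"/>
</counters>
"""

root = ET.fromstring(definition)
lines = [f"class {root.get('category')}Counters {{"]
for counter in root.iter("counter"):
    # Each <counter> element expands into one generated member.
    lines.append(f"    public long {counter.get('name')}; // {counter.get('type')}")
lines.append("}")
generated = "\n".join(lines)
print(generated)
```

Adding a new counter is now a one-line change in the XML definition, and a change to the emitted pattern is a one-place change in the transformation.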

A few examples of how I’m using this code generation via XSLT are:

Data Access Layer
I know this will raise quite a few eyebrows, but I always write my own data access layer from scratch, and it is generated via XSLT.
Performance Counters
I create all my performance counters objects via XSLT generation, automating the process of defining/installing them and the access to emit and consume the counter values.
Wire Formats
In any project that has networking communication I access the wire format from classes generated via XSLT that take care of serialization and validation.

As an example, I’ll show how to create a class library that can be added to your project to expose Performance Counters from your application.

Read the rest of this entry »

Service Broker Whitepaper on MSDN: the 150 trick

March 25th, 2009

A new SQL Customer Advisory Team whitepaper was published recently: Service Broker: Performance and Scalability Techniques authored by Michael Thomassy.

The whitepaper documents the experience of a test done in Microsoft labs that measured the message throughput attainable between three initiators pushing data to a target. This scenario resembles a high-scale ETL case. The test was able to obtain a rate of nearly 18,000 messages per second, a rate that can satisfy most high-load OLTP environments. To obtain this rate Michael and his team had to overcome the high contention around updates of the dialog system tables. He presents a very interesting trick: create 149 dialogs that remain unused and only use every 150th. This way the updates done on the system tables occur on different pages, and the high PAGELATCH contention on the page containing the dialog metadata during SEND is eliminated. A very clever trick indeed. But this is a typical OLTP insert/update trick, and that is the very point of the whitepaper: that typical OLTP techniques can and should be applied to Service Broker performance tuning.

DatabaseJournal Tutorial

March 24th, 2009

Marcin Policht has concluded his series of articles dedicated to Service Broker in the Database Journal. Although the articles are part of a larger SQL Express coverage, they are not at all specific just to Express. I highly recommend them as a very good introduction to everything Service Broker on SQL Server 2005.

  1. Introduction to Service Broker
  2. Implementing Basic Service Broker Objects
  3. Implementing Service Broker Conversation – I
  4. Implementing Service Broker Conversation – II
  5. Distributed Service Broker Environment – Endpoints
  6. Distributed Service Broker Environment – Routing
  7. Distributed Service Broker Environment – Conducting Dialogs
  8. Configuring Certificate-based Authentication in SQL Server Express’ Distributed Service Broker Environment
  9. Establishing Distributed SQL Server Express’ Service Broker Conversations Using Certificate-based Authentication
  10. Configuring Transport Encryption in SQL Server 2005 Express Service Broker Conversation
  11. Configuring Full Dialog Security in SQL Server 2005 Express Service Broker Conversation
  12. Conducting Service Broker Conversation Using Full Dialog Security in SQL Server 2005 Express Service
  13. Configuring Anonymous Dialog Security in SQL Server 2005 Express Service Broker Conversation
  14. Service Broker Activation in SQL Server 2005 Express Edition
  15. Security Context of Service Broker Internal Activation
  16. Service Broker Transactional Support in SQL Server 2005 Express Edition
  17. Service Broker Poison Message Handling

Things I know now: blogging can get you into an email ponzi scheme

March 20th, 2009

I got tagged by Adam Machanic. Although my blog looks like a DBA blog, I am a purebred developer, so here is what I know now:

You can’t fix what you can’t measure

Successful projects dedicate quite a large amount of resources to instrumentation, profiling and monitoring. Anywhere between 5 and 10% of resources spent running code that generates logs, reports performance counters, monitors responsiveness, and aggregates and consolidates run-time data is an acceptable margin. I cannot stress enough the importance of properly instrumenting your code to support this. It has been said before that any optimization or troubleshooting should start from measurements, not from guesswork. Your duty is to put those measurements in place and expose information your users can rely on in order to make informed decisions. Don’t cut corners, and eat your own dog food: use the instrumentation you added to troubleshoot problems; don’t fire up a debugger and start stepping through the sources. If you cannot figure out the cause of a problem from the tracing and logs, your customers won’t be able to either. Nor will you be able to troubleshoot on site at a customer deployment.
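As an illustration of the kind of always-on measurement I mean, here is a hypothetical, minimal metrics helper — not from any real project, just a sketch of the idea of named counters and timers the application can dump into its logs on demand:

```python
import time
from collections import defaultdict

class Metrics:
    """Named counters and accumulated timings, cheap enough to leave on."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(float)   # seconds, accumulated per name

    def incr(self, name, by=1):
        self.counters[name] += by

    def timed(self, name):
        # Context manager that adds the elapsed wall time under `name`.
        metrics = self
        class _Timer:
            def __enter__(self):
                self.start = time.perf_counter()
            def __exit__(self, *exc):
                metrics.timings[name] += time.perf_counter() - self.start
        return _Timer()

metrics = Metrics()
with metrics.timed("request"):
    metrics.incr("requests")
print(metrics.counters["requests"], round(metrics.timings["request"], 3))
```

The point is not this particular helper but the habit: every interesting operation increments a counter or runs under a timer, so the numbers are already there when something goes wrong.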

Read the rest of this entry »

CLR Memory Leak

January 19th, 2009

On a recent client engagement I had to investigate what appeared to be a memory leak in a managed application. The program had been running for about a week and appeared to slowly degrade in performance over time. Although it looked healthy at only about 245 MB of memory used, I decided to investigate. The fastest way, in my opinion, to track down leaks in a running environment is to attach Windbg and use SOS:

Read the rest of this entry »