Error Handling in Service Broker procedures

October 31, 2007

Error handling in T-SQL has traditionally been a sort of mysterious voodoo for most developers, with its mixture of error severities, SET settings like XACT_ABORT (http://msdn2.microsoft.com/en-us/library/ms188792.aspx), ARITHABORT or ARITHIGNORE, and the options to handle the error on the server or at the client. For a long time now the best resource I know on this subject was, and perhaps still is, Erland Sommarskog's set of articles at http://www.sommarskog.se/error-handling-I.html and http://www.sommarskog.se/error-handling-II.html. But the introduction of Service Broker activated procedures adds some new issues to consider when designing your application, and this post is about what these issues are and how best to cope with them.

@@ERROR or TRY…CATCH?

Well, first things first, let's get this out of our way: check @@ERROR or use BEGIN TRY… BEGIN CATCH? In my opinion, the advantages offered by the TRY…CATCH blocks are overwhelming and you should steer away from checking @@ERROR after each statement. Even if you may find some immediate scenario that would seem simpler using an explicit check for @@ERROR, writing code using TRY…CATCH blocks pays off in the long run. For me the most important differentiating factor between the two is actually the human factor: code written with @@ERROR checks is difficult to read and impossible to maintain. Keeping the discipline of checking @@ERROR after each statement is tedious and error-prone. @@ERROR checks are not 'programming in the future tense' (http://www.amazon.com/Effective-C%2B%2B-Addison-Wesley-Professional-Computing/dp/0321334876/ref=pd_bbs_sr_1/002-9386102-7810448?ie=UTF8&s=books&qid=1193824559&sr=8-1), as Scott Meyers would put it: future versions of the database engine could introduce behavior changes that raise new errors your application is not prepared to deal with. The TRY…CATCH blocks solve so many of these issues that they are clearly the winner in my book.
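To make the contrast concrete, here is a minimal sketch of the same two-insert unit of work written in both styles. The temporary table #demo is made up for the illustration and is not part of the example developed later in this post.

```sql
-- Hypothetical scratch table, used only for this comparison
create table #demo ([id] int not null primary key);

-- Style 1: @@ERROR has to be captured right after every single statement,
-- because the next statement resets it
declare @err int;
begin transaction;
insert into #demo ([id]) values (1);
set @err = @@error;
if (@err = 0)
begin
    insert into #demo ([id]) values (1);  -- duplicate key on purpose
    set @err = @@error;
end;
if (@err = 0)
    commit;
else
begin
    rollback;
    print '@@ERROR style caught error ' + cast(@err as varchar(12));
end;

-- Style 2: one CATCH block covers every statement in the TRY block
begin try
    begin transaction;
    insert into #demo ([id]) values (2);
    insert into #demo ([id]) values (2);  -- duplicate key on purpose
    commit;
end try
begin catch
    if (XACT_STATE() <> 0)
        rollback;
    print 'TRY/CATCH style caught error '
        + cast(ERROR_NUMBER() as varchar(12)) + ': ' + ERROR_MESSAGE();
end catch;

drop table #demo;
```

In the first style every statement added later needs its own check, and a forgotten one fails silently. In the second style new statements are covered automatically, which is the maintenance argument made above.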

Service Broker Error Handling

Why is Service Broker error handling something special? Shouldn't the common wisdom of writing Transact-SQL apply to Service Broker as well? Well yes, but as it turns out there is a twist to the story. Service Broker is an integrated part of the SQL Server database engine and it follows all the normal Transact-SQL language parsing, compilation and execution rules like any other database engine component, including everything related to errors and exception handling. But the typical Service Broker application is deployed in quite a different fashion than non-broker applications: it's an activated procedure that runs on the server's background threads. The first major difference is that there is no client application to surface the error to, so any error that happens has to be dealt with by the procedure itself. This means that the very purpose of error handling in a Service Broker application is different than in an ordinary procedure: Service Broker error handling has to capture any error and somehow react to it at the level of Service Broker semantics, namely conversations. Most times this means that the conversation that received the message that triggered the error has to be ended with an error so that the partner service is notified about the problem and, sometimes, some compensation logic has to be run to undo the effects this conversation has had so far. The partner will receive the error message and in turn run its own compensation to complete (with error) the business transaction represented by the conversation.
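As a rough sketch of the partner's side of this exchange, the batch below drains one reply from an initiator queue and reacts to it. It assumes the initiator queue is the [sender_queue] created in the demo further down. The compensation step is only a placeholder, and the batch has no error handling of its own, so read it as an illustration of the message flow rather than as production code.

```sql
-- Sketch only: receive one reply on the initiator side and react to it.
-- [sender_queue] is assumed to be the initiator queue from the demo below.
declare @handle uniqueidentifier;
declare @messageType sysname;
declare @messageBody varbinary(max);
declare @errorXml xml;
declare @code int;
declare @description nvarchar(4000);

begin transaction;

waitfor (
    receive top (1)
        @handle = conversation_handle,
        @messageType = message_type_name,
        @messageBody = message_body
    from [sender_queue]), timeout 1000;

if (0 != @@rowcount)
begin
    if (@messageType = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
    begin
        -- The partner ended the conversation WITH ERROR: extract code and description
        select @errorXml = cast(@messageBody as xml);

        with xmlnamespaces (default 'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
        select
            @code = @errorXml.value('(/Error/Code)[1]', 'int'),
            @description = @errorXml.value('(/Error/Description)[1]', 'nvarchar(4000)');

        -- Placeholder: application-specific compensation would go here
        print 'Partner reported error ' + cast(@code as varchar(12)) + ': ' + @description;
    end;

    -- Both Error and EndDialog mean the partner is done, so close our side too
    if (@messageType in (
            N'http://schemas.microsoft.com/SQL/ServiceBroker/Error',
            N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'))
    begin
        end conversation @handle;
    end;
end;

commit;
```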

Let's roll a quick example. Suppose we have a trivial service whose role is to insert received message content into a target table (similar to what an audit service perhaps would do). Let's go ahead and build our DDL objects:

use tempdb;

-- A table for the purpose of the demo
create table [target_table] (
    [id] int not null primary key,
    [key] nvarchar(256),
    [value] nvarchar(256));

-- Our initiator service
create queue [sender_queue];
create service [sender_service]
    on queue [sender_queue];

-- Our target service
create queue [target_queue];
create service [target_service]
    on queue [target_queue]
    ([DEFAULT]);
go

create procedure [usp_target]
as
begin
    set nocount on;
    declare @handle uniqueidentifier;
    declare @messageType sysname;
    declare @payload xml;
    begin try
        begin transaction;
        waitfor (
            receive top (1)
                @handle = conversation_handle,
                @messageType = message_type_name,
                @payload = cast(message_body as xml)
            from [target_queue]), timeout 1000;
        if (0 != @@rowcount)
        begin
            if (@messageType = 'DEFAULT')
            begin
                insert into [target_table] ([id], [key], [value])
                select e.value('@id', 'int') as [id],
                    e.value('@key', 'nvarchar(256)') as [key],
                    e.value('@value', 'nvarchar(256)') as [value]
                from @payload.nodes('//payload/entry') p(e);
            end;
            end conversation @handle;
        end
        commit;
    end try
    begin catch
        declare @error int, @message nvarchar(4000);
        select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
        end conversation @handle with error = @error description = @message;
        commit;
    end catch;
end
go

-- attach the procedure to the target_queue as an activated procedure
alter queue [target_queue]
    with activation (
        status = on,
        max_queue_readers = 1,
        procedure_name = [usp_target],
        execute as owner);
go

As you can see, the error handling in the stored procedure is done in a BEGIN TRY/BEGIN CATCH block and consists of ending the conversation with an error, sending the error number and message back to the original sender. Of course, in a real application it would probably make sense to clearly define the error numbers and messages reported by the target service, as it is not really useful to send internal database engine codes and messages back to the sender. So now let's go ahead and test our service:

-- send a message to the target service
declare @handle uniqueidentifier, @payload xml;
begin transaction;
begin dialog conversation @handle
    from service [sender_service]
    to service 'target_service', 'current database'
    with encryption = off;
select @payload = N'<payload><entry id="1" key="A" value="B"/></payload>';
send on conversation @handle (@payload);
commit;
go

If we look into the target_table we find the entry we sent (1, A, B). The response that came back from the target service is sitting in the sender_queue and is a mundane EndDialog message, indicating the successful completion of the 'assignment'. Now if we simply run the very same SEND example, we're gonna trigger an error, because the same values would violate the primary key constraint on the target_table.
So go ahead and run the same SEND batch again:

-- send a message to the target service
declare @handle uniqueidentifier, @payload xml;
begin transaction;
begin dialog conversation @handle
    from service [sender_service]
    to service 'target_service', 'current database'
    with encryption = off;
select @payload = N'<payload><entry id="1" key="A" value="B"/></payload>';
send on conversation @handle (@payload);
commit;
go

Now the sender_queue has received an error message:

<Error xmlns="http://schemas.microsoft.com/SQL/ServiceBroker/Error">
  <Code>2627</Code>
  <Description>Violation of PRIMARY KEY constraint 'PK__target_table__0AD2A005'. Cannot insert duplicate key in object 'dbo.target_table'.</Description>
</Error>

So our error handling works as we expected and the service is handling error conditions fine. So is this the end of the story?

Doomed Transactions

As it turns out, we've only skimmed the surface and we're about to open one nasty can of worms. To prove this, let's reproduce the one most common mistake that happens during development, a malformed payload:

-- send a message to the target service
declare @handle uniqueidentifier, @payload nvarchar(max);
begin transaction;
begin dialog conversation @handle
    from service [sender_service]
    to service 'target_service', 'current database'
    with encryption = off;
select @payload = N'<payload><entry id="1" key="A" value="B"></payload>';
send on conversation @handle (@payload);
commit;
go

The payload in this case is incorrect: the entry tag is not closed. So what happens with our message? Apparently nothing: the message simply stays in the target_queue and refuses to be processed, as a simple check using select * from [target_queue] shows!
The obvious culprit would be that the message failed to be processed 5 times and the poison message detection has intervened and disabled the target_queue, so let's go ahead and check it:

select [name], is_receive_enabled from sys.service_queues;

I confess that as I was writing this piece I had a big surprise: the queue is … not disabled! In fact I had to delete a paragraph I had already written, and come back and revisit this behavior. So what happened? It turns out that the activation mechanism was actually fooled by our erroneous message into putting the queue in the NOTIFIED state and leaving it there, as a check on the relevant DMV shows: select * from sys.dm_broker_queue_monitors. The NOTIFIED state occurs when an activated procedure was launched but the RECEIVE verb has not been run on the activated queue. But our activated procedure was not running, as a quick check on select * from sys.dm_broker_activated_tasks shows. A second check on the ERRORLOG (or the system event viewer) revealed the problem:

    2007-10-31 16:31:52.57 spid52s The activated proc [dbo].[usp_target] running on queue tempdb.dbo.target_queue output the following: 'The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.'

So our procedure was run and hit an exception, but our exception handler was not capable of dealing with the problem. The error in the ERRORLOG indicates that we are dealing with an uncommittable transaction. So apparently we need to augment our CATCH block to deal with such problems, and the XACT_STATE documentation (http://msdn2.microsoft.com/en-us/library/ms189797.aspx) shows how this can be achieved. So we're gonna modify our CATCH block to deal with doomed transactions, and we're also gonna modify our RECEIVE code slightly so that it does not cast to XML during the RECEIVE statement, since that apparently is fooling the activation machinery, which behaves as if the RECEIVE never occurred. Here is our modified procedure:

-- Our modified procedure

alter procedure [usp_target]
as
begin
    set nocount on;
    declare @handle uniqueidentifier;
    declare @messageType sysname;
    declare @messageBody varbinary(max);
    declare @payload xml;
    begin try
        begin transaction;
        waitfor (
            receive top (1)
                @handle = conversation_handle,
                @messageType = message_type_name,
                @messageBody = message_body
            from [target_queue]), timeout 1000;
        if (0 != @@rowcount)
        begin
            if (@messageType = 'DEFAULT')
            begin
                select @payload = cast(@messageBody as xml);
                insert into [target_table] ([id], [key], [value])
                select e.value('@id', 'int') as [id],
                    e.value('@key', 'nvarchar(256)') as [key],
                    e.value('@value', 'nvarchar(256)') as [value]
                from @payload.nodes('//payload/entry') p(e);
            end;
            end conversation @handle;
        end
        commit;
    end try
    begin catch
        -- Test XACT_STATE for 0, 1, or -1.
        -- If 1, the transaction is committable.
        -- If -1, the transaction is uncommittable and should
        -- be rolled back.

        -- Test whether the transaction is uncommittable.
        if (XACT_STATE()) = -1
        begin
            rollback transaction;
        end;

        -- Test whether the transaction is active and valid.
        if (XACT_STATE()) = 1
        begin
            declare @error int, @message nvarchar(4000);
            select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
            end conversation @handle with error = @error description = @message;
            commit;
        end
    end catch;
end
go

We alter the procedure, turn activation back on and… yet again nothing, the message is still sitting in the target_queue. This time though the queue was correctly disabled: we run select [name], is_receive_enabled from sys.service_queues and sure enough the queue is disabled. But wait a minute! I'm sure TRY/CATCH has its place, but this is not quite what we wanted to achieve: a simple, common XML formatting error is taking down our service! How come?

Error severities

As it turns out, XML casting and validation errors (raised with severity 16) are among the errors that always put the transaction into an uncommittable state.
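The uncommittable state can also be observed outside of Service Broker. The sketch below is a minimal repro, assuming the behavior is as described above: it casts a malformed string to XML inside a transaction and reports what the CATCH block sees. The variable names are made up for the illustration.

```sql
-- Minimal repro sketch: a failed XML cast inside a TRY block
declare @raw nvarchar(max), @payload xml;
select @raw = N'<payload><entry id="1" key="A" value="B"></payload>';  -- entry tag never closed

begin try
    begin transaction;
    select @payload = cast(@raw as xml);  -- raises an XML parsing error
    commit;
end try
begin catch
    -- If the transaction is doomed as described, xact_state reports -1 here
    select XACT_STATE() as [xact_state],
        ERROR_NUMBER() as [error_number],
        ERROR_SEVERITY() as [error_severity],
        ERROR_MESSAGE() as [error_message];
    if (XACT_STATE() <> 0)
        rollback transaction;
end catch;
```

Swapping the cast for a statement that violates a primary key gives a useful comparison, since in the earlier test that error left the transaction committable.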
For Service Broker this is quite bad, as parsing and shredding XML is a day-to-day operation for most service applications. Unfortunately, there's simply no workaround for this issue, which implies that one cannot use the SQL Server built-in XML methods to validate the received XML programmatically; one has to ensure that the XML is valid upfront. For XML payloads there is fortunately a solution: declare the validation on the message type and even enforce an XML schema. This will cause validation to occur prior to the message being enqueued and will ensure that the activated procedure has to deal only with valid input. In our case, this means changing the contract of the service to use a message type validated as well-formed XML:

create message type [valid_payload] validation = well_formed_xml;
create contract [validated_contract] ([valid_payload] sent by initiator);
go
alter service [target_service] (add contract [validated_contract]);

We also need to change our activated procedure to deal with the new message type name, the 'valid_payload' type:

…
if (0 != @@rowcount)
begin
    if (@messageType = N'valid_payload')
    begin
        select @payload = cast(@messageBody as xml);
        insert into [target_table] ([id], [key], [value])
        …

We can now reset the broker in tempdb (to get rid of the erroneous message that would prevent the target_queue from ever becoming enabled) and re-enable the target_queue:

alter database tempdb set new_broker with rollback immediate;
go
alter queue target_queue with status = on;
go

So now we can send the erroneous message again, this time using the validated contract and message type:

-- send a message to the target service
declare @handle uniqueidentifier, @payload nvarchar(max);
begin transaction;
begin dialog conversation @handle
    from service [sender_service]
    to service 'target_service', 'current database'
    on contract [validated_contract]
    with encryption = off;
select @payload = N'<payload><entry id="1" key="A" value="B"></payload>';
send on conversation @handle message type [valid_payload] (@payload);
commit;
go

And, as expected, Service Broker responds immediately with an error on our conversation because the message did not pass the XML payload validation:

<Error xmlns="http://schemas.microsoft.com/SQL/ServiceBroker/Error">
  <Code>-9615</Code>
  <Description>A message of type 'valid_payload' failed XML validation on the target service. XML parsing: line 1, character 51, end tag does not match start tag This occurred in the message with Conversation ID '…', Initiator: 1, and Message sequence number: 0.</Description>
</Error>

So we were able to cope with the case of invalid XML, but how about other cases? What if the message is still valid XML, but has some invalid attributes? E.g. the payload '<payload><entry id="foobar" key="A" value="B"/></payload>' will pass the well_formed_xml test, but will still cause the service to disable itself, because 'foobar' cannot be cast to int. You could raise the bar even further by specifying an XML schema for validation, but not all of us are so confident in our XML schema knowledge to be sure that we covered everything. And besides, there are binary messages too; an XML payload is not a requirement for Service Broker. The truth is that having such trivial errors force the transaction to roll back is quite a burden on the application, since it means that pretty much any malformed message can disable a service. And there simply isn't any bulletproof defense against it.

Post Rollback corrective actions

Many developers I talked with have been at this point already and realised that they simply cannot cover all the angles on this problem, so they tried another approach: what if one rolls back, but then takes some corrective action? Surely we could write some code in the CATCH block that remedies the problem, like dequeuing the message and erroring the conversation immediately, without trying to process its payload.
Usually this looks something like the following, in the BEGIN CATCH block:

-- Test whether the transaction is uncommittable.
if (XACT_STATE()) = -1
begin
    rollback transaction;
    -- take corrective actions
    declare @error int, @message nvarchar(4000);
    select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
    begin transaction;
    receive top(1) @messageBody = message_body
        from [target_queue]
        where conversation_handle = @handle;
    end conversation @handle with error = @error description = @message;
    commit;
end;

But this is riddled with problems. The moment the transaction is rolled back, the code has absolutely no guarantee about the state of this conversation and its messages. Another instance of the activated procedure might have already picked up the very same message and tried to process it. This problem is true for any processing of this kind and is not specific to Service Broker, but in the typical Service Broker environment, with its activated procedures running in parallel and trying to grab the next unlocked message, it is very likely to happen in production. What's worse is that it is unlikely to happen in the test or development environment, unless that environment matches the concurrency of the production environment.

Conclusions

So it turns out error handling in Service Broker is a bit trickier than expected and presents some challenges.
Some recommendations stand valid:

- Do use the BEGIN TRY/BEGIN CATCH blocks, they simplify the code tremendously.
- Use the Service Broker conversation semantics to represent business transactions, and use END CONVERSATION … WITH ERROR to signal errors to the other service at the end of a conversation when you catch one.
- Do harden your Service Broker contract as much as possible, using XML validation schemas to catch malformed messages as early as possible, before they enter your application queues.

But other than this, dealing with errors and with the problems caused by uncommittable transactions and their effects on Service Broker queues is by no means trivial, and is a subject for which I am actually still looking for an acceptable answer.
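For the third recommendation, a schema-validated message type could look something like the sketch below. The names [payload_schema], [schema_payload] and [schema_contract] are made up for the illustration. The schema is only my guess at a reasonable shape for the demo payload, matching the //payload/entry path the procedure shreds, so treat it as a starting point rather than a complete definition.

```sql
-- Hypothetical schema for the demo payload: a <payload> root holding
-- <entry> elements whose id attribute must be an integer
create xml schema collection [payload_schema] as
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="payload">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="entry" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:int" use="required"/>
            <xs:attribute name="key" type="xs:string"/>
            <xs:attribute name="value" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
go

-- Message type validated against the schema, plus a contract that uses it
create message type [schema_payload]
    validation = valid_xml with schema collection [payload_schema];
create contract [schema_contract] ([schema_payload] sent by initiator);
go

alter service [target_service] (add contract [schema_contract]);
go
```

With a type like this, a message carrying id="foobar" should be rejected by the broker's validation before it ever reaches the target_queue. The activated procedure would also need to accept the new message type name, as was done for 'valid_payload' above.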
