Error Handling in Service Broker procedures
October 31st, 2007Error handling in T-SQL traditionally has been a sort of misterious voo-doo for most developers, with it’s mixture of error severities, SET settings like XACT_ABORT, ARITHABORT or ARITHIGNORE and the options to handle the error on the server or at the client. For a long time now the best resource I know on this subject was, and perhaps still is, Erland Sommarskog set of articles at http://www.sommarskog.se/error-handling-I.html and http://www.sommarskog.se/error-handling-II.html. But the introduction of Service Broker activated procedures adds some new issues to consider when designing your application and this post is about what these issues are these and how to best cope with them.
@@ERROR or TRY…CATCH?
Well first things first, lets get this out of our way: check @@error or use BEGIN TRY… BEGIN CATCH? In my opinion, the advantages offered by the TRY… CATCH blocks are overwhelming and you should steer away from checking @@ERROR after each statement. Even if you may find some immediate scenario that would seem simpler using an explicit check for @@error, writing code using TRY…CATCH blocks pays of on the long run. For me the most important differentiatin factor between the two is actually the human factor: code writen with @@error checks is difficult to read and impossible to maintain. Keeping the discipline in writing code to check for @@error after each statement is tedious and errorprone. @@error checks are not ‘programming in the future tense‘ as Scott Meyers would put it: future versions in the database engine could introduce behavior changes that raise new errors your application is not prepared to deal with. The TRY…CATCH blocks solve so many of these issues that they are clearly the winner in my book.
Service Broker Error Handling
Why is Service Broker error handling something special? Shouldn’t the normal common wisdom of writing Transact-SQL apply to Service Broker as well? Well yes, but as it turns out there is a twist to the story. Service Broker is an integrated part of the SQL Server database engine and it follows all the normal Transact-SQL language parsing, compilation and execution rules like any other database engine componet, including everything related to errors and exception handling. But the typical Service Broker application is deployed in a quite different fashion than the non-broker applications: it’s an activated procedure that runs in the server background threads. The first major difference is that there is no client application to surface the error to, so any error that happens will have to be dealt with by the procedure. This means that the very purpose of error handling in a Service Broker application is different than in an ordinary procedure: Service Broker error handling has to capture any error and somehow react to this error at the level of Service Broker semantics, namely conversations. Most times this means that the conversation that received the message that triggered the error has to be ended with an error so that the partener service is notified about this problem and, sometimes, some compensation logic has to be run to undo the effects of this conversation had so far. The partener will receive the error message and in turn run its own compensation to complete (with error) the business transaction represented by the conversation..
Lets roll a quick example. Supose we have a trivial service whose role is to insert received message content into a target table (similar to what an audit service perhaps would do). Lets go ahead and build our DDL objects:
use tempdb;
go
— A table for the purpose of the demo
create table [target_table] (
[id] int not null primary key,
[key] nvarchar(256),
[value] nvarchar(256));
go
— Our initiator service
create queue [sender_queue];
create service [sender_service]
on queue [sender_queue];
go
— Our target service
create queue [target_queue];
create service [target_service]
on queue [target_queue]
([DEFAULT]);
go
create procedure [usp_target]
as
begin
set nocount on;
declare @handle uniqueidentifier;
declare @messageType sysname;
declare @payload xml;
begin try
begin transaction;
waitfor (receive top (1)
@handle = conversation_handle,
@messageType = message_type_name,
@payload = cast(message_body as xml)
from [target_queue]), timeout 1000;
if (0 != @@rowcount)
begin
if (@messageType = ‘DEFAULT’)
begin
insert into [target_table] (
[id], [key], [value])
select e.value(‘@id’, ‘int’) as [id],
e.value(‘@key’, ‘nvarchar(256)’) as [key],
e.value(‘@value’, ‘nvarchar(256)’) as [value]
from @payload.nodes(‘//payload/entry’) p(e);
end;
end conversation @handle;
end
commit;
end try
begin catch
declare @error int, @message nvarchar(4000);
select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
end conversation @handle with error = @error description = @message;
commit;
end catch;
end
go
— attach the procedure to the target_queue as an activated procedure
alter queue [target_queue]
with activation (
status = on,
max_queue_readers = 1,
procedure_name = [usp_target],
execute as owner);
go
As you can see, the error handling in the stored procedure is handled by a BEGIN TRY/BEGIN CATCH block, and the error handling consist of ending the conversation with an error, sending back the error number and message to the original sender. Of course, in a real application it would probably make sense to clearly define error numbers and messages reported by the target service, as is not really useful to send back to the sender some internal database engine codes and messages. So now lets go ahead and test our service:
— send a message to the target service
declare @handle uniqueidentifier, @payload xml;
begin transaction;
begin dialog conversation @handle
from service [sender_service]
to service ‘target_service’, ‘current database’
with encryption = off;
select @payload = N‘<payload><entry id=”1″ key=”A” value=”B”/></payload>’;
send on conversation @handle (@payload);
commit;
go
If we look into the target_table we find the entry we sent (1, A, B). The response that came back from the target service is sitting in the sender_queue and is a mundane EndDialog message, indicating the succesfull completion of the ‘assignment’.
Now if we simply run the very same SEND example, we’re gonna trigger an error, because the same values would violate the primary key constraint on the target_table. So go ahead and run this again:
— send a message to the target service
declare @handle uniqueidentifier, @payload xml;
begin transaction;
begin dialog conversation @handle
from service [sender_service]
to service ‘target_service’, ‘current database’
with encryption = off;
select @payload = N‘<payload><entry id=”1″ key=”A” value=”B”/></payload>’;
send on conversation @handle (@payload);
commit;
go
Now the sender_queue has received an error message:
<Error xmlns=“http://schemas.microsoft.com/SQL/ServiceBroker/Error”>
<Code>2627</Code>
<Description>Violation of PRIMARY KEY constraint ‘PK__target_table__0AD2A005’. Cannot insert duplicate key in object ‘dbo.target_table’.</Description>
</Error>
So our error handling works as we expected and the service is handling error conditions fine. So is this the end of the story?
Doomed Transactions
As it turns out, we’ve only skimmed the surface and we’re about to open one nasty can of worms. To proove this lets produce the one most common mistake that happens during development: a malformed payload:
— send a message to the target service
declare @handle uniqueidentifier, @payload nvarchar(max);
begin transaction;
begin dialog conversation @handle
from service [sender_service]
to service ‘target_service’, ‘current database’
with encryption = off;
select @payload = N‘<payload><entry id=”1″ key=”A” value=”B”></payload>’;
send on conversation @handle (@payload);
commit;
go
The payload in this case is incorrect, the <entry> tag is not closed. So what happens with our message? Nothing apparently, the message simply stays in the target_queue and refuses to be processed, as a simple check using select * from [target_queue] shows! The obvious culprit should be that message failed to be processed 5 times and the poison message detection has intervened and deactivated the target_queue so lets go ahead and check it:
select [name], is_receive_enabled from sys.service_queues
Now I reckon that as I was writing this piece, I had a big surprise to see that the queue is … not disabled! In fact I had to delete a paragraph I had already wrote, and come back and revisit this behavior. So what happened? It turns out that the activation mechanism was actualy fooled by our erroneous message to put the queue in the NOTIFIED state, and leave it as such, as a check on the relevant DMV shows: select * from sys.dm_broker_queue_monitors. The NOTIFIED state occurs when an activated procedure was launched but the RECEIVE verb has not been run on the activated queue. But our activated procedure was not running, as a quick check on select * from sys.dm_broker_activated_tasks shows. A second check on the ERRORLOG (or the system event viewer) revealed the problem:
2007-10-31 16:31:52.57 spid52s The activated proc [dbo].[usp_target] running on queue tempdb.dbo.target_queue output the following: 'The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.'
So our procedure wad run, had hit an exception, but our exception handler was not capable of dealing with the problem. The error in the ERRORLOG indicates that we are dealing with an uncommittable transaction issue. So apparently we need to augment our error handling CATCH block to deal with such problems, and the XACT_STATE documentation shows how this can be achieved. So we’re gonna modify our CATCH block to deal with doomed transactions and we’re also gonna modify our RECEIVE code slightly not to cast to XML during the RECEIVE statement, since that apparently is fooling the activation machinery which behaves as if the RECEIVE never occured. Here is our modified procedure:
— Our modified procedure
alter procedure [usp_target]
as
begin
set nocount on;
declare @handle uniqueidentifier;
declare @messageType sysname;
declare @messageBody varbinary(max);
declare @payload xml;
begin try
begin transaction;
waitfor (receive top (1)
@handle = conversation_handle,
@messageType = message_type_name,
@messageBody = message_body
from [target_queue]), timeout 1000;
if (0 != @@rowcount)
begin
if (@messageType = ‘DEFAULT’)
begin
select @payload = cast(@messageBody as xml);
insert into [target_table] (
[id], [key], [value])
select e.value(‘@id’, ‘int’) as [id],
e.value(‘@key’, ‘nvarchar(256)’) as [key],
e.value(‘@value’, ‘nvarchar(256)’) as [value]
from @payload.nodes(‘//payload/entry’) p(e);
end;
end conversation @handle;
end
commit;
end try
begin catch
— Test XACT_STATE for 0, 1, or -1.
— If 1, the transaction is committable.
— If -1, the transaction is uncommittable and should
— be rolled back.
— Test whether the transaction is uncommittable.
if (XACT_STATE()) = –1
begin
rollback transaction;
end;
— Test wether the transaction is active and valid.
if (XACT_STATE()) = 1
begin
declare @error int, @message nvarchar(4000);
select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
end conversation @handle with error = @error description = @message;
commit;
end
end catch;
end
go
We alter the proceure, turn activation back on and… yet again nothig, the message is still sitting in the target_queue. This time though the queue was correctly disabled, we check the select [name], is_receive_enabled from sys.service_queues view and sure enough the queue is disabled. But wait a minute! I’m sure TRY/CATCH has its place, but this is not quite what we wanted to achieve, a simple common XML formatting error is taking down our service! How come?
Error severities
As it turns out, XML casting and validation errors are raised with severity 16, and this severity will always put the transaction into an uncommittable state. For Service Broker, this is quite bad, as parsing and shreding XML is a day to day operation for most service applications. Unfortunately, there’s simply no workaround for this issue, which has the implication that one cannot use the SQL Server built-in XML methods to validate the received XML programmatically, one has to ensure that the XML is valid upfront. For XML payloads fortunately there is a solution: declare the message validation on the message type and even enforce an XML schema. This will cause validation to occur prior to the message being enqueued and will ensure that the activate dprocedure has to deal only with valid input. In our case, this means to change the contract of the service to use an XML well formatted validation message:
create message type [valid_payload] validation = well_formed_xml;
create contract [validated_contract] ([valid_payload] sent by initiator);
go
alter service [target_service] (add contract [validated_contract]);
We also need to change our activated procedure to deal with the new message type name, the ‘valid_payload’ type:
…
if (0 != @@rowcount)
begin
if (@messageType = N‘valid_payload’)
begin
select @payload = cast(@messageBody as xml);
insert into [target_table] (
[id], [key], [value])
…
We can now reset the broker in tempdb (to get rif of the erroneous message that would prevent the target_queue from ever becomming enabled) and enable back the target_queue:
alter database tempdb set new_broker with rollback immediate;
go
alter queue target_queue with status = on;
go
So now we can send again the erroneous message, this time using the validated contract and message type:
— send a message to the target service
declare @handle uniqueidentifier, @payload nvarchar(max);
begin transaction;
begin dialog conversation @handle
from service [sender_service]
to service ‘target_service’, ‘current database’
on contract [validated_contract]
with encryption = off;
select @payload = N‘<payload><entry id=”1″ key=”A” value=”B”></payload>’;
send on conversation @handle message type [valid_payload] (@payload);
commit;
go
And, as expected, the Service Broker responds immedeately with an error on our conversation because the message did not pass the XML payload validation:
<Error xmlns=“http://schemas.microsoft.com/SQL/ServiceBroker/Error“>
<Code>-9615</Code>
<Description>A message of type ‘valid_payload’ failed XML validation on the target service. XML parsing: line 1, character 51, end tag does not match start tag This occurred in the message with Conversation ID ‘…’, Initiator: 1, and Message sequence number: 0.</Description>
</Error>
So we were able to cope with the case of invalid XML, but how about other cases? What if the message is still valid XML, but has some invalid attributes? Ie. the payload ‘<payload><entry id=”foobar” key=”A” value=”B”/></payload>’will pass the well_formed_xml test, but will still cause the service to disable itslef, because ‘foobar’ cannot be casted to int. You could raise the bar even further by specifying an XML schema for validation, but not all of us are so confident in our XML schema knowledge to be sure that we covered everything. And besides, there are binary messages too, XML payload is not a requirement for Service Broker. The truth is that the severity of such trivial errors forcing the transaction to rollback is quite a burden on the application, since it means that pretty much any malformed message can disable a service. And there simply isn’t any buletproof defense against it.
Post Rollback corrective actions
Many developers I talked with have been at this point already and realised that they simply cannot cover all the angles on this problem so they tried another approach: what if one rolls back, but then does some corrective action. Surely we could write some code in the catch block that remedies the problem, like dequeue the message and error the conversation immedeately, without trying to process it’s payload. Usually this looks something like the following, int he BEGIN CATCH block:
— Test whether the transaction is uncommittable.
if (XACT_STATE()) = –1
begin
rollback transaction;
— take corrective actions
declare @error int, @message nvarchar(4000);
select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
begin transaction;
receive top(1) @messageBody = message_body
from [target_queue]
where conversation_handle = @handle;
end conversation with error = @error description = @message;
commit;
end;
But this is riddled with problems. The moment the transaction was rollbed back, the code has absolutely no guarantee over the state of this conversation and its messages. Another instance of the activated procedure might had already picked up the very same message and tried to process it. This problem is true for any processing of this kind, it is not specific to Service Broker, but the typical Service Broker environment, with it’s activated procedures running in parallel and trying to grab the next unlocked message it is very likely to happen in production. What’s worst is that is is unlikely to happen in the test development environement, unless the test environment matches the concurrency of the production environment.
Conclusions
So it turns out error handling in Service Broker is a bit trickier than expected and presenting some challenges. Some recommendations stand valid:
- Do use the BEGIN TRY/BEGIN CATCH blocks, they are simplifying the code tremendously
- Use the Service Broker conversation semantics to represent business transactions, and use the END CONVERSATION … WITH ERROR to signal errors to the other service at the end of a conversation when you catch
- Do harden your service broker contract as much as possible using XML validation schemas to catch malformed messages as early as possible, before they enter your application queues
But other thant this, dealing with errors and the problems caused by uncommittable transactions and its effects on Service Broker queues is by no means trivial, and is a subject I actually still looking for an acceptable answer.
Hi Remus,
I can’t test it right now, but can a doomed transaction become “un-doomed” by rolling back to a savepoint that was taken before the error that caused the transaction to become uncommittable? If so, the solution is obvious – take a savepoint directly after reading a message from the queue, and rollback to that savepoint if an error turns out to be a poison message.
Best, Hugo
I have handled errors in a different way, similar to what Hugo suggested – can you please let me know if this is a valid way of doing it.
Basically I begin a transaction before receiving any messages off the queue. When I get a message off the queue i also store the conversation_handle. I then do processing on the message in a try..catch block. In the catch block, i rollback the transaction (which puts the message back on the queue), begin a new transaction, and then get the message off the queue again using the same conversation_handle. I then log the error in a table and send an error message back to the initiating queue.
So, I am using code something like this
begin transaction
— get 1 message from the queue
waitfor (receive top (1)
@conversationhandle = conversation_handle,
@messagetype = message_type_name,
@message = cast(message_body AS xml)
from target_queue), TIMEOUT 1000;
— keep looping while we have a message to process
while (@conversationhandle is not null)
BEGIN
begin try
— do processing on the message
end try
begin catch
— error handling
rollback;
begin transaction
waitfor (receive top (1)
@conversationhandle = conversation_handle,
@messagetype = message_type_name,
@message = cast(message_body AS xml)
from target_queue
where conversation_handle = @conversationhandle), TIMEOUT 1000;
— log some error somewhere
— and send an error message back to the initiating queue
goto exit_conversation
end catch
exit_conversation:
— End the conversation and commit
END CONVERSATION @conversationhandle;
commit;
select @conversationhandle = null;
BEGIN TRANSACTION;
— lets see if there are any other messages waiting to be processed and get that if there is
waitfor (receive top (1)
@conversationhandle = conversation_handle,
@messagetype = message_type_name,
@message = cast(message_body as xml)
from target_queue), TIMEOUT 1000;
end
commit;
Unfortunately rollback to savepoints are not supported on doomed transactions. The funny story is that if you have *both* Roger’s books about Service Broker (the ‘beta’ and the ‘2005’ versions) the ‘beta’ one shows how to use rollback to savepoint, because in Beta 2 this was working! By RTM time though the functionality had changed.
As for rolling back an then receiving the same message it works only if there’s one and only one queue reader. In a concurrent environment the rolled back message is up for grabs as soon as it is unlocked and a different instance of the procedure may grab it and start processing it ‘anew’, so the risk of deactivating the queue is still there.
This blog has saved me hours! Thanks muchly 🙂
[…] were few and far between and depended upon user input, this was not trivial. You can see this article for a tutorial on […]
Great articles! Thanks for your help.
I came across another site that talked about message poisoning. Was I misunderstanding them or should I end the conversation before/after the rollback in the code above, as well? Thanks.
@Chris: If you end *before*, it will be rolled back so is a no-op. If you end *after*, the conversation was already unlocked by the rollback and it can be already locked and changed by another thread. Getting poison message to be handled *automatically* is very tricky. My recommendation is to handle all cases possible in the commit path, and when faced with a true poison message that causes rollback, analyze it manually and fix the procedure to cause it to go on the commit path. Cycle, rinse repeat until all errors are properly handled.
Hi, Remus. I am having some issues with my deployment of service broker, and I am stumped. I am getting the error – “…The conversation handle “” is not found…”. The service broker send/receive occurs on the same instance within the same db. I am using an activation procedure from the queue. I am using a trigger on a table to start the messaging. Below is my code.
Trigger:
DECLARE @XML XML;
IF EXISTS(SELECT * FROM inserted)
BEGIN
BEGIN TRANSACTION
SELECT @XML = (SELECT ID FROM inserted FOR XML AUTO);
— Send the XML records to the Service Broker queue:
DECLARE @DialogHandle UNIQUEIDENTIFIER
BEGIN DIALOG CONVERSATION @DialogHandle
FROM SERVICE BrokerServiceName
TO SERVICE ‘BrokerServiceName’
ON CONTRACT BrokerContractName
WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @DialogHandle
MESSAGE TYPE BrokerMessageType
(@XML);
COMMIT TRANSACTION
Activation SP:
DECLARE @XML XML,
@MessageBody VARBINARY(MAX),
@MessageTypeName SYSNAME,
@EmpID UNIQUEIDENTIFIER,
@ConversationID UNIQUEIDENTIFIER,
@ID UNIQUEIDENTIFIER;
DECLARE @Queue TABLE (
ConversationID UNIQUEIDENTIFIER,
MessageBody VARBINARY(MAX),
MessageTypeName SYSNAME);
BEGIN TRY
BEGIN TRANSACTION
WHILE (1 = 1)
BEGIN
WAITFOR (
RECEIVE conversation_handle, message_body, message_type_name
FROM BrokerQueue
INTO @Queue
), TIMEOUT 5000;
— If no messages exist, then break out of the loop:
IF NOT EXISTS(SELECT * FROM @Queue) BREAK;
DECLARE Cursor1 CURSOR FAST_FORWARD
FOR SELECT * FROM @Queue;
OPEN Cursor1;
FETCH NEXT FROM Cursor1
INTO @ConversationID, @MessageBody, @MessageTypeName;
WHILE @@FETCH_STATUS = 0
BEGIN
— Let’s only deal with messages of Message Type
IF @MessageTypeName = ‘BrokerMessageType’
BEGIN
SET @XML = CAST(@MessageBody AS XML);
SELECT @EmpID = tbl.rows.value(‘@ID’, ‘UNIQUEIDENTIFIER’)
FROM @XML.nodes(‘/inserted’) tbl(rows);
exec somestoredprocedure @EmpID;
END
END CONVERSATION @ConversationID
FETCH NEXT FROM Cursor1
INTO @ConversationID, @MessageBody, @MessageTypeName;
END
CLOSE Cursor1;
DEALLOCATE Cursor1;
— Purge the temporary in-proc table:
DELETE FROM @Queue;
END
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF (XACT_STATE()) = -1
BEGIN
ROLLBACK TRANSACTION
END
IF (XACT_STATE()) = 1
BEGIN
declare @error int, @message nvarchar(4000);
select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
end conversation @ConversationID with error = @error description = @message;
commit transaction;
END
END CATCH
Can you see anything that would cause this error? Am I doing something wrong here? Any assistance/direction you can provide would be greatly appreciated. Thank you.