Error Handling and Activation

August 13th, 2008

I have previously talked here about the queue monitors and the role they play in launching activated procedures. If you recall, I’ve mentioned that the Queue Monitors will enter the NOTIFIED state after they launch a stored procedure and not launch again the procedure until the Queue Monitor notices that RECEIVE statements are being issued against the queue. In an older post I have also talked about how difficult is to get error handling right, and in particular cast and convention errors. This may seem a trivial problem but in the Service Broker programs it is actually a serious problem because of the frequent conversation to and from XML. These two separate issues can actually ‘cooperate’ into a somehow surprising behavior. Namely if the activated procedure hits an error before it completes the RECEIVE statement, the Queue Monitor will stay in NOTIFIED state and won’t activate again the procedure. Although this looks similar to the typical poison message behavior when the queue is automatically disabled, this is a different issue. And unlike the poison message case, in the case when the Queue Monitor is stranded in NOTIFIED state you can not get a notification that the queue is no longer functional.

This post continues with an example showing how a relatively safe activated procedure can end up in this situation.

Lets consider what happens when a typical Service Broker activated procedure is presented with an invalid XML fragment in a message. First, lets create a pair of services:

create queue [initiator];
create service [http://rusanu.com/2008/08/13/initiator] on queue [initiator];
go

create queue [target];
create service [http://rusanu.com/2008/08/13/target]
	on queue [target]
	([DEFAULT]);
go

Lets say the target service is used to log user logins from workstations. We have a table that keeps these login events:

create table [user_log] (
	[id] int not null identity (1,1) primary key,
	[user_name] varchar(256),
	[workstation] varchar(256),
	[login_time] datetime not null);
go

The initiator service is sending XML messages that embed the user, workstation and time for a login event:

<login_info xmlns="//rusanu.com/2008/08/13" time="2008-08-13T08:00:00">
  <user>John Doe</user>
  <workstation>PCLAB36X</workstation>
</login_info>

So we create an activated procedure that is extracting the name, workstation and time from the XML and inserts the information into the [user_log] table:

create procedure usp_target
as
begin
	set nocount on;
	declare @h uniqueidentifier;
	declare @mt sysname;
	declare @msg xml;

	begin try
		begin transaction

		waitfor (receive top(1)
			@h = [conversation_handle],
			@mt = [message_type_name],
			@msg = cast(message_body as xml)
			from [target]), timeout 1000;

		if (@h is not null)
		begin
			if (@mt = N'DEFAULT')
			begin
				with xmlnamespaces (DEFAULT '//rusanu.com/2008/08/13')
				insert into [user_log] (
					[user_name],
					[workstation],
					[login_time])
				select T.c.value('user[1]', 'varchar(256)'),
					T.c.value('workstation[1]', 'varchar(256)'),
					T.c.value('@time', 'datetime')
					from @msg.nodes('/login_info') T(c);
			end

			end conversation @h;
		end
		commit;
	end try
	begin catch
		declare @error int, @message varchar(4000), @xstate int;
		select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE(), @xstate = XACT_STATE();
		if @xstate != 0
			rollback;

		----------------------------
		-- Error logging goes here
		----------------------------
		print @message;

	end catch
end
go

alter queue [target] with activation ( status = on, procedure_name = [usp_target], execute as owner, max_queue_readers = 1); go

So we go ahead and test the procedure, by sending a message that will activate the procedure:

declare @h uniqueidentifier;
declare @msg nvarchar(max);
select @msg = N'<?xml version="1.0" encoding="utf-8"?>
<login_info xmlns="//rusanu.com/2008/08/13" time="2008-08-13T08:00:00.000">
  <user>John Doe</user>
  <workstation>PCLAB36X</workstation>
</login_info>';
begin dialog conversation @h
	from service [http://rusanu.com/2008/08/13/initiator]
	to service N'http://rusanu.com/2008/08/13/target'
	with encryption = off;
send on conversation @h (@msg);
go

And when we check the [user_log] table, nothing is there. How come? Well, we did send an invalid XML message. Because we assigned the XML to an NVARCHAR variable, it is stored as Unicode encoding. But our XML fragment declares itself as UTF-8 encoding, hence it is invalid and casting it to XML datatype will fail. This sort of XML errors are less obvious and I’ve seen in real life situations encoding mixed up like this.

If we check the server ERRORLOG or the system Event Log (eventvwr.exe) we’ll find the error message from the XML convertion:

The activated proc [dbo].[usp_target] running on queue test.dbo.target output the following: 'XML parsing: line 1, character 38, unable to switch the encoding'

This error happens during the RECEIVE statement. This interrupts the statement before the Queue Monitor is notified that RECEIVEs are occurring, and thus the activated procedure exists but the Queue Monitor is left in NOTIFIED state. We can check this in the DMV sys.dm_broker_queue_monitors. Although the queue is active, it refuses to activate again the procedure.

The workaround for this is failry easy: simply move the cast after the RECEIVE statement:

alter procedure usp_target
as
begin
	set nocount on;
	declare @h uniqueidentifier;
	declare @mt sysname;
	declare @msg xml;
	declare @msg_bin varbinary(max);

	begin try
		begin transaction

		waitfor (receive top(1)
			@h = [conversation_handle],
			@mt = [message_type_name],
			@msg_bin = [message_body]
			from [target]), timeout 1000;

		if (@h is not null)
		begin
			if (@mt = N'DEFAULT')
			begin
				select @msg = cast(@msg_bin as xml);
				with xmlnamespaces (DEFAULT '//rusanu.com/2008/08/13')
				insert into [user_log] (
					[user_name],
					[workstation],
					[login_time])
				select T.c.value('user[1]', 'varchar(256)'),
					T.c.value('workstation[1]', 'varchar(256)'),
					T.c.value('@time', 'datetime')
					from @msg.nodes('/login_info') T(c);
			end

			end conversation @h;
		end
		commit;
	end try
	begin catch
		declare @error int, @message varchar(4000), @xstate int;
		select @error = ERROR_NUMBER(), @message = ERROR_MESSAGE(), @xstate = XACT_STATE();
		if @xstate != 0
			rollback;

		----------------------------
		-- Error logging goes here
		----------------------------
		print @message;

	end catch
end
go

With this modified procedure the conversation error is causing the RECEIVE to rollback, but the Queue Monitor has already entered the RECEIVES_OCCURRING state and will activate the procedure again. Five consecutive rollbacks are triggering the poison message detection mechanism and the queue is disabled as expected. Why is this a better behavior than before? Because you can have monitoring set up around the BROKER_QUEUE_DISABLED event notification and intervene: remove the poison message and re-activate the queue.

XML message validation

In my example I intentionally used the DEFAULT message type, with no validation, and did not use a message type with well_formed_xml message validation. The later would had muted the whole point of the errors occurring during XML cast since and invalid message would not event be enqueued int the [target] queue. Indeed, using message type validation is the best option for such a case, but none the less, I have seen situations like the one I’ve described in production. It can happen when the developers intentionally omit the XML message type validation in the message type declaration to gain a bit of performance, or because they consider themselves safe from an invalid XML ever being sent.

Conclusion

Always make sure your RECEIVE statement is being executed in your activated procedure. If you are doing any cast, is better to receive into intermediate types that do not require a cast and only perform the cast after the RECEIVE statement has finished. Otherwise you may end up with your service not activating but your notification infrastructure never detecting this problem.

Comments are closed.