SqlDependency.OnChange callback timing

January 4th, 2008

I had reviewed the way SqlDependency works several times in this blog, but the implementation of this feature will probably continue to surprise me for a long time. After reading about numerous reports of ERRORLOG files getting filled with messages like:

The query notification dialog on conversation handle '{5925E62A-A3BA-DC11-9E8E-000C293EC5A4}.'closed due to the following error: '<?xml version="1.0"?><Error xmlns="http://schemas.microsoft.com/SQL/ServiceBroker/Error"><Code>-8470</Code><Description>Remote service has been dropped.</Description></Error>'.

I decided to take a closer look. The immediate cause of this error is quite clear, is it because the target service of a Query Notification subscription was dropped. I have covered the system error 8470 and others in my earlier post Resending messages. But of course, the real question is why is this happening? Is there some coding error on the part of the application developer and some way to avoid this problem? Is there some configuration issue?

Read the rest of this entry »

Troubleshooting dialogs, the sequel

November 28th, 2007

Almost two years ago I have posted the original Troubleshooting dialogs post in this blog (back in the day when it was hosted at MSDN). I have often referenced this post when I was asked to help investigate some Service Broker issue, and I have seen others referencing it. Although it lacked a lot of details it was a good starting point for anybody who was asking itself ‘The messages don’t get through, wonder what’s wrong…’

It is time for an updated entry on this topic because the bar was raised to a whole new level: the Service Broker Diagnostics Tool was shipped in the November CTP of SQL Server 2008!

Read the rest of this entry »

Prevent ERRORLOG growth

November 11th, 2007

A problem I’ve often seen with many SQL Server deployments is when ERRORLOG grows and grows and grows, eventually filling all the available disk space. Reasons for this growth can be various, but the surprising thing is that sometimes they are absolutely legit. Kevin Farlee mentions on this blog entry http://blogs.msdn.com/sqlserverstorageengine/archive/2007/10/30/when-is-too-much-success-a-bad-thing.aspx about a site that is doing backups every 12 seconds, and each backup is reporting a success entry in ERRORLOG. Other reasons I’ve seen in production that caused growth were related to SqlDependency like the ones I mentioned previously here: http://blogs.msdn.com/remusrusanu/archive/2007/10/12/when-it-rains-it-pours.aspx. Now one thing to remember is that irellevant to what is causing the growth, there actualy is a stored procedure that forces an errorlog cycle: sp_cycle_errorlog. This procedure closes the current ERRORLOG, renames it as ERRORLOG.1 and opens a new one, pretty much like a server restart. If ERRORLOG growth is an issue in your environment, add an Agent job to periodically cycle the current ERRORLOG, perhaps followed up by an archive step.

When it rains, it pours

November 10th, 2007

I’ve seen a number of customers reporting problems about ERRORLOG growing out of control (tens of GBs) because of error like following:

2007-10-12 11:18:32.44 spid25s The query notification dialog on conversation handle ‘{EC54573A-9978-DC11-961C-00188B111155}.’ closed due to the following error: ‘<?xml version=”1.0??><Error xmlns=”http://schemas.microsoft.com/SQL/ServiceBroker/Error”><Code>-8490</Code><Description>Cannot find the remote service ‘SqlQueryNotificationService-4869b411-fa1c-4d8a-ab37-5bf5762eb98b’ because it does not exist.</Description></Error>’.

2007-10-12 11:37:20.69 spid51s The activated proc [dbo].[ SqlQueryNotificationStoredProcedure-633a0c13-66e4-410e-8bd8-744146d2258d] running on queue tempdb.dbo. SqlQueryNotificationService-633a0c13-66e4-410e-8bd8-744146d2258d output the following: ‘Could not find stored procedure ‘dbo. SqlQueryNotificationStoredProcedure-633a0c13-66e4-410e-8bd8-744146d2258d‘

2007-10-12 10:59:39.32 spid51 Error: 28054, Severity: 11, State: 1.
2007-10-12 10:59:39.32 spid51 Service Broker needs to access the master key in the database ‘tempdb’. Error code:25. The master key has to exist and the service master key encryption is required.

All these messages are related one way or another to the ADO.Net component SqlDependency. I’ll present each one how it is caused and how to avoid id.

Read the rest of this entry »

Fire and Forget: Good for the military, but not for Service Broker conversations

April 6th, 2006

DECLARE @dh UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION  @dh

      FROM SERVICE [Initiator]

      TO SERVICE N‘Target’;

SEND ON CONVERSATION @dh;

END CONVERSATION @dh;

     

Does the code above look familiar? Well, it shouldn’t, because it contains a very nasty error. This is the dreaded fire-and-forget message exchange pattern. The initiator opens a conversation, sends a message and then ends the conversation. I often seen this pattern deployed in practice, and (just apparently!) it seems to works fine. The target receives the message sent, and it ends its own target endpoint of the conversation, and things look happy, happy, happy, joy, joy, joy. Then suddenly something changes, and messages seem to vanish in thin air. The initiator sends them for sure, but the target doesn’t get them. The sent message is not in sys.transmission_queue, is not in the target queue, there is no error in the initiator queue, and even more, the initiator endpoint disappears! What happened? Well, the target service has responded with an error to the message the initiator has sent. The error could happen for a multitude of reasons, most common being one of these:

         the target may had denied access to the initiator service

         the message XML validation has failed (the target may had added a schema validation to the message)

         the target service may no longer implement the contract used by the initiator

         the target service was just dropped (can happen only if the broker instance was specified in the BEGIN DIALOG)

Well, if the target responded with an error, then why isn’t the error enqueued in the initiator’s queue? Because the initiator has declared that it is no interested in seeing that error! It had ended the conversation! So when the error comes back from the target, the error message is dropped and the initiator endpoint is deleted. This is the expected behavior when the conversation endpoint is in DISCONNECTED_OUTBOUND state.

 

The solution is let the target end the conversation first.  The initiator can simply send the message to the target and then continue. The target will receive the message, end the conversation from the target side and this will send an EndDialog message back to the initiator. All we have to do is attach an activated stored procedure to the initiator’s queue, and in this stored procedure we can end the initiator side of the conversation:

 

CREATE PROCEDURE inititator_queue_activated_procedure

AS

DECLARE @dh UNIQUEIDENTIFIER;

DECLARE @message_type SYSNAME;

DECLARE @message_body NVARCHAR(4000);

BEGIN TRANSACTION;

WAITFOR (

      RECEIVE @dh = [conversation_handle],

            @message_type = [message_type_name],

            @message_body = CAST([message_body] AS NVARCHAR(4000))

      FROM [InitiatorQueue]), TIMEOUT 1000;

WHILE @dh IS NOT NULL

BEGIN

      IF @message_type = N‘http://schemas.microsoft.com/SQL/ServiceBroker/Error’

      BEGIN

            RAISERROR (N‘Received error %s from service [Target]’, 10, 1, @message_body) WITH LOG;

      END

      END CONVERSATION @dh;

      COMMIT;

      SELECT @dh = NULL;

      BEGIN TRANSACTION;

      WAITFOR (

            RECEIVE @dh = [conversation_handle],

                  @message_type = [message_type_name],

                  @message_body = CAST([message_body] AS NVARCHAR(4000))

            FROM [InitiatorQueue]), TIMEOUT 1000;

END

COMMIT;

GO

 

ALTER QUEUE [InitiatorQueue]

      WITH ACTIVATION (

            STATUS = ON,

            MAX_QUEUE_READERS = 1,

            PROCEDURE_NAME = [inititator_queue_activated_procedure],

            EXECUTE AS OWNER);

GO

 

Now when en error is returned by the target service it is logged into the System Event Log:

 

Event Type:   Information

Event Source: MSSQLSERVER

Event Category:      (2)

Event ID:     17061

Date:         4/6/2006

Time:         10:01:41 PM

User:         REDMOND\remusr

Computer:     REMUSR10

Description:

Error: 50000 Severity: 10 State: 1 Received error ?<?xml version=”1.0″?><Error xmlns=”http://schemas.microsoft.com/SQL/ServiceBroker/Error”><Code>-8425</Code><Description>The service contract &apos;DEFAULT&apos; is not found.</Description></Error> from service [Target]

 

We could have chosen to insert the error message into an audit table, or even send it to an audit service (using Service Broker, of course.J).

How long should I expect ALTER DATABASE … SET ENABLE_BROKER to run?

January 30th, 2006

I’ve seen this question comming back again and again on forums and discussion groups. The ALTER DATABASE … SET ENABLE_BROKER seems to run forever and doesn’t complete. How long should one wait?

This statement completes imeadetly, but the problem is that is requires exclusive access to the database! Any connection that is using this database has a shared lock on it, even when idle, thus blocking the ALTER DATABASE from completing.

There is an easy trick for fix the problem: use the termination options of ALTER DATABASE:        

ROLLBACK AFTER integer [ SECONDS ]         | ROLLBACK IMMEDIATE         | NO_WAIT

-ROLLBACK will close all existing sessions, rolling back any pending transaction.

– NO_WAIT will terminate the ALTER DATABASE statement with error if other connections are blocking it.

ALTER DATABASE [<dbname>] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;

Why does feature … not work under activation?

January 12th, 2006

A number of features behave differently when running under activated stored proc. Selecting from server level views, selecting from dynamic management views, using linked servers and other. Executing the same procedure from a user session yields different results. E.g. if the procedures looks up a dynamic management view like sys.dm_…, it gets fewer rows when running under activation than when running from a user session.

What happens is that you overlook the fact that activation is executing under an EXECUTE AS context. The BOL have a great article explaining the behavior of EXECUTE AS, ‘Extending Database Impersonation by Using EXECUTE AS’, http://msdn2.microsoft.com/en-us/library/ms188304.aspx. It is not activation that causes the different behavior, but the fact that activation ALWAYS uses an EXECUTE AS context. The ‘Troubleshooting Activated Stored Procedures’ topic in BOL (http://msdn2.microsoft.com/en-us/library/ms166102(en-US,SQL.90).aspx) actually explains this and recommends that you also use an EXECUTE AS in the user session when invoking the procedure for debugging purposes, to get the same behavior and results as when activated.

A short explanation of the problem is that the activation execution context is trusted only in the database, not in the whole server. Anything related to the whole server, like a server level view or a dynamic management view or a linked server, acts as if you logged in as [Public].

The recommended way to solve the issue is to sign the procedure with a server level certificate that has the proper rights needed for the operation in question (see http://msdn2.microsoft.com/en-us/library/ms181700.aspx). Another possible way to fix the problem is to mark the database as TRUSTWORTHY, using ALTER DATABASE. But you better read the carefully the ‘Extending Database Impersonation by Using EXECUTE AS’ before doing this step and make sure you understand all the implications. You should do this only if you completely trust the dba of the database in question, as he becomes a de facto sysadmin when the database is marked as trustworthy.

Troubleshooting Dialogs

December 20th, 2005

So you built your first Service Broker app, you’ve sent the first message and now you’re looking for the message on the target queue. Yet, the message is not there. What do you do? Where do you look first? Well, troubleshooting Service Broker is a bit different than troubleshooting your everyday database app.

So I’m trying to build here a short guide that you can follow to troubleshoot Service Broker issues.

 

First, you should ensure that the message was actually sent and committed.

Next, check if the message exists in the sys.transmission_queue. The transmission queue is similar to an Outgoing mailbox, an ‘Outbox’. Messages are kept there until the target acknowledges that it received the message, after which they are deleted. Therefore, if the message is still in the transmission queue it means that it was not yet acknowledged by the destination. How does one diagnose what is the problem? My recommendation is to follow the message flow: message is sent by sender, then accepted by the target, then an ack is sent by the target and finally this ack is accepted by the sender. I any of these steps have a problem, then the message will sit in the sys.transmission_queue.

Let’s now look at how to diagnose each step of the message flow. BTW, I often refer to the acknowledgement as ‘ack’, and to the transmission queue as ‘xmit queue’.

 

1. The sender cannot send the message, for whatever reason. If this is the case, the transmission_status column in sys.transmission_queue will contain an error message that will point at the problem. The appropriate action depends on the error being displayed.

Common problems include security problems (no database master key, no remote service binding, no certificates etc), classification problem (no route for the target service etc) or adjacent transport connection issues (connection handshake errors, unreachable target host etc)

 

2. The sender does send the message and the message reaches the target but the target does not accept the message. In this case, the sender’s transmission_status will be empty. To diagnose this issue, you must attach the Profiler to the target machine and enable the following events: ‘Broker/Broker:Conversation’, ‘Broker/Broker:Message Undeliverable’ and ‘Broker/Broker:Remote Message Acknowledgement’. When the message arrives, you will see the event ‘Broker/Broker:Remote Message Acknowledgement‘ with the EventSubClass Message with Acknowledgement Received followed by ‘Broker/Broker:Message Undeliverable‘ event. The TextData of this last event will contain an error message that will point at the problem.

Common problem in this case are security problems (you must turn on in addition the ‘Audit Security/Audit Broker Conversation‘ event in the Profiler to investigate these problems, the TextData should pinpoint to the failure cause), typos in service names or broker instance id, disabled target queues.

Note that in case this error in TextData says ‘This message could not be delivered because it is a duplicate.’ it means that the message is actually accepted by the target, but the acks don’t reach back to the sender and therefore the sender is retrying the message again and again (see below).

 

3. The sender does send the message, the message reaches the target and is accepted, but the target is unable to send back an ack. Same as above, you must attach the Profiler to the target machine and you will see repeated occurrences of the ‘Broker/Broker:Message Undeliverable event with the TextData ‘This message could not be delivered because it is a duplicate.‘. The vent will be generated each time the sender is retrying the message, which happens about once/minute (strictly speaking is after 4, 8, 16, 32 and then once every 64 seconds).

Typically the problem is a misconfigured route back from the target to the sender (the route for the initiator service is missing). The Profiler event ‘Broker:Message Classify‘ will show this, with an EventSubClass ‘3 – Delayed’ and a TextData message of ‘The target service name could not be found. Ensure that the service name is specified correctly and/or the routing information has been supplied.’.

Another possible cause is when the route configured on the target for the sender service has a typo. Since the ack is not stored in the sys.transmission_queue, you don’t have the handy transmission_status. Or do you? Actually, you can use the get_transmission_status function to get the transmission status of the ack! Lookup the conversation handle is sys.conversation_endpoints and then use this function to query the transmission status of the ack sent by that dialog back to the sender.

 

4. The sender does send the message, but the message never reaches the target. This can happen only if there are intermediate hops (SQL Server instances acting as forwarders). To determine which forwarder drops the messages, you have to connect the Profiler to each forwarder and see which one traces ‘Broker:Forwarded Message Dropped’ events. The most likely causes are either message timeout (the forwarders can’t get to send the message in time due to high load) or a misconfigured routing information on the forwarder (like missing routes in MSDB database, which is the one used for forwarding).

 

5. The sender does send the message, the message reaches the target and is accepted, the target is sending back the ack but the ack never reaches back the initiator (again, a forwarder is required for this to happen). Investigating this issue is identical with the issue above: attach the Profiler to each forwarder until you find the one that is dropping messages. Note that the forwarders from the sender to the target are not necessarily the same as the ones on the route from the target back to the sender!