Troubleshooting Dialogs

December 20th, 2005

So you’ve built your first Service Broker app, you’ve sent the first message, and now you’re looking for it on the target queue. But the message is not there. What do you do? Where do you look first? Well, troubleshooting Service Broker is a bit different from troubleshooting your everyday database app.

Here I’m trying to put together a short guide you can follow to troubleshoot Service Broker issues.

 

First, you should ensure that the message was actually sent and committed.

Next, check whether the message is in sys.transmission_queue. The transmission queue is similar to an outgoing mailbox, an ‘Outbox’: messages are kept there until the target acknowledges that it received them, after which they are deleted. Therefore, if the message is still in the transmission queue, it has not yet been acknowledged by the destination. How does one diagnose the problem? My recommendation is to follow the message flow: the message is sent by the sender, then accepted by the target, then an ack is sent by the target, and finally this ack is accepted by the sender. If any of these steps has a problem, the message will sit in sys.transmission_queue.
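
The flow above can be modelled in a few lines of Python. This is only an illustration of the reasoning, not Service Broker code, and the step labels are my own: the message is deleted from the transmission queue only once every step succeeds, and the first failing step tells you where to start looking.

```python
from typing import Optional

# The four steps of the message flow, in order. The labels are
# illustrative only; they are not Service Broker identifiers.
FLOW = (
    "initiator sends message",
    "target accepts message",
    "target sends ack",
    "initiator accepts ack",
)


def stays_in_xmit_queue(step_ok: dict[str, bool]) -> bool:
    """True if the message would still be in sys.transmission_queue.

    The message is deleted only after the initiator accepts the ack,
    so a failure at any single step keeps it queued for retry.
    """
    return not all(step_ok.get(step, False) for step in FLOW)


def first_failed_step(step_ok: dict[str, bool]) -> Optional[str]:
    """Return the earliest failed step, or None if every step succeeded.

    Steps missing from step_ok are treated as failed.
    """
    for step in FLOW:
        if not step_ok.get(step, False):
            return step
    return None


# Example: the target accepted the message, but its ack never got back.
observed = {step: True for step in FLOW}
observed["initiator accepts ack"] = False
print(stays_in_xmit_queue(observed))  # True
print(first_failed_step(observed))    # initiator accepts ack
```

The numbered cases that follow map onto these steps: case 1 is the first step, cases 2 and 4 the second, case 3 the third, and case 5 the fourth.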

Let’s now look at how to diagnose each step of the message flow. BTW, I often refer to the acknowledgement as ‘ack’, and to the transmission queue as ‘xmit queue’.

 

1. The sender cannot send the message, for whatever reason. If this is the case, the transmission_status column in sys.transmission_queue will contain an error message that will point at the problem. The appropriate action depends on the error being displayed.

Common problems include security problems (no database master key, no remote service binding, no certificates, etc.), classification problems (no route for the target service, etc.) and adjacent transport connection issues (connection handshake errors, unreachable target host, etc.).
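
If you want to script this first check, here is a small Python sketch. The T-SQL string selects real columns of sys.transmission_queue; XmitRow and split_by_status are hypothetical names of mine, and running the query (for example through whatever DB-API driver you use) is left out. The helper separates messages whose transmission_status already explains the failure (this case) from those with an empty status, which need the target-side or forwarder-side investigation described next.

```python
from typing import Iterable, NamedTuple, Optional

# T-SQL to run in the sender's database.
XMIT_QUEUE_QUERY = """
SELECT conversation_handle, to_service_name, transmission_status
FROM sys.transmission_queue
ORDER BY enqueue_time;
"""


class XmitRow(NamedTuple):
    conversation_handle: str
    to_service_name: str
    transmission_status: Optional[str]


def split_by_status(rows: Iterable[XmitRow]) -> tuple[list[XmitRow], list[XmitRow]]:
    """Split rows into (sender_errors, silent).

    A non-empty transmission_status means the sender itself failed and
    the text points at the problem; an empty (or NULL) status means the
    message left the sender and the problem is further down the flow.
    """
    sender_errors: list[XmitRow] = []
    silent: list[XmitRow] = []
    for row in rows:
        status = (row.transmission_status or "").strip()
        (sender_errors if status else silent).append(row)
    return sender_errors, silent
```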

 

2. The sender does send the message and the message reaches the target, but the target does not accept it. In this case, the sender’s transmission_status will be empty. To diagnose this issue, attach the Profiler to the target machine and enable the following events: ‘Broker/Broker:Conversation’, ‘Broker/Broker:Message Undeliverable’ and ‘Broker/Broker:Remote Message Acknowledgement’. When the message arrives, you will see the ‘Broker/Broker:Remote Message Acknowledgement’ event with the EventSubClass ‘Message With Acknowledgement Received’, followed by a ‘Broker/Broker:Message Undeliverable’ event. The TextData of this last event contains an error message that points at the problem.

Common problems in this case are security problems (to investigate these you must also turn on the ‘Audit Security/Audit Broker Conversation’ event in the Profiler; its TextData should pinpoint the cause of the failure), typos in service names or broker instance IDs, and disabled target queues.

Note that if the TextData says ‘This message could not be delivered because it is a duplicate.’, it means the message was actually accepted by the target, but the acks don’t make it back to the sender, so the sender keeps retrying the message again and again (see below).
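
When there are many Undeliverable events to sift through (for example, after saving the trace to a table), this distinction can be scripted. A minimal sketch, assuming you have already extracted the TextData strings; interpret_undeliverable is a hypothetical helper, and only the duplicate text quoted above comes from Service Broker.

```python
# The TextData quoted above for a message the target already has.
DUPLICATE_TEXT = "This message could not be delivered because it is a duplicate."


def interpret_undeliverable(text_data: str) -> str:
    """Interpret the TextData of a Broker:Message Undeliverable event
    traced on the target.

    A duplicate means the target already accepted the message and the
    problem is on the ack path back to the sender; any other text is
    the reason the target rejected the message.
    """
    if DUPLICATE_TEXT in text_data:
        return "accepted; ack not reaching sender"
    return "rejected by target: " + text_data.strip()
```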

 

3. The sender does send the message, the message reaches the target and is accepted, but the target is unable to send back an ack. As above, attach the Profiler to the target machine; you will see repeated occurrences of the ‘Broker/Broker:Message Undeliverable’ event with the TextData ‘This message could not be delivered because it is a duplicate.’. The event is generated each time the sender retries the message, which happens roughly once a minute (strictly speaking, after 4, 8, 16 and 32 seconds, and then once every 64 seconds).
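
Reading those figures as the intervals between successive retries (my interpretation), the sketch below computes when each duplicate event should appear on the target, which helps when matching trace events to retries. The function names are hypothetical.

```python
from itertools import accumulate, chain, islice, repeat


def retry_intervals(n: int) -> list[int]:
    """First n intervals (seconds) between retries: 4, 8, 16, 32, then 64 forever."""
    return list(islice(chain((4, 8, 16, 32), repeat(64)), n))


def retry_offsets(n: int) -> list[int]:
    """Seconds after the first attempt at which each of the first n retries occurs."""
    return list(accumulate(retry_intervals(n)))


print(retry_intervals(7))  # [4, 8, 16, 32, 64, 64, 64]
print(retry_offsets(7))    # [4, 12, 28, 60, 124, 188, 252]
```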

Typically the problem is a misconfigured route from the target back to the sender (the route for the initiator service is missing). The Profiler event ‘Broker:Message Classify’ will show this, with an EventSubClass ‘3 – Delayed’ and the TextData ‘The target service name could not be found. Ensure that the service name is specified correctly and/or the routing information has been supplied.’.

Another possible cause is a typo in the route configured on the target for the sender service. Since the ack is not stored in sys.transmission_queue, you don’t have the handy transmission_status. Or do you? Actually, you can use the get_transmission_status function to get the transmission status of the ack! Look up the conversation handle in sys.conversation_endpoints, then use this function to query the transmission status of the ack sent by that dialog back to the sender.
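
In T-SQL terms, this lookup is a single query on the target. The sketch below keeps it as a string, together with a hypothetical Python helper that filters the returned rows down to the target-side dialogs of a given initiator service whose last transmission reported an error. EndpointRow and failing_acks are my own names; the catalog view columns and GET_TRANSMISSION_STATUS are real.

```python
from typing import Iterable, NamedTuple, Optional

# T-SQL to run in the target database. GET_TRANSMISSION_STATUS returns the
# status of the dialog's last transmission; on the target side, a failing
# ack back to the initiator shows up here.
ACK_STATUS_QUERY = """
SELECT conversation_handle, far_service,
       GET_TRANSMISSION_STATUS(conversation_handle) AS transmission_status
FROM sys.conversation_endpoints
WHERE is_initiator = 0;
"""


class EndpointRow(NamedTuple):
    conversation_handle: str
    far_service: str  # on the target, this is the initiator service
    transmission_status: Optional[str]


def failing_acks(rows: Iterable[EndpointRow], initiator_service: str) -> list[tuple[str, str]]:
    """(handle, status) pairs for target dialogs with this initiator
    service whose last transmission reported a non-empty status."""
    return [
        (r.conversation_handle, r.transmission_status.strip())
        for r in rows
        if r.far_service == initiator_service
        and r.transmission_status
        and r.transmission_status.strip()
    ]
```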

 

4. The sender does send the message, but the message never reaches the target. This can happen only if there are intermediate hops (SQL Server instances acting as forwarders). To determine which forwarder drops the messages, connect the Profiler to each forwarder and see which one traces ‘Broker:Forwarded Message Dropped’ events. The most likely causes are either message timeouts (the forwarders can’t send the message in time because of high load) or misconfigured routing information on the forwarder (like missing routes in the MSDB database, which is the one used for forwarding).
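
A quick way to spot a missing or mistyped route is to compare the route table against the exact service name. A sketch, assuming you have fetched msdb.sys.routes from the forwarder (the same check applies to the target’s own sys.routes in case 3, where the route back to the initiator service is the suspect). Route and routes_naming are hypothetical names. The helper only looks for routes that name the service explicitly: service names are compared byte for byte, so a difference in case counts as a typo, while routes without a service name (such as AutoCreatedLocal) match any service and are ignored here.

```python
from typing import Iterable, NamedTuple, Optional

# T-SQL to run on each forwarder: forwarding uses the routes in msdb.
MSDB_ROUTES_QUERY = """
SELECT name, remote_service_name, address
FROM msdb.sys.routes;
"""


class Route(NamedTuple):
    name: str
    remote_service_name: Optional[str]  # None when the route names no service
    address: str


def routes_naming(routes: Iterable[Route], service_name: str) -> list[Route]:
    """Routes whose remote_service_name equals service_name exactly.

    The comparison is case-sensitive, mirroring Service Broker's
    byte-by-byte match. Wildcard routes (no service name) are skipped,
    so an empty result means no route names this service explicitly.
    """
    return [r for r in routes if r.remote_service_name == service_name]
```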

 

5. The sender does send the message, the message reaches the target and is accepted, and the target sends back the ack, but the ack never makes it back to the initiator (again, this requires a forwarder). Investigating this issue is identical to the one above: attach the Profiler to each forwarder until you find the one that is dropping messages. Note that the forwarders on the route from the sender to the target are not necessarily the same as those on the route from the target back to the sender!
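
Since the forward and return paths can differ, it helps to write both down and check every hop on each. A minimal sketch with made-up forwarder names and counts: for each forwarder, record how many ‘Broker:Forwarded Message Dropped’ events the Profiler showed; dropping_hops is a hypothetical helper that returns the offending hops in path order.

```python
from typing import Mapping, Sequence


def dropping_hops(path: Sequence[str], dropped: Mapping[str, int]) -> list[str]:
    """Forwarders on path that traced at least one
    'Broker:Forwarded Message Dropped' event, in path order."""
    return [hop for hop in path if dropped.get(hop, 0) > 0]


# Hypothetical topology: the ack takes a different route back.
to_target = ["FWD-A", "FWD-B"]
back_to_sender = ["FWD-C"]
dropped_events = {"FWD-B": 0, "FWD-C": 12}  # counted in Profiler on each hop

print(dropping_hops(to_target, dropped_events))       # []
print(dropping_hops(back_to_sender, dropped_events))  # ['FWD-C']
```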

5 responses to “Troubleshooting Dialogs”

  1. […] two years ago I have posted the original Troubleshooting dialogs post in this blog (back in the day when it was hosted at MSDN). I have often referenced this post […]

  2. Cuong La says:

    Hi,
    I have got an error coming up on the profiler with the message:
    “This message was dropped because it could not be dispatched on time. State: 2”. Can you please guide me as to how to fix this error.

    Thank you in advance.

    CL

  3. remus says:

    CL, see if the ssbdiagnose tool finds any problem: http://msdn.microsoft.com/en-us/library/bb934450.aspx
    If the tool reports no error you’re going to have to contact Microsoft product support about that error.

  4. se says:

    Hi Remus,
    I’ve got a server to server Service Broker implementation that has been running with no issues for several months now. Then suddenly we’re getting messages stuck on the sender’s sys.transmission_queue with a blank transmission_status. I followed your troubleshooting methods as described in #2 and #3 of this article but it doesn’t seem like a security or a misconfigured route issue since nothing has been changed there. What we did to clear up the sender’s transmission_queue was to have the route on the sender point to another target server that is similarly configured as the original target. The sender’s transmission_queue started to clear up then we pointed the sender back to the original target, then everything was back to normal. Would you know the cause of this?
    Thanks!

    se

  5. […] the notification message is delivered using Service Broker and all of my comments related to troubleshooting dialogs apply to this message delivery as well. If the notification message is not delivered, the first […]