Troubleshooting Dialogs

December 20th, 2005

So you built your first Service Broker app, you’ve sent the first message and now you’re looking for the message on the target queue. Yet, the message is not there. What do you do? Where do you look first? Well, troubleshooting Service Broker is a bit different than troubleshooting your everyday database app.

So here is a short guide that you can follow to troubleshoot Service Broker issues.

 

First, you should ensure that the message was actually sent and committed.

Next, check if the message exists in sys.transmission_queue. The transmission queue is similar to an outgoing mailbox, an ‘Outbox’. Messages are kept there until the target acknowledges that it received the message, after which they are deleted. Therefore, if the message is still in the transmission queue, it means that it was not yet acknowledged by the destination. How do you diagnose what the problem is? My recommendation is to follow the message flow: the message is sent by the sender, then accepted by the target, then an ack is sent by the target, and finally this ack is accepted by the sender. If any of these steps has a problem, the message will sit in sys.transmission_queue.
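As a starting point, a query along these lines lists what is still waiting in the xmit queue. It is only a sketch: run it in the sender's database, and trim the column list to whatever you find useful.

```sql
-- Run in the database that hosts the sending (initiator) service.
-- Each row is a message that has not yet been acknowledged by its destination.
SELECT conversation_handle,
       to_service_name,
       to_broker_instance,
       message_type_name,
       enqueue_time,
       transmission_status   -- empty when the sender itself has no error to report
FROM sys.transmission_queue
ORDER BY enqueue_time;
```

If the query returns no rows for your conversation, the message was either never sent and committed, or it was already acknowledged by the target.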

Let’s now look at how to diagnose each step of the message flow. BTW, I often refer to the acknowledgement as ‘ack’, and to the transmission queue as ‘xmit queue’.

 

1. The sender cannot send the message, for whatever reason. If this is the case, the transmission_status column in sys.transmission_queue will contain an error message that will point at the problem. The appropriate action depends on the error being displayed.

Common problems include security problems (no database master key, no remote service binding, no certificates etc), classification problems (no route for the target service etc) or adjacent transport connection issues (connection handshake errors, unreachable target host etc).
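The security and routing prerequisites listed above can be inspected from the catalog views. The queries below are a rough checklist, assuming you run them in the sender's database. They only show what is configured, not whether it is configured correctly.

```sql
-- Is there a database master key? (no row means there is none)
SELECT name
FROM sys.symmetric_keys
WHERE name = N'##MS_DatabaseMasterKey##';

-- Which routes exist, and which service and address does each one point to?
SELECT name, remote_service_name, broker_instance, address
FROM sys.routes;

-- Which remote service bindings exist, and which user does each one map to?
SELECT b.name, b.remote_service_name, p.name AS remote_user
FROM sys.remote_service_bindings AS b
JOIN sys.database_principals AS p
  ON p.principal_id = b.remote_principal_id;

-- Which certificates exist, who owns them, and are they still valid?
SELECT name, principal_id, is_active_for_begin_dialog, start_date, expiry_date
FROM sys.certificates;
```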

 

2. The sender does send the message and the message reaches the target, but the target does not accept the message. In this case, the sender’s transmission_status will be empty. To diagnose this issue, you must attach the Profiler to the target machine and enable the following events: ‘Broker/Broker:Conversation’, ‘Broker/Broker:Message Undeliverable’ and ‘Broker/Broker:Remote Message Acknowledgement’. When the message arrives, you will see the event ‘Broker/Broker:Remote Message Acknowledgement’ with the EventSubClass ‘Message with Acknowledgement Received’, followed by a ‘Broker/Broker:Message Undeliverable’ event. The TextData of this last event will contain an error message that will point at the problem.

Common problems in this case are security problems (you must also turn on the ‘Audit Security/Audit Broker Conversation’ event in the Profiler to investigate these; the TextData should pinpoint the cause of the failure), typos in service names or the broker instance id, and disabled target queues.

Note that if the error in TextData says ‘This message could not be delivered because it is a duplicate.’ it means that the message was actually accepted by the target, but the acks don’t reach the sender and therefore the sender is retrying the message again and again (see below).
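For the typo and disabled-queue causes in case 2, the target's catalog views are quicker than a trace. This is a sketch to run in the target database. Compare the output character by character with the service name and broker instance used by the sender, since service names are matched exactly.

```sql
-- Are the target queues enabled for enqueue and receive?
SELECT name, is_enqueue_enabled, is_receive_enabled, is_activation_enabled
FROM sys.service_queues;

-- Which services exist, and which queue is each one bound to?
SELECT s.name AS service_name, q.name AS queue_name
FROM sys.services AS s
JOIN sys.service_queues AS q
  ON q.object_id = s.service_queue_id;

-- What is the broker instance id of this database, and is the broker enabled?
SELECT name, service_broker_guid, is_broker_enabled
FROM sys.databases
WHERE database_id = DB_ID();

-- A disabled queue can be turned back on like this
-- (dbo.TargetQueue is a hypothetical name, substitute your own):
-- ALTER QUEUE dbo.TargetQueue WITH STATUS = ON;
```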

 

3. The sender does send the message, the message reaches the target and is accepted, but the target is unable to send back an ack. Same as above, you must attach the Profiler to the target machine and you will see repeated occurrences of the ‘Broker/Broker:Message Undeliverable’ event with the TextData ‘This message could not be delivered because it is a duplicate.’. The event will be generated each time the sender retries the message, which happens about once a minute (strictly speaking, after 4, 8, 16 and 32 seconds, and then once every 64 seconds).

Typically the problem is a misconfigured route back from the target to the sender (the route for the initiator service is missing). The Profiler event ‘Broker:Message Classify’ will show this, with an EventSubClass ‘3 – Delayed’ and a TextData message of ‘The target service name could not be found. Ensure that the service name is specified correctly and/or the routing information has been supplied.’.

Another possible cause is a typo in the route configured on the target for the sender service. Since the ack is not stored in sys.transmission_queue, you don’t have the handy transmission_status. Or do you? Actually, you can use the get_transmission_status function to get the transmission status of the ack! Look up the conversation handle in sys.conversation_endpoints and then use this function to query the transmission status of the ack sent by that dialog back to the sender.
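The lookup and the function call can be combined in one query. This is a minimal sketch to run in the target database. It lists every conversation endpoint, so on a busy system you will want to filter on far_service or on a specific conversation_handle.

```sql
-- Run in the target database: one row per conversation endpoint,
-- with the transmission status of whatever that endpoint last tried to send
-- (for a target that has only received messages, that is the ack).
SELECT conversation_handle,
       is_initiator,
       far_service,
       far_broker_instance,
       state_desc,
       GET_TRANSMISSION_STATUS(conversation_handle) AS transmission_status
FROM sys.conversation_endpoints;
```

An empty transmission_status means the target has no error to report for that dialog. A routing or connection error here points at the path from the target back to the sender.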

 

4. The sender does send the message, but the message never reaches the target. This can happen only if there are intermediate hops (SQL Server instances acting as forwarders). To determine which forwarder drops the messages, you have to connect the Profiler to each forwarder and see which one traces ‘Broker:Forwarded Message Dropped’ events. The most likely causes are either a message timeout (the forwarder could not send the message in time due to high load) or misconfigured routing information on the forwarder (such as missing routes in the msdb database, which is the one used for forwarding).
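Before tracing, it can save time to confirm on each forwarder that forwarding is turned on at all and that msdb has the routes you expect. The queries below are a sketch to run on each forwarding instance.

```sql
-- Is message forwarding enabled on the Service Broker endpoint,
-- and how much memory (in MB) may forwarded messages use?
SELECT name, state_desc, is_message_forwarding_enabled, message_forwarding_size
FROM sys.service_broker_endpoints;

-- Routes used for forwarding live in msdb, not in a user database.
SELECT name, remote_service_name, broker_instance, address, lifetime
FROM msdb.sys.routes;
```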

 

5. The sender does send the message, the message reaches the target and is accepted, and the target sends back the ack, but the ack never reaches the initiator (again, a forwarder is required for this to happen). Investigating this issue is identical to the one above: attach the Profiler to each forwarder until you find the one that is dropping messages. Note that the forwarders from the sender to the target are not necessarily the same as the ones on the route from the target back to the sender!