How to change database mirroring encryption with minimal downtime

April 23rd, 2010

SQL Server Database Mirroring uses an encrypted connection to ship the data between the principal and the mirror. By default RC4 encryption is enabled and used. The endpoint can be configured to use AES encryption instead, or no encryption. The overhead of using RC4 encryption is quite small, the overhead of using AES encryption is slightly bigger, but not significant. Under well understood conditions, like inside a secured data center, encryption can be safely turned off for a 5-10% increase in speed in mirroring traffic. Note that even with encryption turned off, the traffic is still cryptographically signed (HMAC). Traffic signing cannot be turned off.

To change the encryption used by an endpoint, one has to run the ALTER ENDPOINT … FOR DATABASE_MIRRORING (ENCRYPTION = {DISABLED|SUPPORTED|REQUIRED}). Two endpoints must have compatible encryption settings to be able to communicate. The following table shows the compatibility matrix of encryption settings:

DISABLED SUPPORTED REQUIRED
DISABLED CLEAR CLEAR
SUPPORTED CLEAR ENCRYPTED ENCRYPTED
REQUIRED ENCRYPTED ENCRYPTED

The default setting for an endpoint is ENCRYPTION = REQUIRED, which enforces encryption and refuses to connect to an endpoint that has disabled encryption.

Changing encryption settings on an existing endpoint

If you have a running mirroring session and want to change the settings to squeeze the extra 5-10% you can expect from removing RC4 encryption, then chances are you deployed the endpoint with the default encryption settings, namely REQUIRED. If you don’t know the current endpoint settings you can always check the sys.database_mirroring_endpoints metadata catalog. When encryption is REQUIRED the encryption_algorithm column is one of 1,2,5 or 6. When encryption is SUPPORTED the encryption_algorithm column is one of 3,4,7 or 8. When is DISABLED the encryption_algorithm is 0 and the is_encryption_Enabled column changes to 0. To force the traffic to be unencrypted at least one endpoint has to have ENCRYPTION = DISABLED.

When you run the ALTER ENDPOINT statement and change the encryption settings the endpoint is going to be stopped and restarted during the ALTER statement. All existing connections will be disconnected. A database mirroring session may immediately re-connect and not react to this short disruption in any fashion visible to the user.

The safest way to change an existing mirroring session that uses encryption to no longer encrypt the traffic, when there is no witness, would be like this:

  1. Change the mirror endpoint to SUPPORTED
  2. Change the principal endpoint to DISABLED
  3. Change the mirror endpoint to DISABLED
  4. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

If a the mirroring session involves a witness, then it too must have the endpoint set to a compatible encryption setting:

  1. Change the witness endpoint to SUPPORTED
  2. Change the mirror endpoint to SUPPORTED
  3. Change the principal endpoint to DISABLED
  4. Change the witness endpoint to DISABLED
  5. Change the mirror endpoint to DISABLED
  6. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

Note that if automatic failover is enabled then at the moment the principal endpoint is changed, it is possible for automatic failover to occur, given that for a brief moment the mirror and the witness will have a quorum.

How to troubleshoot if something goes wrong

Attach Profiler to all instances involved in the mirroring session and open a trace that listens for the Audit Database Mirroring Login Event Class (on SQL 2005 use the Audit Broker Login Event Class event instead, it will trace the DBM sessions). If you did a mistake during the ALTER ENDPOINT changes and ended up with incompatible settings, there will be an event generated visible in Profiler. The event Text will contain an error message explaining why the endpoints cannot connect.

Using tables as Queues

March 26th, 2010

A very common question asked on all programming forums is how to implement queues based on database tables. This is not a trivial question actually. Implementing a queue backed by a table is notoriously difficult, error prone and susceptible to deadlocks. Because queues are usually needed as a link between various processing stages in a workflow they operate in highly concurrent environments where multiple processes enqueue rows into the table while multiple processes attempt to dequeue these rows. This concurrency creates correctness, scalability and performance challenges.

But since SQL Server 2005 introduced the OUTPUT clause, using tables as queues is no longer a hard problem. This fact is called out in the OUTPUT Clause topic in BOL:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table… Other semantics may also be implemented, such as using a table to implement a stack.

The reason why OUTPUT clause is critical is that it offers an atomic destructive read operation that allows us to remove the dequeued row and return it to the caller, in one single statement.

Read the rest of this entry »

Performance comparison of varchar(max) vs. varchar(N)

March 22nd, 2010

The question of comparing the MAX types (VARCHAR, NVARCHAR, VARBINARY) with their non-max counterparts is often asked, but the answer usually gravitate around the storage differences. But I’d like to address the point that these types have inherent, intrinsic performance differences that are not driven by different storage characteristics. In other words, simply comparing and manipulating variables and columns in T-SQL can yield different performance when VARCHAR(MAX) is used vs. VARCHAR(N).

Assignment

First comparing simple assignment, assign a value to a VARBINARY(8000) variable in a tight loop:

Read the rest of this entry »

Dealing with Large Queues

March 9th, 2010

On a project I’m currently involved with we have to handle a constant influx of audit messages for processing. The messages come from about 50 SQL Express instances located in data centers around the globe, delivered via Service Broker into a processing queue hosted on a mirrored database where an activated procedure shreds the message payload into relational tables. These tables are in turn replicated with transactional replication into a data warehouse database. after that the messages are deleted from the processing servers, as replication is set up not to replicate deletes. The system must handle a constant average rate of about 200 messages per second, 24×7, with spikes going up to 2000-3000 messages per second over periods of minutes to an hour.

When dealing with these relatively high volumes, it is inevitable that queues will grow during the 2000 msgs/sec spikes and drain back to empty when the incoming rate stabilizes again at the normal 200 msgs/sec rate. Service Broker does an excellent job at handling these non-connectivity periods, retains the audit messages and quickly delivers them when connectivity is restored.

What I noticed though is that sometimes the processing of the received messages could hit a threshold from where it could not recover. The queue processing would slow down to a rate that was bellow the incoming rate, and from that point forward the queue could just grow. I want to detail a bit the reason why this can happen and what I did to alleviate the problem.

Read the rest of this entry »

On SQL Server boolean operator short-circuit

September 13th, 2009

Recently I had several discussions all circling around the short-circuit of boolean expressions in Transact-SQL queries. Many developers that come from an imperative language background like C are relying on boolean short-circuit to occur when SQL queries are executed. Often they take this expectation to extreme and the correctness of the result is actually relying on the short-circuit to occur:

select 'Will not divide by zero!' where 1=or 1/0=0

In the SQL snippet above the expression on the right side of the OR operator would cause a division by zero if ever evaluated. Yet the query executes fine and the successful result is seen as proof that operator short-circuit does happen! Well, is that all there is? Of course not. An universal quantification cannot be demonstrated with an example. But it can be proven false with one single counter example!

Luckily I have two aces on my sleeve: for one I know how the Query Optimizer works. Second, I’ve stayed close enough to Microsoft CSS front lines for 6 months to see actual cases pouring in from developers bitten by the short-circuit assumption. Here is my counter-example case:


create table eav (     
	eav_id int identity(1,1) primary key,     
	attribute varchar(50) not null,     
	is_numeric bit not null,     
	[value] sql_variant null); 
	
create index eav_attribute on eav(attribute) include ([value]); 
go 

-- Fill the table with random values 
set nocount on 
declare @i int; 
select @i = 0; 
while @i < 100000 
begin
    declare @attribute varchar(50),
	          @is_numeric bit,
			  @value sql_variant;     
	select @attribute = 'A' + cast(cast(rand()*1000 as  int) as varchar(3));     
	select @is_numeric = case when rand() > 0.5 then 1 else 0 end;     
	if 1=@is_numeric         
		select @value = cast(rand() * 100 as int);     
	else         
		select @value = 'Lorem ipsum';     
	insert into eav (attribute, is_numeric, [value])     
		values (@attribute, @is_numeric, @value);     
	select @i = @i+1; 
end 
go 

-- insert a 'trap' 
insert into eav (attribute, is_numeric, [value]) values ('B1', 0, 'Gotch ya'); 
go
 
-- select the 'trap' value 
select [value] from eav 
	where      attribute = 'B1'      
	and is_numeric = 1      
	and cast([value] as int) > 50 
go 

Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

This happens on SQL Server 2005 SP2. Clearly, the conversion does occur even though the value is marked as ‘not numeric’. Whats going on here? To better understand, lets insert a known value that can be converted and then run the same query again and look at the execution plan:


insert into eav (attribute, is_numeric, [value]) values ('B2', 1, 65); 
go 

select [value] from eav 
	where      attribute = 'B2'      
	and is_numeric = 1      
	and cast([value] as int) > 50;
go  

boolean short-circuit counter example query plan

boolean short-circuit counter example query plan

Looking at the plan we can see how the query is actually evaluated: seek on the non-clustered index for the attribute ‘B2’, project the ‘value’, filter for the value predicate ‘cast([value] as int)>50’ then perform a nested join to look up the ‘is_boolean’ in the clustered index! So the right side of the AND operator is evaluated first. Q.E.D.

Is this a bug? Of course not. SQL is a declarative language, the query optimizer is free to choose any execution path that provide the requested result. Boolean operator short-circuit is NOT GUARANTEED. My query has set up a trap for the query optimizer, by providing a tempting execution path using the non-clustered index. For my example to work I had to set up a large table and enough distinct values of ‘attribute’ so that the optimizer would see the non-clustered index access followed by bookmark look up as a better plan than a clustered scan. And it is, by all means a better plan. But then I placed my trap: by adding the ‘value’ as an included column in the non-clustered index, I give the optimizer a too sweet to resists opportunity to evaluate the filter predicate on the ‘value’ column before it evaluates the filter predicate on the ‘is_numeric’ column, thus forcing the break on the short-circuit assumption.

Passing Parameters to a Background Procedure

August 18th, 2009

Code on GitHub: rusanu/async_tsql

I have posted previously an example how to invoke a procedure asynchronously using service Broker activation. Several readers have inquired how to extend this mechanism to add parameters to the background launched procedure.

Passing parameters to a single well know procedure is easy: the parameters are be added to the message body and the activated procedure looks them up in the received XML, passing them to the called procedure. But is significantly more complex to create a generic mechanism that can pass parameters to any procedure. The problem is the type system, because the parameters have unknown types and the activated procedure has to pass proper typed parameters to the invoked procedure.

A generic solution should accept a variety of parameter types and should deal with the peculiarities of Transact-SQL parameters passing, namely the named parameters capabilities. Also the invocation wrapper usp_AsyncExecInvoke should directly accept the parameters for the desired background procedure. After considering several alternatives, I settled on the following approach:

Read the rest of this entry »

Asynchronous procedure execution

August 5th, 2009

Code on GitHub: rusanu/async_tsql

Update: a version of this sample that accepts parameters is available in the post Passing Parameters to a Background Procedure

Recently an user on StackOverflow raised the question Execute a stored procedure from a windows form asynchronously and then disconnect?. This is a known problem, how to invoke a long running procedure on SQL Server without constraining the client to wait for the procedure execution to terminate. Most times I’ve seen this question raised in the context of web applications when waiting for a result means delaying the response to the client browser. On Web apps the time constraint is even more drastic, the developer often desires to launch the procedure and immediately return the page even when the execution lasts only few seconds. The application will retrieve the execution result later, usually via an Ajax call driven by the returned page script.

Read the rest of this entry »

Exception handling and nested transactions

June 11th, 2009

I wanted to use a template for writing procedures that behave as intuitively as possible in regard to nested transactions. My goals were:

  • The procedure template should wrap all the work done in the procedure in a transaction.
  • The procedures should be able to call each other and the calee should nest its transaction inside the outer caller transaction.
  • The procedure should only rollback its own work in case of exception, if possible.
  • The caller should be able to resume and continue even if the calee rolled back its work.

My solution is to use a either a transactions or a savepoint, depending on the value of @@TRANCOUNT at procedure start. The procedures start a new transaction if no transaction is pending. Otherwise they simply create a savepoint. On exit the procedure commits the transaction they started (if they started one), otherwise they simply exit. On exception, if the transaction is not doomed, the procedure either rolls back (if it started the transaction), or rolls back to the savepoint it created (if calee already provided a transaction).

Read the rest of this entry »

Read/Write deadlock

May 16th, 2009

How does a simple SELECT deadlock with an UPDATE? Surprisingly, they can deadlock even on well tuned systems that does not do spurious table scans. The answer is very simple: when the read and the write use two distinct access paths to reach the same key and they use them in reverse order. Lets consider a simple example: we have a table with a clustered index and an non-clustered index. The reader (T1) seeks a key in the non-clustered index and then needs to look up the clustered index to retrieve an additional column required by the SELECT projection list. The writer (T2) is updating the clustered index and then needs to perform an index maintenance operation on the non-clustered index. So T1 holds an S lock for the key K on the non-clustered index and wants an S lock on the same key K on the clustered index. T2 has an X lock on the key K on the clustered index and wants an X lock on same key K on the non-clustered index. Deadlock, T1 will be chosen as a victim. So you see, there are no complex queries involved, no suboptimal scan operations, no lock escalation nor page locks involved. Simple, correctly written queries may deadlock if doing read/write operations on the same key on a table with two indexes. Lets show this in an example:

Read the rest of this entry »

Version Control and your Database

May 15th, 2009

I am still amazed when I walk into a development shop and I ask for their application database script and they offer to extract one for me. Really, your only definition of the database is the database itself? Now you wouldn’t keep your libraries as object code only and reverse engineer them every time you want to make a change, would you?

Now, all sarcasm aside, why is so hard to keep a database definition as source and keep it under version control? The reason is not that people are dumb, these are bright developers and they would do the right thing if it would fit into their natural work flow. The problem is that the tool set at their disposal as developers (usually the Visual Studio suite) is far far behind the capabilities of the database administration tool set (the SSMS). But the later is focused for the needs of administrators and the natural flow of actions is to visually modify some schema properties (add tables, define indexes etc) in a dialog and then click the ‘Do it!’ button. This hides actual scripts going on behind the scenes and does not lend itself naturally to the normal code/build/run/test/commit cycle of the developer desk.

Read the rest of this entry »