Registry bloat after SQL Server 2012 SP1 installation

February 15th, 2013

SQL Server 2012 installation has the potential to leave an msiexec.exe installer process running after the installation finishes, as described in Windows Installer starts repeatedly after you install SQL Server 2012 SP1:

After you install SQL Server 2012 SP1 on a computer, the Windows Installer (Msiexec.exe) process is repeatedly started to repair certain assemblies. Additionally, the following events are logged in the Application log:
EventId: 1004
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’, Component ‘{0CECE655-2A0F-4593-AF4B-EFC31D622982}’ failed. The resource”does not exist.

EventId: 1001
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’ failed during request for component ‘{6E985C15-8B6D-413D-B456-4F624D9C11C2}’

When this issue occurs, you experience high CPU usage.
Cause

This issue occurs because the SQL Server 2012 components reference mismatched assemblies. This behavior causes native image generation to fail repeatedly on certain assemblies. Therefore, a repair operation is initiated on the installer package.

But the this problem has a much sinister side effect: it causes growth of the HKLM\Software registry hive. Except for the System hive, all the other registry hives are still restricted in size to a max of 2GB, see Registry Storage Space:

Views of the registry files are mapped in paged pool memory…The maximum size of a registry hive is 2 GB, except for the system hive.

As the runaway msiexec process bloats the Software registry hive your system may approach the maximum hive size and, even before that maximum is reached, the large hive will consume more and more of the very precious kernel paged pool memory. The system may start to exhibit erratic behavior, complaining about low ‘resources’, closing connections and other symptoms. For an example of how this erratic bahior may manifest, read Why the registry size can cause problems with your SQL 2012 AlwaysOn/Failover Cluster setup.

If you installed SQL Server 2012 SP1 recently follow the steps in KB2793634 for a fix

The biggest problem with the registry bloat erratic behavior is that it occurs long after the SQL Server 2012 SP1 installation and is quite difficult, even for expert users, to trace back the causality of the server erratic behavior to the SP1 installation. If you are uncertain if this issue is affecting you, run dir %SystemRoot%\system32\config and check the size of the SOFTWARE file, which is the storage of this registry hive. If you are indeed dealing with a bloated registry hive, visit KB2498915: How to Compress “Bloated” Registry Hives:

1) Boot from a WinPE disk.
2) Open regedit while booted in WinPe, load the bloated hive under HLKM. (e.g. HKLM\Bloated)
3) Once the bloated hive has been loaded, export the loaded hive as a “Registry Hive” file with a unique name. (e.g. %windir%\system32\config\compressedhive)
a) You can use dir from a command line to verify the old and new sizes of the registry hives.
4) Unload the bloated hive from regedit. (If you get an error here, close the registry editor. Then reopen the registry editor and try again.)
5) Rename the hives so that you will boot with the compressed hive.
e.g.
c:\windows\system32\config\ren software software.old
c:\windows\system32\config\ren compressedhive software

Update: At the time of writing this the SQL Server SP1 download page recommends installing the KB2793634. Kind folk that have been in the trenches and had to deal with this problem have also give me more feedback about the symptoms and solutions. I’ve been told that if the registry is already bloated due to the msiexec issue then applying the KB2793634 will not be enough. The fix only prevents further auto-restarts of msiexec, but it does not compress the registry. The following registry keys *may* be already huge:

  • HKLM\Software\Wow6432Node\Microsoft\.NETFramework\v2.0.50.27\NGENService
  • HKLM\SOFTWARE\Microsoft\.NETFramework\v2.0.50727\NGENService
  • HKLM\Software\Wow6432Node\Microsoft\.NETFramework\v4.0.30319\NGENService
  • HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\NGENService

Symptoms of registry hive bloat include, but are not limited to:

  • Cluster service is crashing and “The Group Policy Client service failed the logon. Insufficient system resources exist to complete the requested service” are logged into the event viewer log.
  • It is impossible to perform additional installations, or you can do this just shortly after rebooting the server.
  • Users with domain accounts are unable to logon on the machine. When they log in a temporary profile is loaded.
  • In Server Manager the Roles and Features are not listed anymore.
  • Many actions are prevented with the error message ” Insufficient system resources exist to complete the requested service”.

The recommended action is to open a support case and contact CSS. If you are brave, can’t afford CSS support contracts, and you are convinced that the problems you’re experiencing is due to the SQL Server 2012 SP1 registry bload, you may try the following at your own risk:

  1. Install the KB2793634 hot fix, as per the procedure described in the KB article.
  2. From cmd prompt run %SystemRoot%\Microsoft.Net\Framework\v4.0.30319\ngen.exe executequeueditems and %SystemRoot%\Microsoft.Net\Framework64\v4.0.30319\ngen.exe executequeueditems
  3. Compress the registry hive using a WinPE disk or from the Windows recovery mode (press F8 at the boot screen). Note that the Windows system on which you attempt to compress the hive must have the KB973817 installed: The Reg.exe utility does not compress a registry key when the utility saves a registry key to a hive file on a computer that is running Windows Server 2008, Windows Vista, Windows 7 or Windows Server 2008 R2.
  4. After the compression procedure is complete, from a cmd prompt run SCF /SCANNOW. See KB929833 for more details.

In the case when the registry hive bloat prevents even the installation of the SP1 hotfix you may attempt increasing the hive size to 4GB, see RegistrySizeLimit. Editing this particular registry key to a wrong value can result in your system blue-screening during boot due to BAD_SYSTEM_CONFIG_INFO kernel panic. Each boot. Use at your own risk. The correct value to put is 0xFFFFFFFF.

If you plan to upgrade an RTM SQL Server 2012 instance to SP1 apply the slipstream SP1+CU2

If you have an RTM instance of SQL Server 2012 and plan to upgrade it to SP1 it is recommended to create a slipstream SP1+CU2 instalation and apply this instead. See Product Updates in SQL Server 2012 Installation, in SQL Server 2012 the slipstream functionality has been renamed to Product Updates:

The Product Update feature replaces the Slipstream functionality that was available in SQL Server 2008 PCU1. Therefore the command-line parameters, /PCUSource and /CUSource, associated with Slipstream functionality should no longer be used. The parameters will continue to work, but may be removed in a future release of SQL Server Setup. The /UpdateSource parameter combines the functionality of the Slipstream parameters.

Frankly I think the term ‘slipstream’ was less ambiguous than ‘product update’ but then who am I to complain…

How to enable Selective XML indexes in SQL Server 2012 SP1

February 11th, 2013

SQL Server 2012 SP1 has shipped a great enhancement to XML: Selective XML Indexes. When properly used these indexes can speed up the searching of XML columns tremendously, at little disk/size cost:

The selective XML index feature lets you promote only certain paths from the XML documents to index. At index creation time, these paths are evaluated, and the nodes that they point to are shredded and stored inside a relational table in SQL Server. This feature uses an efficient mapping algorithm developed by Microsoft Research in collaboration with the SQL Server product team. This algorithm maps the XML nodes to a single relational table, and achieves exceptional performance while requiring only modest storage space.

However, at the time of this post, the documentation on MSDN omits a crucial requirement: the database has to be enabled to support these new indexes. Because is an on-disk format change shipped in a service pack release the engine cannot use the new selective XML indexes unless explicitly allowed. And the database version must change to a version that the RTM SQL Server 2012 does not recognize so that the database is not accidentally attached/restored on an RTM instance of SQL Sever 2012 that would not comprehend the new Selective XML Index objects and would panic (undefined behavior). To enable Selective XML Indexes in the database you must run sp_db_selective_xml_index:

Enables and disables Selective XML Index functionality on a SQL Server database. If called without any parameters, the stored procedure returns 1 if the Selective XML Index is enabled on a particular database.

EXECUTE sys.sp_db_selective_xml_index
@db_name = N'AdventureWorks2012'
, @action = N'true';
GO

Be aware that once upgraded to support Selective XML indexes this database can no longer be attached or restored on an RTM instance of SQL Server 2012. This applies to log shipping, Database Mirroring and AlwaysOn relationships, which will break when this upgrade is performed. If all the partners in the log shipping, DBM session or AG are also upgraded to SQL Server 2012 SP1 you can re-enable the relationship after you enabled Selective XML Indexes.

SQL Server backup to URL

January 25th, 2013

With the SQL Server 2012 SP1 CU2 release a new important feature was added: ability to back up and restore a database straight from Azure Blob storage:

This feature released in SQL Server 2012 SP1 CU2, enables SQL Server backup and restore directly to the Windows Azure Blob service. This feature can be used to backup SQL Server databases on an on-premises instance or an instance of SQL Server running a hosted environment such as Windows Azure Virtual Machine. Backup to cloud offers benefits such as availability, limitless geo-replicated off-site storage, and ease of migration of data to and from the cloud.

The syntax for the new feature is straight forward. You must first create a credential for accessing your Azure Blob storage:

SQL Server requires Windows Azure account name and access key authentication to be stored in a SQL Server Credential. This information is used to authenticate to the Windows Azure account when it performs backup or restore operations.

CREATE CREDENTIAL mycredential WITH IDENTITY = 'mystorageaccount'
       ,SECRET = '' ;

BACKUP DATABASE AdventureWorks2012
      TO URL = 'https://mystorageaccount.blob.core.windows.net/mycontainer/db.bak'
      WITH CREDENTIAL = 'mycredential'
     ,STATS = 5;

RESTORE DATABASE AdventureWorks2012
     FROM URL = 'https://mystorageaccount.blob.core.windows.net/mycontainer/db.bak'
     WITH CREDENTIAL = 'mycredential'

Full backups, differential backups, log backups, filegroup backups, compressed backups are all supported. The only notable restrictions is that is not allowed to backup to two locations simultaneously. Here is the complete list of limitations:

  • The maximum backup size supported is 1 TB.
  • Backup to or restoring from the Windows Azure Blob storage service by using SQL Server Management Studio Backup or Restore wizard is not currently enabled
  • Creating a logical device name is not supported. So adding URL as a backup device using sp_dumpdevice or through SQL Server Management Studio is not supported
  • Appending to existing backup blobs is not supported. Backups to an existing Blob can only be overwritten by using the WITH FORMAT option
  • Backup to multiple blobs in a single backup operation is not supported
  • Specifying a block size with BACKUP is not supported.
  • Specifying a block size for restores might be required in certain scenarios
  • Specifying MAXTRANSFERSIZE is not supported.
  • Specifying backupset options – RETAINDAYS and EXPIREDATE are not supported.
  • SQL Server has a maximum limit of 259 characters for a backup device name. The BACKUP TO URL consumes 36 characters for the required elements used to specify the URL – ‘https://.blob.core.windows.net//.bak’, leaving 223 characters for account, container, and blob names put together.

I recommend going over Backup and Restore to Azure Blob Best Practices.

Inside the SQL Server 2012 Columnstore Indexes

May 29th, 2012

Columnar storage has established itself as the de-facto option for Business Intelligence (BI) storage. The traditional row-oriented storage of RDBMS was designed for fast single-row oriented OLTP workloads and it has problems handling the large volume range oriented analytical processing that characterizes BI workloads. But what is columnar storage and, more specifically, how does SQL Server 2012 implement columnar storage with the new COLUMNSTORE indexes?

Read the rest of this entry »

Adding a nullable column can update the entire table

February 16th, 2012

In a previous article Online non-NULL with values column add in SQL Server 2012 I talked about how adding a non-null column with default values is now an online operation in SQL Server 2012 and I mentioned how the situation when the newly added column may increase the rowsize can result in the operation being performed offline:

In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online.

In this article I want to show you how such a situation can arise and how it impacts even the case that prior to SQL Server 2012 was always online, namely adding a nullable column. Lets consider the following example:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50)),
	('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

If you run the snippet above in SQL Server 2005, SQL Server 2008 or SQL Server 2008 R2 it will succeed, and the column some_int will be added online, as a metadata only operation. However, the resulting table has some interesting properties. Lets try to update the newly added some_int column:

update T set some_int = null
	where c= 'A';
update T set some_int = null
	where c= 'B';
update T set some_int = null
	where c= 'C';
update T set some_int = null
	where c= 'D';
go

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
Msg 511, Level 16, State 1, Line 7
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

3 updates succeeded, the fourth failed. Well, I’ve been warned when I created the table, right? Warning: The table “T” has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit. But I’m updating a fixed length column and, furthermore, I’m updating it from NULL to NULL. Lets try something else:

select * into T2 from T;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

OK, how about this:

alter table T rebuild;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So my previous sequence of actions led to a table that I cannot rebuild! Clearly, not a very desirable state.

SQL Server 2012 nullable ADD COLUMN

In SQL Server 2012 the situation above cannot happen. The adding of the column would be blocked:

create table T (
c char(1) null,
filler char(7998) null,
vc1 varchar(50) null,
vc2 varchar(50) null);
go

insert into T (c, vc1, vc2) values
('A', NULL, NULL),
('B', replicate ('X', 50), NULL),
('C', NULL, replicate('Y', 50)),
('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So SQL Server 2012 is able to detect the problem upfront and prevent the table from reaching the state in which it cannot be rebuilt. But detecting this situation is not a straightforward matter of table metadata and column size, the situation arises only if the data in the table has this problem. After all the following sequence succeeds:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50));
go

alter table T add some_int int null;
go

Indeed it is only the record 'D' that has a problem because both variable columns vc1 and vc2 have values in the row that are not NULL (and of a certain size). The ALTER TABLE ... ADD COLUMN in this situation must validate each row to ensure that it fits in the page, including the newly added column size. Therefore every row gets updated and the newly added column value of NULL is populated in the row image, thus enforcing that every row fits in the table. If any row grows over the maximum size due to the newly added column then the ALTER fails and all the rows updated thus far are rolled back.

If adding a nullable column in SQL Server 2012 has the potential of increasing the row size over the 8060 size then the ALTER performs an offline size-of-data update to every row of the table to ensure it fits in the page. This behavior is new in SQL Server 2012.

This situation can arise when adding a new non-sparse fixed length column or a variable non-nullable column with default value to a table that already has the potential of creating rows over the maximum size of 8060. This new behavior (update every row in order to validate they all fit after the column is added) does not occur when adding a nullable variable length column or a sparse fixed length column.

Online Index Operations for indexes containing LOB columns

August 5th, 2011

SQL Server supports online index and table rebuild operations which allow for maintenance operations to occur w/o significant downtime. While a table is being rebuild, or a new index is being built on it, the table is fully utilizable. It can be queried and any updates done to the table while the online rebuild operation is occurring will be contained in the final rebuilt table. A detailed explanation on how these online rebuild operations work can be found in the Online Indexing Operations in SQL Server 2005 white paper. But Online Index Build operations in SQL Server 2005, 2008 and 2008 R2 do not support tables that contain LOB columns, attempting to do so would trigger an error:

Msg 2725, Level 16, State 2, Line …
An online operation cannot be performed for index ‘…’ because the index contains column ‘…’ of data type text, ntext, image, varchar(max), nvarchar(max), varbinary(max), xml, or large CLR type. For a non-clustered index, the column could be an include column of the index. For a clustered index, the column could be any column of the table. If DROP_EXISTING is used, the column could be part of a new or old index. The operation must be performed offline.

To be accurate the restriction applies not to tables, but to any index or heap that contains an LOB column. That, of course, includes any clustered index or the base heap of a table if the table contains any LOB columns, but it would include any non-clustered index that includes a LOB column. In other words I can rebuild online non-clustered indexes of any table as long as they don’t use the INCLUDE clause to add a LOB column from the base table, but for sure I cannot rebuild online the table itself (meaning the clustered index or the heap). Nor can I add a clustered index to a base heap online, if the table contains LOB columns.

The question whether one should just use VARCHAR(MAX) and stop worrying about the chosen field size has came up on StackOverflow several times (varchar(max) everywhere?) and I always pointed out that there are at least some limitations (impossibility to do do online maintenance rebuild operations, impossibility to index such fields) and also all MAX types have a slight performance overhead, see Performance comparison of varchar(max) vs. varchar(N)).

Online Index Build, now with LOBs

Starting with SQL Server 11 it is actually permitted to build (and rebuild) online indexes and heaps containing LOB columns. The old legacy types (text, ntext and image) are not supported, not surprising considering that these types are on the deprecation path.

To understand why the original online rebuild operations from previous versions did not support LOB columns we need to consider the SQL Server Table and Index Organization. All indexes and tables consist of three allocation units: one for row data, one for overflow row data and one for LOB data. We can see this if we inspect the sys.system_internals_allocation_units system catalog view:


create table test (id int not null identity(1,1),
	somevar1 varchar(6000),
	somevar2 varchar(6000),
	someblob varchar(max))
go

insert into test (somevar1, somevar2, someblob) values ('A', 'B', 'C');
insert into test (somevar1, somevar2, someblob) values
         (replicate('A', 6000), replicate('B', 6000), replicate('C', 8000))
go

select au.*
from sys.system_internals_allocation_units au
join sys.system_internals_partitions p on au.container_id = p.partition_id
where p.object_id = object_id('test');
go

Our test table shows three allocation units. Now lets rebuild our table and look again at our allocation units:


alter table test rebuild;
go

select au.*
from sys.system_internals_allocation_units au
join sys.system_internals_partitions p on au.container_id = p.partition_id
where p.object_id = object_id('test');
go

We can see that our DATa and SLOB (aka. row overflow) allocation units have changed because they were rebuilt (they have different IDs and start at different pages). But the important thing is that the BLOB allocation unit has not changed. After the offline table rebuild, it has the same ID and starts at the same pages. This is because table and index rebuild operations do not rebuild the LOB data. They rebuild the row data and the row-overflow data, but the newly built rows will simply point back to the same old LOB data. The idea is that tables with LOB columns have large LOB values and rebuilding the LOB data would be prohibitive, with little or no benefit.

Offline operations can avoid rebuilding the LOB data without problems, but for online index and table rebuilds this poses an issue: for the duration of the online rebuild operation both the old rowset (the old index/table) and the new rowset would point to the same LOB data while updates are being made to rows.

In SQL Server 11 this problem was solved and now online operations can rebuild indexes and tables with LOB columns while keeping the data in the LOB allocation unit in a consistent state. SQL Server will internally track how LOB data is referenced by both the old index and the new index being built and will take appropriate actions to manage the sharing of the LOB data.

Limitations

The following restrictions and limitations apply only for the duration of the Online Index Rebuild operation:

Partial LOB .WRITE updates are transformed into full updates.
LOB data supports a highly efficient update mode, the .WRITE syntax. This is critical in creating streaming semantics, see Download and Upload images from SQL Server via ASP.Net MVC. When the .WRITE syntax is used on a LOB column belonging to an index that is being rebuilt online the generated plan will silently change it into a full value update, which generates significantly more log. If you rely heavily on this functionality be aware and schedule your online rebuilds accordingly.
DBCC CHECK operations will skip the consistency check of LOB allocation units belonging to indexes that are in the process of being rebuilt online.
During the online operation the LOB allocation unit is shared between the old index and the new index and is consistent if you consider both owners, however it may look inconsistent if considered from either one of the owner point of view.
File SHRINK operation will skip pages belonging to LOB allocation units belonging to indexes that are in the process of being rebuilt online.
If LOB data is shrunk the pointers in the ROW data referencing the LOB data that had moved have to be updated. While an online index rebuild occurs there could be two sets of pointers referencing the same LOB data, one in the old rowset and one in the new rowset.

How to Multicast messages with SQL Server Service Broker

July 20th, 2011

Starting with SQL Server 11 the the SEND verb has a new syntax and accepts multiple dialogs handles to send on:

SEND
   ON CONVERSATION [(]conversation_handle [,.. @conversation_handle_n][)]
   [ MESSAGE TYPE message_type_name ]
   [ ( message_body_expression ) ]
[ ; ]

With this syntax enhancement you can send a message to multiple destinations. This is not different from sending the same message multiple times. From the application point of view issuing one single SEND on 10 dialog handles is exactly the same as issuing 10 SEND statements on one dialog handle at a time. The improvement is in the sys.transmission_queue: issuing SEND multiple time would create multiple copies of the message body to be sent. By contrast the one single SEND on multiple handles will only store the message body once. We can see this if we look at the definition of sys.tranmission_queue in SQL Server 11:

sp_helptext 'sys.transmission_queue'

CREATE VIEW sys.transmission_queue AS
	SELECT conversation_handle = S.handle,
		to_service_name = Q.tosvc,
		to_broker_instance = Q.tobrkrinst,
		from_service_name = Q.fromsvc,
		service_contract_name = Q.svccontr,
		enqueue_time = Q.enqtime,
		message_sequence_number = Q.msgseqnum,
		message_type_name = Q.msgtype,
		is_conversation_error = sysconv(bit, Q.status & 2),
		is_end_of_dialog = sysconv(bit, Q.status & 4),
		message_body = ISNULL(Q.msgbody, B.msgbody),
		transmission_status = GET_TRANSMISSION_STATUS (S.handle),
		priority = R.priority
	FROM sys.sysxmitqueue Q
	LEFT JOIN sys.sysxmitbody B WITH (NOLOCK) ON Q.msgref = B.msgref
	INNER JOIN sys.sysdesend S WITH (NOLOCK)
             ON Q.dlgid = S.diagid AND Q.finitiator = S.initiator
	INNER JOIN sys.sysdercv R WITH (NOLOCK)
             ON Q.dlgid = R.diagid AND Q.finitiator = R.initiator
	WHERE is_member('db_owner') = 1

Compare this with the same view definition in SQL Server 2008 R2:

CREATE VIEW sys.transmission_queue AS
	SELECT conversation_handle = S.handle,
		to_service_name = Q.tosvc,
		to_broker_instance = Q.tobrkrinst,
		from_service_name = Q.fromsvc,
		service_contract_name = Q.svccontr,
		enqueue_time = Q.enqtime,
		message_sequence_number = Q.msgseqnum,
		message_type_name = Q.msgtype,
		is_conversation_error = sysconv(bit, Q.status & 2),
		is_end_of_dialog = sysconv(bit, Q.status & 4),
		message_body = Q.msgbody,
		transmission_status = GET_TRANSMISSION_STATUS (S.handle),
		priority = R.priority
	FROM sys.sysxmitqueue Q
	INNER JOIN sys.sysdesend S WITH (NOLOCK)
               ON Q.dlgid = S.diagid AND Q.finitiator = S.initiator
	INNER JOIN sys.sysdercv R WITH (NOLOCK)
               ON Q.dlgid = R.diagid AND Q.finitiator = R.initiator
	WHERE is_member('db_owner') = 1

You can see how in SQL Server 11 the message body was separated into a new system table (sys.sysxmitbody). Multicast SEND will create multiple entries in sys.sysxmitqueue (one for each dialog on which the message was multicasted) but only one entry in sys.sysxmitbody. Such a normalized storage scheme saves space consumed and, more importantly, amount of log generated during the SEND.

The Reversed Dialog pattern in publish-subscribe

The typical dialog pattern in pub-sub systems is for the subscriber to start the dialog and send an initial ‘subscribe’ message, then the subscription content is being delivered from the target (the publisher) to the initiator (the subscriber). I call this the Reverse Dialog Pattern because messages flow from the target to the initiator. Lets show with an example. We’ll create a publisher service that broadcast some important content to which services can subscribe to receive it. To spice it up, we’ll use a tag system to subscribe to optional content: all content is distributed with a list of associated tags, all subscribers specify the tag they’re interested in. Tag matching is done using the LIKE syntax, so that subscribers can specify '%' as a mean to subscribe to all content.

The Publisher Service

create message type subscription_request validation = none;
create message type subscription_content validation = well_formed_xml;

create contract distribution
	(subscription_request sent by initiator,
	subscription_content sent by target);

create queue publisher;
create service publisher on queue publisher (distribution);
go

create table subscriptions (
	subscription_id int not null identity(1,1),
	tag nvarchar(50) not null,
	conversation_handle uniqueidentifier not null,
	constraint pk_subscriptions primary key (subscription_id),
	constraint unq_conversation_handle unique (conversation_handle, tag));
go

create procedure usp_publisher_handler
as
begin
	declare @mt sysname, @dh uniqueidentifier, @mb varbinary(max);
	begin try
		begin transaction;
		receive top(1)
			@mt = message_type_name,
			@dh = conversation_handle,
			@mb = message_body
			from publisher;
		if (@mt = N'subscription_request')
		begin
			insert into subscriptions (conversation_handle, tag)
                                   values (@dh, cast(@mb as nvarchar(50)));
		end
		else if(@mt = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
			or @mt = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
		begin
			delete from subscriptions
				where conversation_handle  = @dh;
			end conversation @dh;
		end
		commit
	end try
	begin catch
		declare @xact_state int = xact_state();
		if @xact_state <> 0
		begin
			rollback;
		end
	end catch
end
go

alter queue publisher with activation (
	status = on,
	max_queue_readers = 1,
	procedure_name = usp_publisher_handler,
	execute as owner);
go

The publisher service is straight forward: it uses the subscriptions table to keep track of subscribers. The activated procedure associated with the publisher service processes the subscription_request messages and adds the request dialog to the subscriptions table. The request message body is the tag the subscriber is interested in.

The Publish Content procedure

create type publish_tags_type as table (
	tag nvarchar(50) not null primary key);
go

create procedure usp_publish_content
	@content xml,
	@tags publish_tags_type readonly
as
begin
	declare @sql nvarchar(max) = N'send on conversation (';
	declare @cnt int = 0;
	declare @dh uniqueidentifier;
	declare @comma nvarchar(2) = N'';

	declare crs cursor static read_only forward_only for
		select distinct conversation_handle
		from subscriptions s
		join @tags t on t.tag like s.tag;

	open crs;
	fetch next from crs into @dh;
	while 0 = @@fetch_status
	begin

		set @sql += @comma  + N'''' + cast(@dh as nvarchar(36)) + N'''';
		set @comma = N', ';
		set @cnt += 1;
		fetch next from crs into @dh;
	end
	close crs;
	deallocate crs;

	if @cnt > 0
	begin
		set @sql+= N') message type subscription_content (@content)';
		exec sp_executesql @sql, N'@content xml', @content;
	end
end
go

The publish content procedure takes a content to be distributed and the list of tags under which the content is distributed and sends the content to all interested subscribers. One single multicast SEND is used to reach all subscribers. Dynamic SQL is used to build the multicast SEND statement.

Adding subscribers

declare @i int = 0;
declare @sql nvarchar(max);
while @i < 10
begin
	set @sql = N'create queue subscriber_' + cast(@i as nvarchar(20)) + N';
		create service subscriber_' + cast(@i as nvarchar(20)) + N'
                            on queue subscriber_'+cast(@i as nvarchar(20)) + N';';
	exec sp_executesql @sql;
	set @sql = N'declare @dh uniqueidentifier;
		begin dialog @dh
			from service subscriber_' + cast(@i as nvarchar(20)) + N'
			to service N''publisher''
			on contract distribution
			with encryption = off;
		send on conversation @dh message type subscription_request
                    (''' +case @i%5 when 0 then N'%' else nchar(@i + 65) end + ''');';
	exec sp_executesql @sql;
	set @i += 1;
end
go

This snipped adds 10 subscribers interested in tags 'B', 'C', 'D' etc. The first and fifth subscribers are interested in everything ('%'). We can see the subscribers were added to the subscriptions table by the publisher activated procedure:

select * from subscriptions

subscription_id tag                               conversation_handle
--------------- ----------------- -------------------------------------
1               %                          AFC62EF2-35B3-E011-8EED-001C25160E57
2               B                          B3C62EF2-35B3-E011-8EED-001C25160E57
3               C                          B7C62EF2-35B3-E011-8EED-001C25160E57
4               D                          BBC62EF2-35B3-E011-8EED-001C25160E57
5               E                          BFC62EF2-35B3-E011-8EED-001C25160E57
6               %                          C3C62EF2-35B3-E011-8EED-001C25160E57
7               G                          C7C62EF2-35B3-E011-8EED-001C25160E57
8               H                          CBC62EF2-35B3-E011-8EED-001C25160E57
9               I                          CFC62EF2-35B3-E011-8EED-001C25160E57
10              J                          D3C62EF2-35B3-E011-8EED-001C25160E57

A test multicast message

declare @tags publish_tags_type;
insert into @tags (tag) values ('A'), ('B'), ('C');
exec usp_publish_content N'', @tags;
go

With this one call we notified all subscribers interested, with one single multicast SEND. We can check which of the subscribers got the content:

declare @i int = 0;
declare @sql nvarchar(max) = N'', @union nvarchar(20) = N'';
while @i < 10
begin
	set @sql += @union + N'select
              ''subscriber_' + cast(@i as nvarchar(20)) + N''' as subscriber,
               count(*) as count
               from subscriber_' + cast(@i as nvarchar(20));
	set @union = ' union all ';
	set @i += 1;
end
exec sp_executesql @sql;
go

subscriber   count
------------ -----------
subscriber_0 1
subscriber_1 1
subscriber_2 1
subscriber_3 0
subscriber_4 0
subscriber_5 1
subscriber_6 0
subscriber_7 0
subscriber_8 0
subscriber_9 0

(10 row(s) affected)

We can see that subscriber_2 and subscriber_3 each got a message since the tags they're interested are 'B' and 'C' which both match a tag set by the publisher. Subscribers 1 and 5 eahc got a message because they're interested in any tag.

This pattern of publish-subscribe is not new and similar applications could be built with SQL Server Service Broker in SQL Server 2005, 2008 and 2008R2. But with SQL Server 11 the distribution is more efficient and can scale and perform better as the message bodies are not inserted and deleted multiple times, once for each subscriber, in the publisher's transmission queue.

Online non-NULL with values column add in SQL Server 2012

July 13th, 2011

Prior to SQL Server 2012 when you add a new non-NULLable column with default values to an existing table a size-of data operation occurs: every row in the table is updated to add the default value of the new column. For small tables this is insignificant, but for large tables this can be so problematic as to completely prohibit the operation. But starting with SQL Server 2012 the operation is, in most cases, instantaneous: only the table metadata is changed, no rows are being updated.

Lets look at a simple example, we’ll create a table with some rows and then add a non-NULL column with default values. First create and populate the table:

create table test (
	id int not null identity(1,1) primary key,
	someValue int not null);
go

set nocount on;
insert into test (someValue) values (rand()*1000);
go 1000

We can inspect the physical structure of the table’s records using DBCC PAGE. First lets find the page that contains the first record of the table:

select %%physloc%%, * from test where id = 1;

In my case this returned 0xD900000001000000, which means slot 0 on page 0xD9 (aka. 217) of file 1, and my test database has the DB_ID 6. Hence the parameters to DBCC PAGE

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AEBA060

0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564

Note the last LSN that updated the page (30:71:25) and the size of the record in slot 0 (15 bytes). Now lets add a non-NULL column with default values:

alter table test add otherValue int not null default 42 with values;

We can select from the table and see that the table was changed and the rows have value 42 for the newly added column:

select top(2) * from test;

id          someValue   otherValue
----------- ----------- -----------
1           564         42
2           387         42

Yet if we inspect again the page, we can see that is unchanged:

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000E83A060
0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564                     

Slot 0 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42 

The page header is unchanged, the last LSN is still (30:71:25), proof that the page was not modified, and the physical record is unchanged and has the same size as before. Yet DBCC shows a Column 3 and its value 42! If you pay attention you’ll notice that the Column 3 though has an Offset 0×0 and a physical length of 0. Column 3 is somehow materialized out of thin air, as it does not physically exists in the record on this page. The ‘magic’ is that the table metadata has changed and it now contains a column with a ‘default’ value:

select pc.* from sys.system_internals_partitions p
	join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
	where p.object_id = object_id('test');

Notice that sys.system_internals_partition_columns now has two new columns that are SQL Server 2012 specific: has_default and default_value. The column we added to the test table (the third row in the image above) has a default with value 42. This is how SQL Server 2012 knows how to show a value for Column 3 for this record, even though is physically missing on the page. With this ‘magic’ in place the ALTER TABLE will no longer have to update every row in the table and the operation is fast, metadata-only, no matter the number of rows in the table. This new behavior occurs automatically, no special syntax or setting is required, the engine will simply do the right thing. There is no penalty from having a missing value in a row. The ‘missing’ value can be queried, updated, indexed, exactly as if the update during ALTER TABLE really occurred. There is no measurable performance penalty from having a default value.

What happens when we update a row? The ‘default’ value is pushed into the row, even if the column was not modified. Consider this update:

update test set someValue = 565 where id = 1;

Although we did not touch the otherValue column, the row now was modified and it contains the materialized value:

dbcc page(6,1,217,3)

...
m_freeData = 7240                   m_reservedCnt = 0                   m_lsn = (31:271:2)
...
Slot 0 Offset 0x1c35 Length 19

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 19

Memory Dump @0x000000000AB8BC35

0000000000000000:   10001000 01000000 35020000 2a000000 030000††††........5...*......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 565                     

Slot 0 Column 3 Offset 0xc Length 4 Length (physical) 4
otherValue = 42         

KeyHashValue = (8194443284a0)
Slot 1 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AB8A060
0000000000000000:   10000c00 02000000 83010000 020000†††††††††††††........ƒ......

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 2                              

Slot 1 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 387                     

Slot 1 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42

Notice how the physical record has increased in size (19 bytes vs. 15), the record has the value 42 in it (the hex 2a000000) and the Column 3 now has a real offset and physical size. So the update has trully materialized the default value in the row image. I intentionally copied the output of DBCC PAGE for the next slot in the page, to show that the record with id=2 was unaffected, it continues to have a smaller size of 15 bytes and Column 3 has no physical length.

Default value vs. Default constraint

Is worth saying that the new SQL Server 2012 default column value is not the same as the default value constraint. The default value is captured when the ALTER TABLE statement is run and can never change. Only rows existing in the table at the time of running ALTER TABLE statement will have missing ‘default’ values. By contrast the default constraint can be dropped or modified and new rows inserted after the ALTER TABLE will always have a value present in row for the new column. Any REBUILD operation on the table (or on the clustered index) will materialize all the missing values as the rows are being copied from the old hobt to the new hobt. The new hobt columns (sys.system_internals_partition_columns) will loose the has_default and default_value attributes, in effect loosing any trace that this column was added online. A default constraint by contrast will be preserved as a table is rebuilt.

Restrictions

Not all data types and default values can be added online. BLOB values like varchar(max), nvarchar(max), varbinary(max) and XML cannot be added online (and frankly I see no valid data model that has a non-NULL BLOB with a default…). Types that cannot be converted to sql_variant cannot be added online, like hierarchy_id, geometry and geography or user CLR based UDTs. Default expressions that require a different value for each row, like NEWID or NEWSEQUENTIALID cannot be added online (the default expression has to be a runtime constant, not to be confused with a deterministic expression, see Conor vs. Runtime Constant Functions for more details). In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online. And is an Enterprise Edition only feature. For all the cases above the behavior will revert to adding the column ‘offline’, by updating every row in the table during the ALTER TABLE statement, creating a size-of-data update. When such a situation occurs a new XEvent is fired, which contains the reason why a size-of-data update occurred: alter_table_update_data.

How to update a table with a columnstore index

July 13th, 2011

In my previous article How to use columnstore indexes in SQL Server we’ve seen how to create a columnstore index on a table and how certain queries can significantly reduce the IO needed and thus increase in performance by leveraging this new feature. But once a columnstore index is added to a table the table becomes read-only as it cannot be updated. Trying to insert a new row in the table will result in an error:

insert into sales ([date],itemid, price, quantity) values ('20110713', 1,1.0,1);

Msg 35330, Level 15, State 1, Line 1

INSERT statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, then rebuilding the columnstore index after INSERT is complete.

The error message recommends a ‘workaround’, but rebuilding the columnstore index for updates may be prohibitively expensive. For the DW and BI scenarios that columnstore indexes are targeting there is a much better solution: use table partitioning. With SQL Server 11 the limit of maximum 1000 partitions per table has been increased to 15000 partitions and with this new limit one can configure the ETL process to update every day into a new partition and still retain many many years of data. The ETL process can upload the daily data into a staging table, create a columnstore index on the staging table, then use the fast ALTER TABLE … SWITCH operation to ‘switch in’ the new data. Using the very same example as in my previous article, lets create a staging table with identical structure as the sales facts table:

create table sales_staging (
	[id] int not null identity (1000000,1),
	[date] date not null,
	itemid smallint not null,
	price money not null,
	quantity numeric(18,4) not null,
	constraint check_date check ([date] = '20110716')) on [PRIMARY];
go

create unique clustered index cdx_sales_staging_date_id
   on sales_staging ([date], [id]) on [PRIMARY];
go

Note how the staging table has a constraint check that enforces the date to be the valid date for the next partition to be switched in. Now lets populate the staging table with some more dummy sales facts:

set nocount on
go

declare @i int = 0;
begin transaction;
while @i < 250000
begin
	insert into sales_staging ([date], itemid, price, quantity)
		values ('20110716', rand()*10000, rand()*100 + 100, rand()* 10.000+1);
	set @i += 1;
	if @i % 10000 = 0
	begin
		raiserror (N'Inserted %d', 0, 1, @i);
		commit;
		begin tran;
	end
end
commit;
go

Now that our fake ETL process has finished preparing the last days sales data into a staging table, lets add a columnstore index identical with the one on the real sales table:

create columnstore index cs_sales_price_staging
          on sales_staging ([date], itemid, price, quantity);
go

OK, our staging table is complete so lets switch it in into the 'big' sales table:

alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20110717');
go

alter table sales_staging switch to sales partition $PARTITION.PF('20110716');
go

That's it! We've just updated our sales table with the sales fact for the last day, despite the fact that it contained a columnstore index, without disabling the columnstore index. The increased partitions count supported in SQL Server 11 combined with the fact that aligned columnstore indexes are supported for the fast partition switch operations makes the tables with column store indexes updateable in practice, if the ETL process uses a staging table and the ETL schedule matches the partitioning scheme.

How to use columnstore indexes in SQL Server

July 13th, 2011

Column oriented storage is the data storage of choice for data warehouse and business analysis applications. Column oriented storage allows for a high data compression rate and as such it can increase processing speed primarily by reducing the IO needs. Now SQL Server allows for creating column oriented indexes (called COLUMNSTORE indexes) and thus brings the benefits of this highly efficient BI oriented indexes in the same engine that runs the OLTP workload. The syntax for creating columnstore indexes is described on MSDN at CREATE COLUMNSTORE INDEX. Lets walk trough a very simple example of how to create and use a columnstore index. First lets have a dummy sales table:

create partition function pf (date) as range left for values
  ('20110712', '20110713', '20110714', '20110715', '20110716');
go

create partition scheme ps as  partition pf all to ([PRIMARY]);
go

create table sales (
	[id] int not null identity (1,1),
	[date] date not null,
	itemid smallint not null,
	price money not null,
	quantity numeric(18,4) not null)
	on ps([date]);
go

create unique clustered index cdx_sales_date_id on sales ([date], [id]) on ps([date]);
go

Notice how I created this table on a partitioning scheme that has one partition a day. See my follow up article How to update a table with a columnstore index to understand why I choose this particular arrangement. For now, lets populate the table with 1 million ‘sales’ facts:

set nocount on;
go

declare @i int = 0;
begin transaction;
while @i < 1000000
begin
	declare @date date = dateadd(day, @i /250000.00, '20110712');
	insert into sales ([date], itemid, price, quantity)
		values (@date, rand()*10000, rand()*100 + 100, rand()* 10.000+1);
	set @i += 1;
	if @i % 10000 = 0
	begin
		raiserror (N'Inserted %d', 0, 1, @i);
		commit;
		begin tran;
	end
end
commit;
go

If we look now at the structure of the sales table we see that each partition has 250k rows spread along 1089 pages:

select * from sys.system_internals_partitions p
	where p.object_id = object_id('sales');

select au.* from sys.system_internals_allocation_units au
	join sys.system_internals_partitions p
		on p.partition_id = au.container_id
	where p.object_id = object_id('sales');
go

If we now run a BI type of query like get the number of sales facts and the total sales for a day, the query would have to scan an entire partition, generating 1089 logical reads:

set statistics io on;
select count(*), sum(price*quantity) from sales where date = '20110713'
set statistics io off;
go

Table 'sales'. Scan count 1, logical reads 1089, physical reads 0,...

So lets create a columnstore index on this table:

create columnstore index cs_sales_price on sales ([date], price, quantity) on ps([date]);
go

If we look at the structure of the columnstore index we'll see that it has a much smaller footprint, only 362 pages:

select * from sys.system_internals_partitions p
	where p.object_id = object_id('sales')
	and index_id = 2;

select au.* from sys.system_internals_allocation_units au
	join sys.system_internals_partitions p
		on p.partition_id = au.container_id
	where p.object_id = object_id('sales')
		and index_id = 2;
go

Note how the columnstore index has no pages allocated for the IN_ROW_DATA allocation unit, but instead has pages allocated to the LOB_DATA allocation unit. So a columnstore index has no rows, instead it uses the BLOB storage to store the column 'segments'. Due to compression possible with column oriented storage, it needs only about one third of the pages needed by the clustered index, although it contains the same columns and the same number of sales facts. If we run again the very same query as before, we'll see how it uses the columnstore index and generates less IO:

set statistics io on;
select count(*), sum(price*quantity) from sales where date = '20110713'
set statistics io off;
go

Table 'sales'. Scan count 1, logical reads 358, physical reads 0, read-ahead reads 0, ...

This article is just a very very simplified explanation of how column store indexes can be used. Column oriented storage is one of the major features that ships with the SQL Server 11 and there is much more we could talk about it, but I only wanted to give a short introduction. You should look into column oriented storage for BI and Data Warehousing projects, where a columnstore index could speed up significantly certain type of analytic queries, specially those that use aggregate functions.

On a final note you have to understand the restrictions that columnstore indexes have, these restrictions are described in detail at the MSDN Columnstore Indexes article. The most severe restriction, by far, is the fact that a table that has columnstore indexes cannot be updates, it becomes read-only. For the specific DW and BI scenarios that columnstore indexes addresses this is actually not such a hard restriction, as the ETL process can easily circumvent this problem by using staging tables and partitioning. More on this in a next article: How to update a table with a columnstore index.