Registry bloat after SQL Server 2012 SP1 installation

February 15th, 2013

SQL Server 2012 installation has the potential to leave an msiexec.exe installer process running after the installation finishes, as described in Windows Installer starts repeatedly after you install SQL Server 2012 SP1:

After you install SQL Server 2012 SP1 on a computer, the Windows Installer (Msiexec.exe) process is repeatedly started to repair certain assemblies. Additionally, the following events are logged in the Application log:
EventId: 1004
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’, Component ‘{0CECE655-2A0F-4593-AF4B-EFC31D622982}’ failed. The resource”does not exist.

EventId: 1001
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’ failed during request for component ‘{6E985C15-8B6D-413D-B456-4F624D9C11C2}’

When this issue occurs, you experience high CPU usage.
Cause

This issue occurs because the SQL Server 2012 components reference mismatched assemblies. This behavior causes native image generation to fail repeatedly on certain assemblies. Therefore, a repair operation is initiated on the installer package.

But the this problem has a much sinister side effect: it causes growth of the HKLM\Software registry hive. Except for the System hive, all the other registry hives are still restricted in size to a max of 2GB, see Registry Storage Space:

Views of the registry files are mapped in paged pool memory…The maximum size of a registry hive is 2 GB, except for the system hive.

As the runaway msiexec process bloats the Software registry hive your system may approach the maximum hive size and, even before that maximum is reached, the large hive will consume more and more of the very precious kernel paged pool memory. The system may start to exhibit erratic behavior, complaining about low ‘resources’, closing connections and other symptoms. For an example of how this erratic bahior may manifest, read Why the registry size can cause problems with your SQL 2012 AlwaysOn/Failover Cluster setup.

If you installed SQL Server 2012 SP1 recently follow the steps in KB2793634 for a fix

The biggest problem with the registry bloat erratic behavior is that it occurs long after the SQL Server 2012 SP1 installation and is quite difficult, even for expert users, to trace back the causality of the server erratic behavior to the SP1 installation. If you are uncertain if this issue is affecting you, run dir %SystemRoot%\system32\config and check the size of the SOFTWARE file, which is the storage of this registry hive. If you are indeed dealing with a bloated registry hive, visit KB2498915: How to Compress “Bloated” Registry Hives:

1) Boot from a WinPE disk.
2) Open regedit while booted in WinPe, load the bloated hive under HLKM. (e.g. HKLM\Bloated)
3) Once the bloated hive has been loaded, export the loaded hive as a “Registry Hive” file with a unique name. (e.g. %windir%\system32\config\compressedhive)
a) You can use dir from a command line to verify the old and new sizes of the registry hives.
4) Unload the bloated hive from regedit. (If you get an error here, close the registry editor. Then reopen the registry editor and try again.)
5) Rename the hives so that you will boot with the compressed hive.
e.g.
c:\windows\system32\config\ren software software.old
c:\windows\system32\config\ren compressedhive software

Update: At the time of writing this the SQL Server SP1 download page recommends installing the KB2793634. Kind folk that have been in the trenches and had to deal with this problem have also give me more feedback about the symptoms and solutions. I’ve been told that if the registry is already bloated due to the msiexec issue then applying the KB2793634 will not be enough. The fix only prevents further auto-restarts of msiexec, but it does not compress the registry. The following registry keys *may* be already huge:

  • HKLM\Software\Wow6432Node\Microsoft\.NETFramework\v2.0.50.27\NGENService
  • HKLM\SOFTWARE\Microsoft\.NETFramework\v2.0.50727\NGENService
  • HKLM\Software\Wow6432Node\Microsoft\.NETFramework\v4.0.30319\NGENService
  • HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\NGENService

Symptoms of registry hive bloat include, but are not limited to:

  • Cluster service is crashing and “The Group Policy Client service failed the logon. Insufficient system resources exist to complete the requested service” are logged into the event viewer log.
  • It is impossible to perform additional installations, or you can do this just shortly after rebooting the server.
  • Users with domain accounts are unable to logon on the machine. When they log in a temporary profile is loaded.
  • In Server Manager the Roles and Features are not listed anymore.
  • Many actions are prevented with the error message ” Insufficient system resources exist to complete the requested service”.

The recommended action is to open a support case and contact CSS. If you are brave, can’t afford CSS support contracts, and you are convinced that the problems you’re experiencing is due to the SQL Server 2012 SP1 registry bload, you may try the following at your own risk:

  1. Install the KB2793634 hot fix, as per the procedure described in the KB article.
  2. From cmd prompt run %SystemRoot%\Microsoft.Net\Framework\v4.0.30319\ngen.exe executequeueditems and %SystemRoot%\Microsoft.Net\Framework64\v4.0.30319\ngen.exe executequeueditems
  3. Compress the registry hive using a WinPE disk or from the Windows recovery mode (press F8 at the boot screen). Note that the Windows system on which you attempt to compress the hive must have the KB973817 installed: The Reg.exe utility does not compress a registry key when the utility saves a registry key to a hive file on a computer that is running Windows Server 2008, Windows Vista, Windows 7 or Windows Server 2008 R2.
  4. After the compression procedure is complete, from a cmd prompt run SCF /SCANNOW. See KB929833 for more details.

In the case when the registry hive bloat prevents even the installation of the SP1 hotfix you may attempt increasing the hive size to 4GB, see RegistrySizeLimit. Editing this particular registry key to a wrong value can result in your system blue-screening during boot due to BAD_SYSTEM_CONFIG_INFO kernel panic. Each boot. Use at your own risk. The correct value to put is 0xFFFFFFFF.

If you plan to upgrade an RTM SQL Server 2012 instance to SP1 apply the slipstream SP1+CU2

If you have an RTM instance of SQL Server 2012 and plan to upgrade it to SP1 it is recommended to create a slipstream SP1+CU2 instalation and apply this instead. See Product Updates in SQL Server 2012 Installation, in SQL Server 2012 the slipstream functionality has been renamed to Product Updates:

The Product Update feature replaces the Slipstream functionality that was available in SQL Server 2008 PCU1. Therefore the command-line parameters, /PCUSource and /CUSource, associated with Slipstream functionality should no longer be used. The parameters will continue to work, but may be removed in a future release of SQL Server Setup. The /UpdateSource parameter combines the functionality of the Slipstream parameters.

Frankly I think the term ‘slipstream’ was less ambiguous than ‘product update’ but then who am I to complain…

How to shrink the SQL Server log

July 27th, 2012

I noticed that my database log file has grown to 200Gb. I tried to shrink it but is still 200Gb. How can I shrink the log and reduce the file size?

The problem is that even after you discover about DBCC SHRINKFILE and attempt to reduce the log size, the command seems not to work at all and leaves the log at the same size as before. What is happening?

If you look back at What is an LSN: Log Sequence Number you will see that LSNs are basically pointers (offsets) inside the log file. There is one level of indirection (the VLF sequence number) and then the rest of the LSN is basically an offset inside the Virtual Log File (the VLF). The log is always defined by the two LSNs: the head of the log (where new log records will be placed) and the tail of the log (what is the oldest log record of interest). Generating log activity (ie. any updates in the database) advance the head of the log LSN number. The tail of the log advances when the database log is being backed up (this is a simplification, more on it later).

Read the rest of this entry »

Adding a nullable column can update the entire table

February 16th, 2012

In a previous article Online non-NULL with values column add in SQL Server 2012 I talked about how adding a non-null column with default values is now an online operation in SQL Server 2012 and I mentioned how the situation when the newly added column may increase the rowsize can result in the operation being performed offline:

In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online.

In this article I want to show you how such a situation can arise and how it impacts even the case that prior to SQL Server 2012 was always online, namely adding a nullable column. Lets consider the following example:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50)),
	('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

If you run the snippet above in SQL Server 2005, SQL Server 2008 or SQL Server 2008 R2 it will succeed, and the column some_int will be added online, as a metadata only operation. However, the resulting table has some interesting properties. Lets try to update the newly added some_int column:

update T set some_int = null
	where c= 'A';
update T set some_int = null
	where c= 'B';
update T set some_int = null
	where c= 'C';
update T set some_int = null
	where c= 'D';
go

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
Msg 511, Level 16, State 1, Line 7
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

3 updates succeeded, the fourth failed. Well, I’ve been warned when I created the table, right? Warning: The table “T” has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit. But I’m updating a fixed length column and, furthermore, I’m updating it from NULL to NULL. Lets try something else:

select * into T2 from T;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.

OK, how about this:

alter table T rebuild;
go
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So my previous sequence of actions led to a table that I cannot rebuild! Clearly, not a very desirable state.

SQL Server 2012 nullable ADD COLUMN

In SQL Server 2012 the situation above cannot happen. The adding of the column would be blocked:

create table T (
c char(1) null,
filler char(7998) null,
vc1 varchar(50) null,
vc2 varchar(50) null);
go

insert into T (c, vc1, vc2) values
('A', NULL, NULL),
('B', replicate ('X', 50), NULL),
('C', NULL, replicate('Y', 50)),
('D', replicate ('X', 50), replicate('Y', 50));
go

alter table T add some_int int null;
go

Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8064 which is greater than the allowable maximum row size of 8060.

The statement has been terminated.

So SQL Server 2012 is able to detect the problem upfront and prevent the table from reaching the state in which it cannot be rebuilt. But detecting this situation is not a straightforward matter of table metadata and column size, the situation arises only if the data in the table has this problem. After all the following sequence succeeds:

create table T (
	c char(1) null,
	filler char(7998) null,
	vc1 varchar(50) null,
	vc2 varchar(50) null);
go	

insert into  T (c, vc1, vc2) values
	('A', NULL, NULL),
	('B', replicate ('X', 50), NULL),
	('C', NULL, replicate('Y', 50));
go

alter table T add some_int int null;
go

Indeed it is only the record 'D' that has a problem because both variable columns vc1 and vc2 have values in the row that are not NULL (and of a certain size). The ALTER TABLE ... ADD COLUMN in this situation must validate each row to ensure that it fits in the page, including the newly added column size. Therefore every row gets updated and the newly added column value of NULL is populated in the row image, thus enforcing that every row fits in the table. If any row grows over the maximum size due to the newly added column then the ALTER fails and all the rows updated thus far are rolled back.

If adding a nullable column in SQL Server 2012 has the potential of increasing the row size over the 8060 size then the ALTER performs an offline size-of-data update to every row of the table to ensure it fits in the page. This behavior is new in SQL Server 2012.

This situation can arise when adding a new non-sparse fixed length column or a variable non-nullable column with default value to a table that already has the potential of creating rows over the maximum size of 8060. This new behavior (update every row in order to validate they all fit after the column is added) does not occur when adding a nullable variable length column or a sparse fixed length column.

Show all index and heap access operators in the plan cache

January 27th, 2012

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select x.value(N'@NodeId',N'int') as NodeId
	, x.value(N'@PhysicalOp', N'sysname') as PhysicalOp
	, x.value(N'@LogicalOp', N'sysname') as LogicalOp
	, ox.value(N'@Database',N'sysname') as [Database]
	, ox.value(N'@Schema',N'sysname') as [Schema]
	, ox.value(N'@Table',N'sysname') as [Table]
	, ox.value(N'@Index',N'sysname') as [Index]
	, ox.value(N'@IndexKind',N'sysname') as [IndexKind]
	, x.value(N'@EstimateRows', N'float') as EstimateRows
	, x.value(N'@EstimateIO', N'float') as EstimateIO
	, x.value(N'@EstimateCPU', N'float') as EstimateCPU
	, x.value(N'@AvgRowSize', N'float') as AvgRowSize
	, x.value(N'@TableCardinality', N'float') as TableCardinality
	, x.value(N'@EstimatedTotalSubtreeCost', N'float') as EstimatedTotalSubtreeCost
	, x.value(N'@Parallel', N'tinyint') as DOP
	, x.value(N'@EstimateRebinds', N'float') as EstimateRebinds
	, x.value(N'@EstimateRewinds', N'float') as EstimateRewinds
	, st.*
	, pl.query_plan
from sys.dm_exec_query_stats as st
cross apply sys.dm_exec_query_plan (st.plan_handle) as pl
cross apply pl.query_plan.nodes('//RelOp[./*/Object/@Database]') as op(x)
cross apply op.x.nodes('./*/Object') as ob(ox)


I recently needed a query to look into the current query plan cache and locate all actual data access operators (index scans, index seeks, table scans). This is the query I used, I decided to place it here if someone else find it useful and, more importantly, so that I can find it again when I needed it…

SQL Server table columns under the hood

October 20th, 2011

You probably can easily answer a question like ‘What columns does this table have?’. Whether you use the SSMS object explorer, or sp_help, or you query sys.column, the answer is fairly easy to find. But what is I ask ‘What are the physical columns of this table?’. Huh? Is there any difference? Lets see.

At the logical layer tables have exactly the structure you declare it in your CREATE TABLE statement, and perhaps modifications from ALTER TABLE statements. This is the layer at which you can look into sys.columns and see the table structure, or look at the table in SSMS object explorer and so on and so forth. But there is also a lower layer, the physical layer of the storage engine where the table might have surprisingly different structure from what you expect.

Inspecting the physical table structure

To view the physical table structure you must use the undocumented system internals views: sys.system_internals_partitions and sys.system_internals_partition_columns:

select p.index_id, p.partition_number,
	pc.leaf_null_bit,
	coalesce(cx.name, c.name) as column_name,
	pc.partition_column_id,
	pc.max_inrow_length,
	pc.max_length,
	pc.key_ordinal,
	pc.leaf_offset,
	pc.is_nullable,
	pc.is_dropped,
	pc.is_uniqueifier,
	pc.is_sparse,
	pc.is_anti_matter
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
left join sys.index_columns ic
	on p.object_id = ic.object_id
	and ic.index_id = p.index_id
	and ic.index_column_id = pc.partition_column_id
left join sys.columns c
	on p.object_id = c.object_id
	and ic.column_id = c.column_id
left join sys.columns cx
	on p.object_id = cx.object_id
	and p.index_id in (0,1)
	and pc.partition_column_id = cx.column_id
where p.object_id = object_id('...')
order by index_id, partition_number;

Lets inspect some simple table structures:

create table users (
	user_id int not null identity(1,1),
	first_name varchar(100) null,
	last_name varchar(100) null,
	birth_date datetime null);

Running our query after, of course, we specify object_id('users'):

We can see that the physical structure is very much as we expected: the physical rowset has 4 physical columns, of the expected types and sizes. One thing to notice is that, although the column order is the one we specified, the columns are layout on disk in a different order: the user_id is stored in row at offset 4, followed by the birth_date at offset 8 and then by the two variable length columns (first_name and last_name) that have negative offsets, an indication that they reside in the variable size portion of the row. This should be of no surprise as we know that the row format places all fixed length columns first in the row, ahead of the variable length columns.

Adding a clustered index

Our table right now is a heap, we should make user_id a primary key, and lets check the table structure afterward:

alter table users add constraint pk_users_user_id primary key (user_id);


As we can see, the table has changed from a heap into a clustered index (index_id changed from 0 to 1) and the user_id column has become part of the key.

Non-Unique clustered indexes

Now lets say that we want a different clustered index, perhaps by birth_date (maybe our application requires frequent range scans on this field). We would change the primary key into a non-clustered primary key (remember, the primary key and the clustered index do not have to be the same key) and add a clustered index by the birth_date column:

alter table users drop constraint pk_users_user_id;
create clustered index cdx_users on users (birth_date);
alter table users add constraint
	pk_users_user_id primary key nonclustered  (user_id);


Several changes right there:

  • A new index has appeared (index_id 2), which was expected, this is the non-clustered index that now enforces the primary key constraint on column user_id.
  • The table has a new physical column, with partition_column_id 0. This new column has no name, because it is not visible in the logical table representation (you cannot select it, nor update it). This is an uniqueifier column (is_uniqueifier is 1) because we did not specify that our clustered index in unique:

    If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.



    Klaus Aschenbrenner has a good series of articles that explain uniqueifier columns in more detail.

  • The non-clustered index that enforces the primary key constraint has 3 physical columns: user_id and two unnamed ones. This is because each row in the non-clustered index contains the corresponding clustered index row key values, in our case the birth_date column and the uniqueifier column.

Online operations and anti-matter columns

Is the uniqueifier column the only hidden column in a table? No. Lets rebuild our non-clustered index, making sure we specify it to be an online operation:

alter index pk_users_user_id on users rebuild with (online=on);


Notice how the non-clustered index now has one more column, a new column that has the is_antimatter value 1. As you probably guess, this is an anti-matter column. For an in-depth explanation of what is the purpose of the anti-matter column I recommend reading Online Indexing Operations in SQL Server 2005:

During the build phase, rows in the new “in-build” index may be in an intermediate state called antimatter. This mechanism allows concurrent DELETE statements to leave a trace for the index builder transaction to avoid inserting deleted rows. At the end of the index build operation all antimatter rows should be cleared.

Note that even after the online operation finishes and the antimatter rows are removed, the antimatter column will be part of the physical rowset structure.

There could exist more hidden columns in the table, for example if we enable change tracking:

alter table users ENABLE CHANGE_TRACKING;


Enabling change tracking on our table has added one more hidden column.

Table structure changes

Next lets look at some table structure modifying operations: adding, dropping and modifying columns. Were going to start anew with a fresh table for our examples:

create table users  (
	user_id int not null identity(1,1) primary key,
	first_name char(100) null,
	last_name char(100) null,
	birth_date datetime null);
go


Next lets modify some columns: we want to make the birth_date not nullable and reduce the length of the first_name to 75:

alter table users alter column birth_date datetime not null;
alter table users alter column first_name char(75);
go


Notice how the two alter operations ended up in adding a new column each, and dropping the old column. But also note how the null bit and the leaf_offset values have stayed the same. This means that the column was added ‘in place’, replacing the dropped column. This is a metadata only operation that did not modify any record in the table, it simply changed how the data in the existing records is interpreted.

But now we figure out the 75 characters length is wrong and we want to change it back to 100. Also, the birth_date column probably doesn’t need hours, so we can change it to a date type:

alter table users alter column birth_date date not null;
alter table users alter column first_name char(100);
go


The birth_date column has changed type and now it requires only 3 bytes, but the change occurred in-place, just as the nullability change we did before: it remains at the same offset in the record and it has the same null bit. However, the first_name column was moved from offset 8 to offset 211, and the null bit was changed from 4 to 5. Because the first_name column has increased in size it cannot occupy the same space as before in the record and the record has effectively increased to accommodate the new first_name column. This happened despite the fact that the first_name column was originally of size 100 so in theory it could reclaim the old space it used in the record, but the is simply a too corner case for the storage engine to consider.

By now we figure that the fixed length 100 for the first_name and last_name columns was a poor choice, so we would like to change them to more appropriate variable length columns:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(100);
go


The type change from fixed length column to variable length column cannot be done in-place, so the first_name and last_name columns get new null bit values.They also have negative leaf_offset values, which is typical for variable length columns as they don’t occupy fixed positions in the record.

Next lets change the length of the variable columns:

alter table users alter column first_name varchar(250);
alter table users alter column last_name varchar(250);
go

For this change the column length was modified without dropping the column and adding a new one. An increase of size for a variable length columns is one of the operations that is really altering the physical column, not dropping the column and adding a new one to replace it. However, a decrease in size, or a nullability change, does again drop and add the column, as we can quickly check now:

alter table users alter column first_name varchar(100);
alter table users alter column last_name varchar(250) not null;
go


Finally, lets say we tried to add two more fixed length columns, but we we undecided on the name and length so we added a couple of columns, then deleted them and added again a new one:

alter table users add mid_name char(50);
alter table users add surname char(25);
alter table users drop column mid_name;
alter table users drop column surname;

alter table users add middle_name char(100);
go


This case is interesting because it shows how adding a new fixed column can reuse the space left in the row by dropped fixed columns, but only if the dropped columns are the last fixed columns. In our table the columns mid_name and surname where originally added at offset 211 and 261 respectively and their length added up to 75 bytes. After we dropped them, the middle_name column we added is placed at offset 211, thus reusing the space formerly occupied by the dropped columns. This happens even though the length of the newly added column is 100 bytes, bigger than the 75 bytes occupied by the dropped columns before.

By now, our 5 column table has actually 15 columns in the physical storage format, 10 dropped columns and 5 usable columns. The table uses 412 bytes of fixed space for the 3 fixed length columns that have a total length of only 112 bytes. The variable length columns that now store the first_name and last_name are stored in the record after these 412 reserved bytes for fixed columns that are now dropped. Since the records always consume all the reserved fixed size, this is quite wasteful. How do we reclaim it? Rebuild the table:

alter table users rebuild;
go


As you can see the rebuild operation got rid off the dropped columns and now the physical storage layout is compact, aligned with the logical layout. So whenever doing changes to a table structure remember that at the storage layer most changes are cumulative, they are most times implemented by dropping a column and adding a new column back, with the new type/length/nullability. Whenever possible the a modified column reuses the space in the record and newly added columns may reuse space previously used by dropped columns.

Partitioned Tables

When partitioning is taken into account another dimension of the physical table structure is revealed. To start, lets consider a typical partitioned table:

create partition function pf (date) as range for values ('20111001');
go

create partition scheme ps as partition pf all to ([PRIMARY]);
go

create table sales (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price money not null)
	on ps(sale_date);
go


As we know, the partitioned tables are, from a physical point of view, a collection of rowsets that belong to the same logical object (the 'table'). Looking under the hood shows that our table has two rowsets, and they have identical column structure. Typically new partitions are added by the ETL process that uploads data into a staging table that gets switched in using a fast partition switch operation:

create table sales_staging (
	sale_date date not null,
	item_id int not null,
	store_id int not null,
	price numeric(10,2) not null,
	constraint check_partition_range
		check (sale_date = '20111002'))
	on [PRIMARY];

alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20111002');
go

alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go
Msg 4944, Level 16, State 1, Line 2

ALTER TABLE SWITCH statement failed because column 'price' has data type numeric(10,2) in source table 'test.dbo.sales_staging' which is different from its type money in target table 'test.dbo.sales'.

OK, so we made a mistake in the staging table, so lets correct it, then try to switch it in again:

alter table sales_staging alter column price money not null;
alter table sales_staging switch to sales partition $PARTITION.PF('20111002');
go


We see that the partition switch operation has switched in the rowset of the staging table into the second partition of the sales table. But since the staging table had an operation that modified a column type, which in effect has dropped the numeric price columns and added a new price column of the appropriate money type, the second rowset of the partitioned table now has 5 columns, including a dropped one. The partition switch operation brings into the partitioned table the staging table as is, dropped columns and all. What that means is that partitions of the partitioned table can have, under the hood, a completely different physical structure from one another. Normally this should be of little concern and just a courisity to woe the newbies at DBA cocktail parties, but this problem has a darker side, known as KB2504090:

This issue occurs because the accessor that SQL Server uses to insert data into different partitions recognizes the metadata changes incorrectly. When data is inserted into the new partition that is created after a column is dropped, the number of the maximum nullable columns in the new partition may be one less than the number of the maximum nullable columns in the old partition.

If you ever find your partitioned tables not to have an uniform internal structure across all partitions, I recommend you rebuild the partition that is different. This simple query can be used to look at the number of physical columns in each partition of a table:

select count(*) as count_columns,
	index_id,
	partition_number
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
where p.object_id = object_id('sales')
group by index_id, partition_number;
go


This query shows that partitions 1 and 3 have 4 physical columns, while partition 2 has 5. We can rebuild partition 2:

alter table sales rebuild partition = 2;
go

With this operation we have rebuild the partition number 2 from scratch and removed all dropped columns in the process.

Note that my query above would only detect differences in the number of physical columns, but two partitions can still have a very different physical layout even if they have the same number of physical columns, eg. one can have physical column number 9 dropped and physical column number 10 used, while the other can have the opposite. Use your judgement when looking at the internal physical column layout to understand if they are truly the same.

Understanding Hash, Sort and Exchange Spill events

October 19th, 2011

Certain SQL Server query execution operations are calibrated to perform best by using a (somewhat) large amount of memory as intermediate storage. The Query Optimizer will choose a plan and estimate the cost based on these operators using this memory scratch-pad. But this is, of course, only an estimate. At execution the estimates may prove wrong and the plan must continue despite not having enough memory. In such an event, these operators spill to disk. When a spill occurs, the memory scratch-pad is flushed into tempdb and more data is accommodated in the (now) free memory. When the data flushed to tempdb is needed again, is read from disk. Needless to say, spilling into tempdb is order of magnitude slower than using just the memory scratch-pad. Monitoring for spilling is specially important in ETL jobs, since these occurrences may cause the ETL job to stretch for many more minutes, sometimes even hours. For an exhaustive discussion of ETL, including some references to spills, see The Data Loading Performance Guide.

Hash Warning Event

Hash recursion occurs when the build input does not fit into available memory, resulting in the split of input into multiple partitions that are processed separately. If any of these partitions still do not fit into available memory, it is split into subpartitions, which are also processed separately. This splitting process continues until each partition fits into available memory or until the maximum recursion level is reached (displayed in the IntegerData data column).

This is one of the most common spills. Hash Join is a darling of the Query Optimizer, as it a very fast operator that can join two unsorted sources efficiently, requiring a single pass over each source. Not only that, but it can also be used for duplicate removal (eg. DISTINCT clause) and grouping aggregates. See Understanding Hash Joins for more details.

Unfortunately it also requires a lot of memory. Hash spill occurrences are usually an indication of bad cardinality estimates that are in turn caused, most times, by missing or outdated statistics. The first action to do, when faced with Hash Join spill warnings, is to update (or create) stats on the columns involved. If the problem persists then you must resort to a different join type. Since the optimizer would had already picked a better join if it could, you need to look at the problem from a different angle: how could you help the optimizer to pick a different join, w/o forcing it?

The optimizer would pick a LOOP if it could seek the inner side for a reasonable number of times, so perhaps you need an index on the inner side to allow a seek and/or an index on the outer side that would filter the produced output early so it can reduce the number of probes in the inner side. See Understanding Nested Loops Joins for more details.

The other alternative is a MERGE join, but merge requires the input on both sides to be sorted. Adding an index to each of the sources on both sides of the join that provides the order guarantee would likely woe the optimizer into using the MERGE join. See Understanding Merge Joins for more details.

Sort Warning Event

The Sort Warnings event class indicates that sort operations do not fit into memory. This does not include sort operations involving the creation of indexes, only sort operations within a query (such as an ORDER BY clause used in a SELECT statement).

If a query requires an order guarantee (eg. it has an ORDER BY clause, or it projects a function like ROW_NUMBER) and there is no index to guarantee the order then there is little choice to have: the execution must sort the input before proceeding. If the input is small then the sort occurs in memory and is very cheap, but in this case no spill warning would occur. The Query Optimizer is usually smart to postpone the sorting until all filtering is complete, so if it still spills it usually means there is no more filtering that can be applied. The solution when faced with Sort spills is usually to add a covering index that provides the desired order.

Another case of Sort spill warning is when the Query Optimizer goes creative and adds a sort into a plan that technically does not require an order guarantee. Such events are though very rare and esoteric. The action to take really depends from case to case.

Exchange Spill Event

The Exchange Spill event class indicates that communication buffers in a parallel query plan have been temporarily written to the tempdb database. This occurs rarely and only when a query plan has multiple range scans.

You’ll probably never going to hit this problem. If you do, I recommend you go over the recommendations in the linked article: Exchange Spill Event

Conclusion

Technically there is one more spill class: the spool spill. Since spools are *meant* to spill, the presence of a spool spill is usually less of a concern.

The purpose of this article is to show that there is ample documentation available on MSDN regarding these spill events. The tempdb spills are easily detectable and reasonably explained in the product documentation. Presence of spills may indicate potential performance problems as a spill involves disk reads and writes and is many times slower than the corresponding in-memory-only operation. They also add overload to tempdb and may cause contention.

Online non-NULL with values column add in SQL Server 2012

July 13th, 2011

Prior to SQL Server 2012 when you add a new non-NULLable column with default values to an existing table a size-of data operation occurs: every row in the table is updated to add the default value of the new column. For small tables this is insignificant, but for large tables this can be so problematic as to completely prohibit the operation. But starting with SQL Server 2012 the operation is, in most cases, instantaneous: only the table metadata is changed, no rows are being updated.

Lets look at a simple example, we’ll create a table with some rows and then add a non-NULL column with default values. First create and populate the table:

create table test (
	id int not null identity(1,1) primary key,
	someValue int not null);
go

set nocount on;
insert into test (someValue) values (rand()*1000);
go 1000

We can inspect the physical structure of the table’s records using DBCC PAGE. First lets find the page that contains the first record of the table:

select %%physloc%%, * from test where id = 1;

In my case this returned 0xD900000001000000, which means slot 0 on page 0xD9 (aka. 217) of file 1, and my test database has the DB_ID 6. Hence the parameters to DBCC PAGE

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AEBA060

0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564

Note the last LSN that updated the page (30:71:25) and the size of the record in slot 0 (15 bytes). Now lets add a non-NULL column with default values:

alter table test add otherValue int not null default 42 with values;

We can select from the table and see that the table was changed and the rows have value 42 for the newly added column:

select top(2) * from test;

id          someValue   otherValue
----------- ----------- -----------
1           564         42
2           387         42

Yet if we inspect again the page, we can see that is unchanged:

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000E83A060
0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564                     

Slot 0 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42 

The page header is unchanged, the last LSN is still (30:71:25), proof that the page was not modified, and the physical record is unchanged and has the same size as before. Yet DBCC shows a Column 3 and its value 42! If you pay attention you’ll notice that the Column 3 though has an Offset 0×0 and a physical length of 0. Column 3 is somehow materialized out of thin air, as it does not physically exists in the record on this page. The ‘magic’ is that the table metadata has changed and it now contains a column with a ‘default’ value:

select pc.* from sys.system_internals_partitions p
	join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
	where p.object_id = object_id('test');

Notice that sys.system_internals_partition_columns now has two new columns that are SQL Server 2012 specific: has_default and default_value. The column we added to the test table (the third row in the image above) has a default with value 42. This is how SQL Server 2012 knows how to show a value for Column 3 for this record, even though is physically missing on the page. With this ‘magic’ in place the ALTER TABLE will no longer have to update every row in the table and the operation is fast, metadata-only, no matter the number of rows in the table. This new behavior occurs automatically, no special syntax or setting is required, the engine will simply do the right thing. There is no penalty from having a missing value in a row. The ‘missing’ value can be queried, updated, indexed, exactly as if the update during ALTER TABLE really occurred. There is no measurable performance penalty from having a default value.

What happens when we update a row? The ‘default’ value is pushed into the row, even if the column was not modified. Consider this update:

update test set someValue = 565 where id = 1;

Although we did not touch the otherValue column, the row now was modified and it contains the materialized value:

dbcc page(6,1,217,3)

...
m_freeData = 7240                   m_reservedCnt = 0                   m_lsn = (31:271:2)
...
Slot 0 Offset 0x1c35 Length 19

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 19

Memory Dump @0x000000000AB8BC35

0000000000000000:   10001000 01000000 35020000 2a000000 030000††††........5...*......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 565                     

Slot 0 Column 3 Offset 0xc Length 4 Length (physical) 4
otherValue = 42         

KeyHashValue = (8194443284a0)
Slot 1 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AB8A060
0000000000000000:   10000c00 02000000 83010000 020000†††††††††††††........ƒ......

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 2                              

Slot 1 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 387                     

Slot 1 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42

Notice how the physical record has increased in size (19 bytes vs. 15), the record has the value 42 in it (the hex 2a000000) and the Column 3 now has a real offset and physical size. So the update has trully materialized the default value in the row image. I intentionally copied the output of DBCC PAGE for the next slot in the page, to show that the record with id=2 was unaffected, it continues to have a smaller size of 15 bytes and Column 3 has no physical length.

Default value vs. Default constraint

Is worth saying that the new SQL Server 2012 default column value is not the same as the default value constraint. The default value is captured when the ALTER TABLE statement is run and can never change. Only rows existing in the table at the time of running ALTER TABLE statement will have missing ‘default’ values. By contrast the default constraint can be dropped or modified and new rows inserted after the ALTER TABLE will always have a value present in row for the new column. Any REBUILD operation on the table (or on the clustered index) will materialize all the missing values as the rows are being copied from the old hobt to the new hobt. The new hobt columns (sys.system_internals_partition_columns) will loose the has_default and default_value attributes, in effect loosing any trace that this column was added online. A default constraint by contrast will be preserved as a table is rebuilt.

Restrictions

Not all data types and default values can be added online. BLOB values like varchar(max), nvarchar(max), varbinary(max) and XML cannot be added online (and frankly I see no valid data model that has a non-NULL BLOB with a default…). Types that cannot be converted to sql_variant cannot be added online, like hierarchy_id, geometry and geography or user CLR based UDTs. Default expressions that require a different value for each row, like NEWID or NEWSEQUENTIALID cannot be added online (the default expression has to be a runtime constant, not to be confused with a deterministic expression, see Conor vs. Runtime Constant Functions for more details). In the case when the newly added column increases the maximum possible row size over the 8060 bytes limit the column cannot be added online. And is an Enterprise Edition only feature. For all the cases above the behavior will revert to adding the column ‘offline’, by updating every row in the table during the ALTER TABLE statement, creating a size-of-data update. When such a situation occurs a new XEvent is fired, which contains the reason why a size-of-data update occurred: alter_table_update_data.

This server supports version 662 and earlier…

November 23rd, 2010

A new error started showing up in SQL Server 2008 SP2 installations:

The database cannot be opened because it is version 661. This server supports version 662 and earlier. A downgrade path is not supported.

661 sure is earlier than 662, so what seems to be the problem? This error message is a bit misleading. SQL Server 2008 supports database version 655 and earlier. But with support for 15000 partitions in SQL Server 2008 SP2, databases enabled for 15000 partitions are upgraded to version 662. This upgrade is necessary to prevent an SQL Server 2008 R2 instance from attaching a database that has more than 1000 partitions in it, since the code in R2 RTM does not understand 15000 partitions and the effects would be unpredictable. So SQL Server 2008 SP2 does indeed support version 662, but it does not support version 661. This behavior is explained in the Support for 15000 Partitions.docx document, although the database versions involved are not explicitly called out.

So the error message above should be really read as:

The database cannot be opened because it is version 661. This server supports versions 662, 655 and earlier than 655. A downgrade path is not supported

With this information the correct resolution can be achieved: the user is trying to attach a SQL Server 2008 R2 database (v. 661) to an SQL Server 2008 SP2 instance. This is not supported. User has to either upgrade the SQL Server 2008 SP2 instance to SQL Server 2008 R2, or it has to attach the database back to a R2 instance and copy out the data from the database into SQL Server 2008 instance database, eg. using the Import and Export Wizard.

The puzzle of U locks in deadlock graphs

May 12th, 2010

In a stackoverflow.com question the user has asked how come a SELECT statement could own a U mode lock?

S-U-X deadlock graph

S-U-X deadlock graph

The deadlock indeed suggests that the deadlock victim, a SELECT statement, is owning an U lock on the PK_B index. Why would a SELECT own an U lock? The query had no table hints and was a standalone query, not part of a multi-statement transaction that could had aquired the U lock in previous staements.

Turns out that the SELECT was actually not owning any U lock. The deadlock graph files (the *.xdl files) are in fact XML files and they can be opened as XML and inspected, for a little more detail than the visual deadlock graph visualizer permits. Here is the actual resource list in the deadlock XML:

<resource-list>
   <keylock hobtid="72057594052411392" dbid="10"
         objectname="A" indexname="PK_A" id="lock17ed4040"
        mode="X" associatedObjectId="72057594052411392">
    <owner-list>
     <owner id="process4f5d000" mode="X"/>
    </owner-list>
    <waiter-list>
     <waiter id="processfa3c8e0" mode="S" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594051166208" dbid="10"
        objectname="B" indexname="PK_B" id="lock22ea3940"
        mode="U" associatedObjectId="72057594051166208">
    <owner-list>
     <owner id="processfa3c8e0" mode="S"/>
    </owner-list>
    <waiter-list>
     <waiter id="process4f5d000" mode="X" requestType="convert"/>
    </waiter-list>
   </keylock>
  </resource-list>

As you can see, the resource lock22ea3940 is owned by the process processfa3c8e0 (the SELECT) indeed, but is owned in S mode. The process process4f5d000 (the UPDATE) is requesting this resource for a convert from U to X mode. So the true deadlock is like this:

  • SELECT owns a lock on the row in PK_B in S mode
  • SELECT wants a lock on the row in PK_A in S mode
  • UPDATE owns a lock on the row in PK_A in X mode
  • UPDATE also owns a U lock on the PK_B row. (S and U modes are compatible)
  • UPDATE is requesting a convert of the U lock it has on the row on PK_B to X mode

As you can see, there is no mysterious U lock owned by the SELECT. There is an U lock on the row in PK_B, but is owned by the UPDATE, which is requesting a convert to X for it. The fact that the resource is showned in the deadlock graph viewer in SSMS as being ‘Owner mode: U’ and pointing to the SELECT is simply an artifact of how SSMS displays the deadlock graph.

The lesson to take home is that the visual graphic deadlock graph display is usefull only to have a cursory glance at the deadlock cycle. The true meat and potatoes are in the XML, which has a lot more information. Not to mention that the information in the XML is actually correct, which helps investigation…

SQL Server 2008 R2 Express database size limit: 10GB

April 28th, 2010

The SQL Server 2008 R2 Express editions has increased the database size limit to 10Gb from the previous limit of 4Gb. This is great news for many developers, as the 4Gb limitation was by far the most difficult barrier preventing Express adoption. With today’s rate of generating data, the 4Gb limit was just plain small.

All the other limitations of SQL Server Express stay in place:

CPU
SQL Server Express only uses once CPU socket. It will use all cores and any Hyper-Threading logical processor in that socket though.
Memory
SQL Server Express limits the size of the data buffer pool to 1Gb.
Replication
SQL Server Express can only participate as a subscriber in a replication topology.
Service Broker
Two SQL Server Express instances cannot exchange Service Broker messages directly, the messages have to be routed through a higher level SKU.
SQL Agent
SQL Server Express does not have an Agent service and as such it cannot run Agent scheduled jobs.