How to Multicast messages with SQL Server Service Broker

July 20th, 2011

Starting with SQL Server 11, the SEND verb has a new syntax and accepts multiple dialog handles to send on:

SEND
   ON CONVERSATION [(]conversation_handle [,.. @conversation_handle_n][)]
   [ MESSAGE TYPE message_type_name ]
   [ ( message_body_expression ) ]
[ ; ]

With this syntax enhancement you can send a message to multiple destinations. Semantically this is no different from sending the same message multiple times: from the application's point of view, issuing a single SEND on 10 dialog handles is exactly the same as issuing 10 SEND statements, one dialog handle at a time. The improvement is in sys.transmission_queue: issuing SEND multiple times creates multiple copies of the message body to be sent, whereas a single SEND on multiple handles stores the message body only once. We can see this if we look at the definition of sys.transmission_queue in SQL Server 11:

sp_helptext 'sys.transmission_queue'

CREATE VIEW sys.transmission_queue AS
	SELECT conversation_handle = S.handle,
		to_service_name = Q.tosvc,
		to_broker_instance = Q.tobrkrinst,
		from_service_name = Q.fromsvc,
		service_contract_name = Q.svccontr,
		enqueue_time = Q.enqtime,
		message_sequence_number = Q.msgseqnum,
		message_type_name = Q.msgtype,
		is_conversation_error = sysconv(bit, Q.status & 2),
		is_end_of_dialog = sysconv(bit, Q.status & 4),
		message_body = ISNULL(Q.msgbody, B.msgbody),
		transmission_status = GET_TRANSMISSION_STATUS (S.handle),
		priority = R.priority
	FROM sys.sysxmitqueue Q
	LEFT JOIN sys.sysxmitbody B WITH (NOLOCK) ON Q.msgref = B.msgref
	INNER JOIN sys.sysdesend S WITH (NOLOCK)
             ON Q.dlgid = S.diagid AND Q.finitiator = S.initiator
	INNER JOIN sys.sysdercv R WITH (NOLOCK)
             ON Q.dlgid = R.diagid AND Q.finitiator = R.initiator
	WHERE is_member('db_owner') = 1

Compare this with the same view definition in SQL Server 2008 R2:

CREATE VIEW sys.transmission_queue AS
	SELECT conversation_handle = S.handle,
		to_service_name = Q.tosvc,
		to_broker_instance = Q.tobrkrinst,
		from_service_name = Q.fromsvc,
		service_contract_name = Q.svccontr,
		enqueue_time = Q.enqtime,
		message_sequence_number = Q.msgseqnum,
		message_type_name = Q.msgtype,
		is_conversation_error = sysconv(bit, Q.status & 2),
		is_end_of_dialog = sysconv(bit, Q.status & 4),
		message_body = Q.msgbody,
		transmission_status = GET_TRANSMISSION_STATUS (S.handle),
		priority = R.priority
	FROM sys.sysxmitqueue Q
	INNER JOIN sys.sysdesend S WITH (NOLOCK)
               ON Q.dlgid = S.diagid AND Q.finitiator = S.initiator
	INNER JOIN sys.sysdercv R WITH (NOLOCK)
               ON Q.dlgid = R.diagid AND Q.finitiator = R.initiator
	WHERE is_member('db_owner') = 1

You can see how in SQL Server 11 the message body was separated into a new system table (sys.sysxmitbody). A multicast SEND creates multiple entries in sys.sysxmitqueue (one for each dialog on which the message was sent) but only one entry in sys.sysxmitbody. This normalized storage scheme reduces the space consumed and, more importantly, the amount of log generated during the SEND.
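As a minimal, self-contained sketch of the multicast syntax, the following opens two dialogs and sends one message on both with a single SEND. All demo_* names are hypothetical and not part of the example developed in this article:

```sql
-- Hypothetical objects (demo_*), used only to illustrate the syntax.
create message type demo_msg validation = none;
create contract demo_contract (demo_msg sent by initiator);
create queue demo_sender_queue;
create service demo_sender on queue demo_sender_queue;
create queue demo_target_queue;
create service demo_target on queue demo_target_queue (demo_contract);
go

declare @h1 uniqueidentifier, @h2 uniqueidentifier;

begin dialog @h1
	from service demo_sender
	to service N'demo_target'
	on contract demo_contract
	with encryption = off;

begin dialog @h2
	from service demo_sender
	to service N'demo_target'
	on contract demo_contract
	with encryption = off;

-- One SEND, two dialogs: the parenthesized handle list is the new syntax.
send on conversation (@h1, @h2)
	message type demo_msg (N'hello');
go
```

Assuming Service Broker is enabled in the database, selecting from demo_target_queue afterwards should show one message per dialog.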

The Reverse Dialog pattern in publish-subscribe

The typical dialog pattern in pub-sub systems is for the subscriber to start the dialog and send an initial ‘subscribe’ message; the subscription content is then delivered from the target (the publisher) to the initiator (the subscriber). I call this the Reverse Dialog Pattern because messages flow from the target to the initiator. Let’s illustrate with an example. We’ll create a publisher service that broadcasts some important content, to which other services can subscribe. To spice it up, we’ll use a tag system to subscribe to optional content: all content is distributed with a list of associated tags, and each subscriber specifies the tag it is interested in. Tag matching is done using the LIKE syntax, so that subscribers can specify '%' as a means to subscribe to all content.
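To make the tag matching concrete, here is a small sketch that uses two table variables (hypothetical stand-ins for the subscription tags and the published tags) and the same kind of LIKE join, where the subscriber's tag acts as the pattern:

```sql
-- Hypothetical stand-ins: tags subscribers asked for, and tags attached to one publication.
declare @subscribed table (tag nvarchar(50) not null);
declare @published table (tag nvarchar(50) not null);

insert into @subscribed (tag) values (N'%'), (N'B'), (N'D');
insert into @published (tag) values (N'A'), (N'B');

-- The subscriber's tag is the LIKE pattern, the published tag is the value tested.
select s.tag as subscribed_tag, p.tag as published_tag
from @subscribed s
join @published p on p.tag like s.tag
order by s.tag, p.tag;
-- '%' matches both 'A' and 'B', 'B' matches only 'B', 'D' matches nothing.
```

Because the subscriber's tag is the pattern, a subscriber could also use prefixes such as 'news%', not just the catch-all '%'.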

The Publisher Service

create message type subscription_request validation = none;
create message type subscription_content validation = well_formed_xml;

create contract distribution
	(subscription_request sent by initiator,
	subscription_content sent by target);

create queue publisher;
create service publisher on queue publisher (distribution);
go

create table subscriptions (
	subscription_id int not null identity(1,1),
	tag nvarchar(50) not null,
	conversation_handle uniqueidentifier not null,
	constraint pk_subscriptions primary key (subscription_id),
	constraint unq_conversation_handle unique (conversation_handle, tag));
go

create procedure usp_publisher_handler
as
begin
	declare @mt sysname, @dh uniqueidentifier, @mb varbinary(max);
	begin try
		begin transaction;
		receive top(1)
			@mt = message_type_name,
			@dh = conversation_handle,
			@mb = message_body
			from publisher;
		if (@mt = N'subscription_request')
		begin
			insert into subscriptions (conversation_handle, tag)
                                   values (@dh, cast(@mb as nvarchar(50)));
		end
		else if(@mt = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
			or @mt = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
		begin
			delete from subscriptions
				where conversation_handle  = @dh;
			end conversation @dh;
		end
		commit
	end try
	begin catch
		declare @xact_state int = xact_state();
		if @xact_state <> 0
		begin
			rollback;
		end
	end catch
end
go

alter queue publisher with activation (
	status = on,
	max_queue_readers = 1,
	procedure_name = usp_publisher_handler,
	execute as owner);
go

The publisher service is straightforward: it uses the subscriptions table to keep track of subscribers. The activated procedure associated with the publisher service processes the subscription_request messages and adds the requesting dialog to the subscriptions table. The request message body is the tag the subscriber is interested in. Error and EndDialog messages remove the dialog's subscriptions and end the conversation.

The Publish Content procedure

create type publish_tags_type as table (
	tag nvarchar(50) not null primary key);
go

create procedure usp_publish_content
	@content xml,
	@tags publish_tags_type readonly
as
begin
	declare @sql nvarchar(max) = N'send on conversation (';
	declare @cnt int = 0;
	declare @dh uniqueidentifier;
	declare @comma nvarchar(2) = N'';

	declare crs cursor static read_only forward_only for
		select distinct conversation_handle
		from subscriptions s
		join @tags t on t.tag like s.tag;

	open crs;
	fetch next from crs into @dh;
	while 0 = @@fetch_status
	begin

		set @sql += @comma  + N'''' + cast(@dh as nvarchar(36)) + N'''';
		set @comma = N', ';
		set @cnt += 1;
		fetch next from crs into @dh;
	end
	close crs;
	deallocate crs;

	if @cnt > 0
	begin
		set @sql+= N') message type subscription_content (@content)';
		exec sp_executesql @sql, N'@content xml', @content;
	end
end
go

The publish content procedure takes the content to be distributed and the list of tags under which it is published, and sends the content to all interested subscribers. A single multicast SEND is used to reach all subscribers. Because the number of handles is not known in advance, dynamic SQL is used to build the multicast SEND statement.

Adding subscribers

declare @i int = 0;
declare @sql nvarchar(max);
while @i < 10
begin
	set @sql = N'create queue subscriber_' + cast(@i as nvarchar(20)) + N';
		create service subscriber_' + cast(@i as nvarchar(20)) + N'
                            on queue subscriber_'+cast(@i as nvarchar(20)) + N';';
	exec sp_executesql @sql;
	set @sql = N'declare @dh uniqueidentifier;
		begin dialog @dh
			from service subscriber_' + cast(@i as nvarchar(20)) + N'
			to service N''publisher''
			on contract distribution
			with encryption = off;
		send on conversation @dh message type subscription_request
                    (''' +case @i%5 when 0 then N'%' else nchar(@i + 65) end + ''');';
	exec sp_executesql @sql;
	set @i += 1;
end
go

This snippet adds 10 subscribers interested in tags 'B', 'C', 'D' etc. The first and sixth subscribers (subscriber_0 and subscriber_5) are interested in everything ('%'). We can see the subscribers were added to the subscriptions table by the publisher activated procedure:

select * from subscriptions

subscription_id tag                               conversation_handle
--------------- ----------------- -------------------------------------
1               %                          AFC62EF2-35B3-E011-8EED-001C25160E57
2               B                          B3C62EF2-35B3-E011-8EED-001C25160E57
3               C                          B7C62EF2-35B3-E011-8EED-001C25160E57
4               D                          BBC62EF2-35B3-E011-8EED-001C25160E57
5               E                          BFC62EF2-35B3-E011-8EED-001C25160E57
6               %                          C3C62EF2-35B3-E011-8EED-001C25160E57
7               G                          C7C62EF2-35B3-E011-8EED-001C25160E57
8               H                          CBC62EF2-35B3-E011-8EED-001C25160E57
9               I                          CFC62EF2-35B3-E011-8EED-001C25160E57
10              J                          D3C62EF2-35B3-E011-8EED-001C25160E57

A test multicast message

declare @tags publish_tags_type;
insert into @tags (tag) values ('A'), ('B'), ('C');
exec usp_publish_content N'', @tags;
go

With this one call we notified all subscribers interested, with one single multicast SEND. We can check which of the subscribers got the content:

declare @i int = 0;
declare @sql nvarchar(max) = N'', @union nvarchar(20) = N'';
while @i < 10
begin
	set @sql += @union + N'select
              ''subscriber_' + cast(@i as nvarchar(20)) + N''' as subscriber,
               count(*) as count
               from subscriber_' + cast(@i as nvarchar(20));
	set @union = ' union all ';
	set @i += 1;
end
exec sp_executesql @sql;
go

subscriber   count
------------ -----------
subscriber_0 1
subscriber_1 1
subscriber_2 1
subscriber_3 0
subscriber_4 0
subscriber_5 1
subscriber_6 0
subscriber_7 0
subscriber_8 0
subscriber_9 0

(10 row(s) affected)

We can see that subscriber_1 and subscriber_2 each got a message, since the tags they're interested in are 'B' and 'C', which both match a tag set by the publisher. Subscribers subscriber_0 and subscriber_5 each got a message because they're interested in any tag ('%').

This pattern of publish-subscribe is not new, and similar applications could be built with SQL Server Service Broker in SQL Server 2005, 2008 and 2008 R2. But with SQL Server 11 the distribution is more efficient and can scale and perform better, as the message bodies are not inserted and deleted multiple times, once for each subscriber, in the publisher's transmission queue.
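The cleanup branch of usp_publisher_handler can be exercised by having a subscriber end its dialog. The following is a sketch only, assuming the subscriber_3 service created above still has its dialog to 'publisher' open; it looks the handle up in sys.conversation_endpoints because the original handle variable is no longer in scope:

```sql
-- Assumes the article's subscriber_3 service and its open dialog to 'publisher' exist.
declare @dh uniqueidentifier;

select top(1) @dh = ce.conversation_handle
from sys.conversation_endpoints ce
join sys.services s on s.service_id = ce.service_id
where s.name = N'subscriber_3'
	and ce.far_service = N'publisher'
	and ce.is_initiator = 1
	and ce.state = 'CO';	-- CO = conversing

if @dh is not null
	end conversation @dh;	-- delivers an EndDialog message to the publisher
go

-- Once the activated procedure has run, the subscriber's row should be gone.
select * from subscriptions;
go
```

Since activation is asynchronous, the final select may need to be repeated after a moment.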

Online non-NULL with values column add in SQL Server 2012

July 13th, 2011

Prior to SQL Server 2012, when you add a new non-NULLable column with default values to an existing table, a size-of-data operation occurs: every row in the table is updated to add the default value of the new column. For small tables this is insignificant, but for large tables it can be so problematic as to completely prohibit the operation. Starting with SQL Server 2012 the operation is, in most cases, instantaneous: only the table metadata is changed and no rows are updated.

Let’s look at a simple example: we’ll create a table with some rows and then add a non-NULL column with default values. First create and populate the table:

create table test (
	id int not null identity(1,1) primary key,
	someValue int not null);
go

set nocount on;
insert into test (someValue) values (rand()*1000);
go 1000

We can inspect the physical structure of the table’s records using DBCC PAGE. First let’s find the page that contains the first record of the table:

select %%physloc%%, * from test where id = 1;

In my case this returned 0xD900000001000000, which means slot 0 on page 0xD9 (i.e. 217) of file 1, and my test database has the DB_ID 6. Hence the parameters to DBCC PAGE:

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AEBA060

0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564

Note the last LSN that updated the page (30:71:25) and the size of the record in slot 0 (15 bytes). Now let’s add a non-NULL column with default values:

alter table test add otherValue int not null default 42 with values;

We can select from the table and see that the table was changed and the rows have value 42 for the newly added column:

select top(2) * from test;

id          someValue   otherValue
----------- ----------- -----------
1           564         42
2           387         42

Yet if we inspect the page again, we can see that it is unchanged:

dbcc traceon (3604,-1)
dbcc page(6,1,217,3)

Page @0x0000000170D5E000

m_pageId = (1:217)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594043432960
Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 1
Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (1:220)
pminlen = 12                        m_slotCnt = 476                     m_freeCnt = 4
m_freeData = 7236                   m_reservedCnt = 0                   m_lsn = (30:71:25)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 2135435720             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000E83A060
0000000000000000:   10000c00 01000000 34020000 020000†††††††††††††........4......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 564                     

Slot 0 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42 

The page header is unchanged: the last LSN is still (30:71:25), proof that the page was not modified, and the physical record is unchanged and has the same size as before. Yet DBCC shows a Column 3 and its value 42! If you pay attention you’ll notice that Column 3 has an Offset of 0x0 and a physical length of 0. Column 3 is somehow materialized out of thin air, as it does not physically exist in the record on this page. The ‘magic’ is that the table metadata has changed and it now contains a column with a ‘default’ value:

select pc.* from sys.system_internals_partitions p
	join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
	where p.object_id = object_id('test');

Notice that sys.system_internals_partition_columns now has two new columns that are SQL Server 2012 specific: has_default and default_value. The column we added to the test table (the third row in the result of the query above) has a default with value 42. This is how SQL Server 2012 knows how to show a value for Column 3 for this record, even though it is physically missing from the page. With this ‘magic’ in place the ALTER TABLE no longer has to update every row in the table, and the operation is fast and metadata-only, no matter the number of rows in the table. This new behavior occurs automatically: no special syntax or setting is required, the engine will simply do the right thing. The ‘missing’ value can be queried, updated and indexed exactly as if the update during ALTER TABLE had really occurred, with no measurable performance penalty.

What happens when we update a row? The ‘default’ value is pushed into the row, even if the column was not modified. Consider this update:

update test set someValue = 565 where id = 1;

Although we did not touch the otherValue column, the row was modified and it now contains the materialized value:

dbcc page(6,1,217,3)

...
m_freeData = 7240                   m_reservedCnt = 0                   m_lsn = (31:271:2)
...
Slot 0 Offset 0x1c35 Length 19

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 19

Memory Dump @0x000000000AB8BC35

0000000000000000:   10001000 01000000 35020000 2a000000 030000††††........5...*......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 565                     

Slot 0 Column 3 Offset 0xc Length 4 Length (physical) 4
otherValue = 42         

KeyHashValue = (8194443284a0)
Slot 1 Offset 0x60 Length 15
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000000AB8A060
0000000000000000:   10000c00 02000000 83010000 020000†††††††††††††........ƒ......

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 2                              

Slot 1 Column 2 Offset 0x8 Length 4 Length (physical) 4
someValue = 387                     

Slot 1 Column 3 Offset 0x0 Length 4 Length (physical) 0
otherValue = 42

Notice how the physical record has increased in size (19 bytes vs. 15), the record has the value 42 in it (the hex 2a000000) and Column 3 now has a real offset and physical size. So the update has truly materialized the default value in the row image. I intentionally copied the output of DBCC PAGE for the next slot in the page, to show that the record with id=2 was unaffected: it continues to have the smaller size of 15 bytes and its Column 3 has no physical length.

Default value vs. Default constraint

It is worth noting that the new SQL Server 2012 default column value is not the same as the default value constraint. The default value is captured when the ALTER TABLE statement is run and can never change. Only rows existing in the table at the time the ALTER TABLE statement runs will have missing ‘default’ values. By contrast, the default constraint can be dropped or modified, and new rows inserted after the ALTER TABLE will always have a value present in the row for the new column. Any REBUILD operation on the table (or on the clustered index) will materialize all the missing values as the rows are copied from the old hobt to the new hobt. The new hobt columns (sys.system_internals_partition_columns) will lose the has_default and default_value attributes, in effect losing any trace that this column was added online. A default constraint, by contrast, is preserved when a table is rebuilt.
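The effect of a rebuild can be observed with a short sketch against the test table from this article. It assumes the column was added online as shown earlier; the expectation in the comments follows from the description above rather than from captured output:

```sql
-- Before the rebuild: the column added online should be flagged has_default = 1.
select pc.partition_column_id, pc.has_default, pc.default_value
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
where p.object_id = object_id('test');
go

-- Copies every row into a new hobt, materializing the missing values.
alter table test rebuild;
go

-- After the rebuild: the has_default and default_value attributes should be gone,
-- while the default constraint itself is still listed in sys.default_constraints.
select pc.partition_column_id, pc.has_default, pc.default_value
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc
	on p.partition_id = pc.partition_id
where p.object_id = object_id('test');

select name, definition
from sys.default_constraints
where parent_object_id = object_id('test');
go
```

Note that the rebuild is itself a size-of-data operation, so on a large table it brings back the very cost the online add avoided.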

Restrictions

Not all data types and default values can be added online. BLOB types like varchar(max), nvarchar(max), varbinary(max) and XML cannot be added online (and frankly I see no valid data model that has a non-NULL BLOB with a default…). Types that cannot be converted to sql_variant cannot be added online, like hierarchyid, geometry and geography, or user CLR based UDTs. Default expressions that require a different value for each row, like NEWID or NEWSEQUENTIALID, cannot be added online (the default expression has to be a runtime constant, not to be confused with a deterministic expression; see Conor vs. Runtime Constant Functions for more details). If the newly added column increases the maximum possible row size over the 8060-byte limit, the column cannot be added online. Finally, this is an Enterprise Edition only feature. In all the cases above the behavior reverts to adding the column ‘offline’, by updating every row in the table during the ALTER TABLE statement, which is a size-of-data update. When this happens a new XEvent is fired that contains the reason why a size-of-data update occurred: alter_table_update_data.
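One way to catch the alter_table_update_data event is a throwaway Extended Events session. This is a sketch under assumptions: the session name and the rowGuid column are hypothetical, and the event is assumed to live in the sqlserver package, which the first query lets you confirm before creating the session:

```sql
-- Confirm which package exposes the event (the session below assumes 'sqlserver').
select p.name as package_name, o.name as event_name
from sys.dm_xe_objects o
join sys.dm_xe_packages p on p.guid = o.package_guid
where o.object_type = N'event'
	and o.name = N'alter_table_update_data';
go

-- Hypothetical session name, in-memory ring buffer target.
create event session track_offline_column_add on server
	add event sqlserver.alter_table_update_data
	add target package0.ring_buffer;
go

alter event session track_offline_column_add on server state = start;
go

-- NEWID() is not a runtime constant, so this add should take the offline path.
alter table test add rowGuid uniqueidentifier not null default newid() with values;
go

-- Read whatever the session captured.
select cast(t.target_data as xml) as captured_events
from sys.dm_xe_sessions s
join sys.dm_xe_session_targets t on t.event_session_address = s.address
where s.name = N'track_offline_column_add';
go

alter event session track_offline_column_add on server state = stop;
drop event session track_offline_column_add on server;
go
```

Events are dispatched to the target asynchronously, so the ring buffer may take a few seconds to show the event.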

How to update a table with a columnstore index

July 13th, 2011

In my previous article How to use columnstore indexes in SQL Server we’ve seen how to create a columnstore index on a table and how certain queries can significantly reduce the IO needed, and thus increase performance, by leveraging this new feature. But once a columnstore index is added to a table the table becomes read-only: it can no longer be updated. Trying to insert a new row into the table results in an error:

insert into sales ([date],itemid, price, quantity) values ('20110713', 1,1.0,1);

Msg 35330, Level 15, State 1, Line 1

INSERT statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, then rebuilding the columnstore index after INSERT is complete.

The error message recommends a ‘workaround’, but rebuilding the columnstore index for every update may be prohibitively expensive. For the DW and BI scenarios that columnstore indexes are targeting there is a much better solution: use table partitioning. With SQL Server 11 the limit of 1000 partitions per table has been increased to 15000 partitions, and with this new limit one can configure the ETL process to load each day into a new partition and still retain many years of data. The ETL process can upload the daily data into a staging table, create a columnstore index on the staging table, then use the fast ALTER TABLE … SWITCH operation to ‘switch in’ the new data. Using the very same example as in my previous article, let’s create a staging table with a structure identical to the sales facts table:

create table sales_staging (
	[id] int not null identity (1000000,1),
	[date] date not null,
	itemid smallint not null,
	price money not null,
	quantity numeric(18,4) not null,
	constraint check_date check ([date] = '20110716')) on [PRIMARY];
go

create unique clustered index cdx_sales_staging_date_id
   on sales_staging ([date], [id]) on [PRIMARY];
go

Note how the staging table has a check constraint that restricts the date to the range of the partition to be switched in. Now let’s populate the staging table with some more dummy sales facts:

set nocount on
go

declare @i int = 0;
begin transaction;
while @i < 250000
begin
	insert into sales_staging ([date], itemid, price, quantity)
		values ('20110716', rand()*10000, rand()*100 + 100, rand()* 10.000+1);
	set @i += 1;
	if @i % 10000 = 0
	begin
		raiserror (N'Inserted %d', 0, 1, @i);
		commit;
		begin tran;
	end
end
commit;
go

Now that our fake ETL process has finished preparing the last day’s sales data in a staging table, let’s add a columnstore index identical to the one on the real sales table:

create columnstore index cs_sales_price_staging
          on sales_staging ([date], itemid, price, quantity);
go

OK, our staging table is complete, so let's switch it into the 'big' sales table:

alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20110717');
go

alter table sales_staging switch to sales partition $PARTITION.PF('20110716');
go

That's it! We've just updated our sales table with the sales facts for the last day, despite the fact that it has a columnstore index, and without disabling that index. The increased partition count supported in SQL Server 11, combined with the fact that aligned columnstore indexes support the fast partition switch operation, makes tables with columnstore indexes updatable in practice, provided the ETL process uses a staging table and the ETL schedule matches the partitioning scheme.
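A quick way to check the result of the switch is to count rows per partition, using the pf partition function from the example. This is only a verification sketch; the counts depend on the data you loaded:

```sql
-- Rows per partition of the sales table; the switched-in day should now be populated.
select $partition.pf([date]) as partition_number,
	min([date]) as first_date,
	max([date]) as last_date,
	count(*) as row_count
from sales
group by $partition.pf([date])
order by partition_number;
go

-- After the switch the staging table is expected to be empty.
select count(*) as staging_rows from sales_staging;
go
```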

How to use columnstore indexes in SQL Server

July 13th, 2011

Column oriented storage is the data storage of choice for data warehouse and business analysis applications. Column oriented storage allows for a high data compression rate and as such it can increase processing speed, primarily by reducing the IO needs. Now SQL Server allows for creating column oriented indexes (called COLUMNSTORE indexes) and thus brings the benefits of these highly efficient BI oriented indexes into the same engine that runs the OLTP workload. The syntax for creating columnstore indexes is described on MSDN at CREATE COLUMNSTORE INDEX. Let’s walk through a very simple example of how to create and use a columnstore index. First let’s create a dummy sales table:

create partition function pf (date) as range left for values
  ('20110712', '20110713', '20110714', '20110715', '20110716');
go

create partition scheme ps as  partition pf all to ([PRIMARY]);
go

create table sales (
	[id] int not null identity (1,1),
	[date] date not null,
	itemid smallint not null,
	price money not null,
	quantity numeric(18,4) not null)
	on ps([date]);
go

create unique clustered index cdx_sales_date_id on sales ([date], [id]) on ps([date]);
go

Notice how I created this table on a partitioning scheme that has one partition per day. See my follow-up article How to update a table with a columnstore index to understand why I chose this particular arrangement. For now, let’s populate the table with 1 million ‘sales’ facts:

set nocount on;
go

declare @i int = 0;
begin transaction;
while @i < 1000000
begin
	declare @date date = dateadd(day, @i /250000.00, '20110712');
	insert into sales ([date], itemid, price, quantity)
		values (@date, rand()*10000, rand()*100 + 100, rand()* 10.000+1);
	set @i += 1;
	if @i % 10000 = 0
	begin
		raiserror (N'Inserted %d', 0, 1, @i);
		commit;
		begin tran;
	end
end
commit;
go

If we look now at the structure of the sales table we see that each partition has 250k rows spread along 1089 pages:

select * from sys.system_internals_partitions p
	where p.object_id = object_id('sales');

select au.* from sys.system_internals_allocation_units au
	join sys.system_internals_partitions p
		on p.partition_id = au.container_id
	where p.object_id = object_id('sales');
go

If we now run a BI type of query like get the number of sales facts and the total sales for a day, the query would have to scan an entire partition, generating 1089 logical reads:

set statistics io on;
select count(*), sum(price*quantity) from sales where date = '20110713'
set statistics io off;
go

Table 'sales'. Scan count 1, logical reads 1089, physical reads 0,...

So let's create a columnstore index on this table:

create columnstore index cs_sales_price on sales ([date], price, quantity) on ps([date]);
go

If we look at the structure of the columnstore index we'll see that it has a much smaller footprint, only 362 pages:

select * from sys.system_internals_partitions p
	where p.object_id = object_id('sales')
	and index_id = 2;

select au.* from sys.system_internals_allocation_units au
	join sys.system_internals_partitions p
		on p.partition_id = au.container_id
	where p.object_id = object_id('sales')
		and index_id = 2;
go

Note how the columnstore index has no pages allocated for the IN_ROW_DATA allocation unit, but instead has pages allocated to the LOB_DATA allocation unit. So a columnstore index has no rows, instead it uses the BLOB storage to store the column 'segments'. Due to compression possible with column oriented storage, it needs only about one third of the pages needed by the clustered index, although it contains the same columns and the same number of sales facts. If we run again the very same query as before, we'll see how it uses the columnstore index and generates less IO:

set statistics io on;
select count(*), sum(price*quantity) from sales where date = '20110713'
set statistics io off;
go

Table 'sales'. Scan count 1, logical reads 358, physical reads 0, read-ahead reads 0, ...
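To compare the two footprints side by side rather than reading them off two separate result sets, the allocation unit queries used above can be aggregated per index and allocation unit type. This is a sketch that follows the same join as the earlier queries:

```sql
-- Pages per index and allocation unit type for the sales table.
-- index_id 1 is the clustered index; the columnstore index is expected under LOB_DATA.
select p.index_id,
	au.type_desc,
	sum(au.total_pages) as total_pages,
	sum(au.used_pages) as used_pages
from sys.system_internals_allocation_units au
join sys.system_internals_partitions p
	on p.partition_id = au.container_id
where p.object_id = object_id('sales')
group by p.index_id, au.type_desc
order by p.index_id, au.type_desc;
go
```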

This article is just a very simplified explanation of how columnstore indexes can be used. Column oriented storage is one of the major features that ships with SQL Server 11 and there is much more to say about it, but I only wanted to give a short introduction. You should look into column oriented storage for BI and Data Warehousing projects, where a columnstore index could significantly speed up certain types of analytic queries, especially those that use aggregate functions.

On a final note, you have to understand the restrictions that columnstore indexes have; these restrictions are described in detail in the MSDN Columnstore Indexes article. The most severe restriction, by far, is the fact that a table that has a columnstore index cannot be updated: it becomes read-only. For the specific DW and BI scenarios that columnstore indexes address this is actually not such a hard restriction, as the ETL process can easily circumvent the problem by using staging tables and partitioning. More on this in my next article: How to update a table with a columnstore index.