SQL Server 2014 updateable columnstores Q and A

June 17th, 2014

Are INSERTs serialized because of row group locks?

No. If a trickle INSERT requires an OPEN row group and it cannot place a lock on it it will create a new one. any number of INSERT statements can proceed in parallel, each locking its own row group. While this results in lower quality row groups (possible multiple small OPEN row groups) the decision was explicitly to favor concurrency. This applies as well to BULK INSERTs that fail to create the minimum row group size (~100k rows).

Does Tuple Mover delete the rows in deltastore?

No. Surprised? Each OPEN/CLOSED row group has its own individual deltastore. When the Tuple Mover finishes compressing a row group the entire deltastore for that row group will be deallocated. This is done to avoid the explicit logging of each individual row delete from deltastores. This is the same difference as between TRUNCATE and DELETE.

Does Tuple Mover block reads?

No. No way. Absolutely no.

Does Tuple Mover block INSERTS?

No. Both trickle inserts and bulk inserts into clustered columnstores can proceed while the Tuple Mover is compressing a rowgroup. You only need to think about the fact that the Tuple Mover only has business with CLOSED row groups and INSERTs are only concerned with OPEN row groups. There is no overlap so there is no reason for blocking.

Does Tuple Mover block UPDATE, DELETE, MERGE?

Yes. Spooled scans for update (or delete) cannot re-acquire the row if the row storage changes between the scan and the update. So the Tuple Mover is mutually exclusive with any UPDATE, DELETE or MERGE statement on the columnstore. This exclusion is achieved by a special object level lock that is acquired by UPDATE/DELETE/MERGE in shared mode and by the Tuple Mover in exclusive mode.

Can Tuple Mover, or REORGANIZE, shift rows between row groups?

No. The Tuple Mover (and REORGANIZE) can only compress a row group but it cannot shift rows between row groups. Particularly it cannot ‘stich’ several small deltastores into one compressed row group. REBUILD may appear that it can shift or move rows, but REBUILD is doing exactly what the name implies: a full rebuild. It reads the existing columnstore and builds a new one. The organization (row group numbers and size) of the new (rebuilt) columnstore has basically no relation with the original organization of the old columnstore.

SQL Server clustered columnstore Tuple Mover

December 2nd, 2013

The updateable clustered columnstore indexes introduced with SQL Server 2014 rely on a background task called the Tuple Mover to periodically compress deltastores into the more efficient columnar format. Deltastores store data in the traditional row-mode (they are B-Trees) and as such are significantly more expensive to query that the compressed columnar segments. How more expensive? They are equivalent to storing the data in an uncompressed Heap and, due to small size (max 1048576 rows per deltastore rowset), they get little traction from parallelism and from read-aheads. It is important for your upload, initial seed and day-to-day ETL activities to achieve a healthy columnstore index, meaning no deltastores or only a few deltastores. To achieve this desired state of a healthy columnstore it is of paramount importance to understand how deltastores are created and removed.

Read the rest of this entry »

SQL Server Clustered Columnstore Indexes at TechEd 2013

June 11th, 2013

Now that the TechEd 2013 presentations are up online on Channel9 you can check out Brian Mitchell’s session What’s New for Columnstore Indexes and Batch Mode Processing. Brian does a great job at presenting the new updatable clustered columnstore indexes and the enhancements done to the vectorized query execution (aka. batch mode). Besides the TechEd presentation there is also another excellent resource available online right now for your education on the topic: the SIGMOD 2013 paper Enhancements to SQL Server Column Stores. Besides the obvious updatability, this paper cites some more improvements that are available in clustered columnstores:

It should be no surprise to anyone studying columnar storage that the updatable clustered columnstores coming with the next version of SQL Server are based on deltastores. I talked before about the SQL Server 2012 Columnstore internals and I explained why the highly compressed format that makes columnar storage so fast it also makes it basically impossible to update in-place. The technique of having a ‘deltastore’ which stores updates and, during scans, merge the updates with the columnar data is not new and is employed by several of the columnar storage solution vendors.

Read the rest of this entry »

Inside the SQL Server 2012 Columnstore Indexes

May 29th, 2012

Columnar storage has established itself as the de-facto option for Business Intelligence (BI) storage. The traditional row-oriented storage of RDBMS was designed for fast single-row oriented OLTP workloads and it has problems handling the large volume range oriented analytical processing that characterizes BI workloads. But what is columnar storage and, more specifically, how does SQL Server 2012 implement columnar storage with the new COLUMNSTORE indexes?

Read the rest of this entry »