SQL Server clustered columnstore Tuple Mover

December 2nd, 2013

The updateable clustered columnstore indexes introduced with SQL Server 2014 rely on a background task called the Tuple Mover to periodically compress deltastores into the more efficient columnar format. Deltastores store data in the traditional row-mode (they are B-Trees) and as such are significantly more expensive to query that the compressed columnar segments. How more expensive? They are equivalent to storing the data in an uncompressed Heap and, due to small size (max 1048576 rows per deltastore rowset), they get little traction from parallelism and from read-aheads. It is important for your upload, initial seed and day-to-day ETL activities to achieve a healthy columnstore index, meaning no deltastores or only a few deltastores. To achieve this desired state of a healthy columnstore it is of paramount importance to understand how deltastores are created and removed.

Read the rest of this entry »

Using Hive local mode with YARN

November 19th, 2013

Most examples on how to run Hive in local mode mention using SET mapred.job.tracker=local. But with YARN this setting no longer has the desired effect, Hive queries are still executed with the M/R framework. Stepping through the Hadoop 23 shims reveals that the decision to split into local mode is driven by different configuration setting:


  @Override
  public boolean isLocalMode(Configuration conf) {
    return "local".equals(conf.get("mapreduce.framework.name"));
  }

Running Hive in localmode is important in debugging, is much easier to attach a remote to the local spawned Java process than to a real cluster, even a local one-node cluster. Thejas has a this short note about running hive in local mode, but following the advise there does not work on YARN clusters like the more recent Hortonworks sandboxes.

Services running under domain account should add dnscache service as dependency

August 27th, 2013

Recently I had to investigate a deadlocked Windows Server 2012 machine. Any attempt to start or stop a service on this machine would freeze in an infinite wait. I did not know about the “Analyze Wait Chain” feature in the Task Manager (new since Windows 7), but turns out is quite a life saver. This feature uses the Wait Chain Traversal debugging API and is, on the record, Matt Pietrek’s favourite Windows 7 feature. Simply using the Task Manager I was able to see than many programs were waiting on the “Services and Controller app” service, which is the SCM (the Service Control Manager):


Read the rest of this entry »

Understanding how SQL Server executes a query

August 1st, 2013

If you are a developer writing applications that use SQL Server and you are wondering what exactly happens when you ‘run’ a query from your application, I hope this article will help you write better database code and will help you get started when you have to investigate performance problems.

Requests

SQL Server is a client-server platform. The only way to interact with the back-end database is by sending requests that contain commands for the database. The protocol used to communicate between your application and the database is called TDS (Tabular Data Sream) and is described on MSDN in the Technical Document [MS-TDS]: Tabular Data Stream Protocol. The application can use one of the several client-side implementations of the protocol: the CLR managed SqlClient, OleDB, ODBC, JDBC, PHP Driver for SQL Server or the open source FreeTDS implementation. The gist of it is that when your application whats the database to do anything it will send a request over the TDS protocol. The request itself can take several forms:

Read the rest of this entry »

SQL Server Clustered Columnstore Indexes at TechEd 2013

June 11th, 2013

Now that the TechEd 2013 presentations are up online on Channel9 you can check out Brian Mitchell’s session What’s New for Columnstore Indexes and Batch Mode Processing. Brian does a great job at presenting the new updatable clustered columnstore indexes and the enhancements done to the vectorized query execution (aka. batch mode). Besides the TechEd presentation there is also another excellent resource available online right now for your education on the topic: the SIGMOD 2013 paper Enhancements to SQL Server Column Stores. Besides the obvious updatability, this paper cites some more improvements that are available in clustered columnstores:

It should be no surprise to anyone studying columnar storage that the updatable clustered columnstores coming with the next version of SQL Server are based on deltastores. I talked before about the SQL Server 2012 Columnstore internals and I explained why the highly compressed format that makes columnar storage so fast it also makes it basically impossible to update in-place. The technique of having a ‘deltastore’ which stores updates and, during scans, merge the updates with the columnar data is not new and is employed by several of the columnar storage solution vendors.

Read the rest of this entry »

Registry bloat after SQL Server 2012 SP1 installation

February 15th, 2013

SQL Server 2012 installation has the potential to leave an msiexec.exe installer process running after the installation finishes, as described in Windows Installer starts repeatedly after you install SQL Server 2012 SP1:

After you install SQL Server 2012 SP1 on a computer, the Windows Installer (Msiexec.exe) process is repeatedly started to repair certain assemblies. Additionally, the following events are logged in the Application log:
EventId: 1004
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’, Component ‘{0CECE655-2A0F-4593-AF4B-EFC31D622982}’ failed. The resource”does not exist.

EventId: 1001
Source: MsiInstaller
Description: Detection of product ‘{A7037EB2-F953-4B12-B843-195F4D988DA1}’, feature ‘SQL_Tools_Ans’ failed during request for component ‘{6E985C15-8B6D-413D-B456-4F624D9C11C2}’

When this issue occurs, you experience high CPU usage.
Cause

This issue occurs because the SQL Server 2012 components reference mismatched assemblies. This behavior causes native image generation to fail repeatedly on certain assemblies. Therefore, a repair operation is initiated on the installer package.

But the this problem has a much sinister side effect: it causes growth of the HKLM\Software registry hive. Except for the System hive, all the other registry hives are still restricted in size to a max of 2GB, see Registry Storage Space:

Views of the registry files are mapped in paged pool memory…The maximum size of a registry hive is 2 GB, except for the system hive.

Read the rest of this entry »

How to enable Selective XML indexes in SQL Server 2012 SP1

February 11th, 2013

SQL Server 2012 SP1 has shipped a great enhancement to XML: Selective XML Indexes. When properly used these indexes can speed up the searching of XML columns tremendously, at little disk/size cost:

The selective XML index feature lets you promote only certain paths from the XML documents to index. At index creation time, these paths are evaluated, and the nodes that they point to are shredded and stored inside a relational table in SQL Server. This feature uses an efficient mapping algorithm developed by Microsoft Research in collaboration with the SQL Server product team. This algorithm maps the XML nodes to a single relational table, and achieves exceptional performance while requiring only modest storage space.

Read the rest of this entry »

How to enable and disable a queue using SMO

February 6th, 2013

The SMO object model for SQL Server ServiceQueue does allow one to enable or disable a queue, but the property that modifies the queue status is not intuitive, it is IsEnqueueEnabled:

Gets or sets the Boolean property that specifies whether the queue is enabled.

This property matches the catalog view column is_enqueue_enabled in sys.service_queues but bears little resemblance to the T-SQL statement used to enable or disable a queue: ALTER QUEUE ... WITH STATUS = {ON|OFF}

Read the rest of this entry »

SQL Server backup to URL

January 25th, 2013

With the SQL Server 2012 SP1 CU2 release a new important feature was added: ability to back up and restore a database straight from Azure Blob storage:

This feature released in SQL Server 2012 SP1 CU2, enables SQL Server backup and restore directly to the Windows Azure Blob service. This feature can be used to backup SQL Server databases on an on-premises instance or an instance of SQL Server running a hosted environment such as Windows Azure Virtual Machine. Backup to cloud offers benefits such as availability, limitless geo-replicated off-site storage, and ease of migration of data to and from the cloud.

Read the rest of this entry »

Case Sensitive collation sort order

November 23rd, 2012

A recent inquiry from one of our front line CSS engineers had me look into how case sensitive collations decide the sort order. Consider a simple question like How should the values 'a 1', 'a 2', 'A 1' and 'A 2' sort?


create table [test] (
	[col] varchar(10) 
		collate Latin1_General_CS_AS);
go

insert into [test] ([col]) values 
	('a 1'),
	('a 2'),
	('A 1'),
	('A 2');
go

select [col]
from [test]
order by [col];

Here are two possible outputs:

Read the rest of this entry »