Plagiarizing Forum answers

September 28th, 2010

Today I learned about a new form of plagiarism: forum answers without proper attribution. There is a user on MSDN that cowers under the moniker of Learning_SQL. What the weasel does is troll the MSDN forums for questions, take a question verbatim and post it on StackOverflow, then take the answers from StackOverflow and post them back as his own answers on MSDN.

See http://stackoverflow.com/questions/3816683/index-to-speed-up-delete and http://social.msdn.microsoft.com/Forums/en/transactsql/thread/00667b94-1fb5-4238-b169-c596a2ae25ed.

Or http://stackoverflow.com/questions/3816023/sys-dm-fts-parser-permission and http://social.msdn.microsoft.com/Forums/en/transactsql/thread/a3ec4602-e0be-48f5-92ad-2eb25a6befc5.

http://stackoverflow.com/questions/3787986/need-an-idea-for-doing-bulk-compare-and-insert and http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/ed85fb0b-55db-474d-a45c-d715d659b0a8/#6157a812-82f8-473a-a689-d4217e1fbc13

http://stackoverflow.com/questions/3815502/stored-procedure-when-to-use-output-parameter-vs-return-variable/3815545#3815545 and http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/6013bdf2-1452-4495-865a-1bfcc8b361d0/#5f17fd95-16d0-40eb-b587-db35e44bdb69

This is not an accidental one-off copy of an answer; it is consistent behavior done with the sole purpose of accumulating MSDN reputation. Moral considerations aside, Learning_SQL is actually violating the StackOverflow attribution-required content license, which asks anyone republishing content to:

  1. Visually indicate that the content is from Stack Overflow, Meta Stack Overflow, Server Fault, or Super User in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.
  2. Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)
  3. Show the author names for every question and answer
  4. Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/12345/username)

Incidents of plagiarism occur frequently on blogs, but in a community as small as the SQL Server one, where pretty much everybody knows everybody, the perpetrator doesn’t get too far. Brent Ozar (blog|Twitter) covered this subject in several articles.

But I guess imagination has no bounds, and now some new players (or same old ones?) are trying new grounds…

SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes, regardless of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the application runs a query, gets back the result and stores it in the cache. Whenever the result changes because data was updated, the application is notified and can invalidate the cached result.

Leveraging Query Notifications from managed clients is very easy thanks to the dedicated SqlDependency class, which takes care of many of the details that must be in place in order to receive these notifications. But the MSDN examples and the general community know-how around SqlDependency are geared toward straightforward usage: attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.
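The behavior described above (run once, cache, invalidate on the first change, re-subscribe on the next call) can be sketched independently of LINQ. This is only an illustration in Python, not the LinqToCache implementation: ChangeSource is a hypothetical stand-in for the notification channel that SqlDependency provides, and NotifyingCache and as_cached are names made up for the example.

```python
from typing import Callable, Dict, List


class ChangeSource:
    """Hypothetical stand-in for the notification channel (SqlDependency in the article)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[], None]] = []

    def subscribe_once(self, callback: Callable[[], None]) -> None:
        # Like a query notification, each subscription fires a single time.
        self._subscribers.append(callback)

    def data_changed(self) -> None:
        # Fire and drop every pending subscription.
        pending, self._subscribers = self._subscribers, []
        for callback in pending:
            callback()


class NotifyingCache:
    """Caches query results by key until the change source reports a change."""

    def __init__(self, source: ChangeSource) -> None:
        self._source = source
        self._results: Dict[str, list] = {}

    def as_cached(self, key: str, run_query: Callable[[], list]) -> list:
        if key in self._results:
            # Cache hit: no database round trip.
            return self._results[key]
        # Cache miss: run the query and remember the result.
        rows = list(run_query())
        self._results[key] = rows
        # Evict on the first change; the next call will run the query
        # again and set up a fresh subscription.
        self._source.subscribe_once(lambda: self._results.pop(key, None))
        return rows
```

Calling as_cached("Tags", ...) twice runs the query once; after data_changed() fires, the next call runs it again and subscribes anew, which mirrors the one-shot nature of a query notification.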

LinqToCache project

My solution is available as the LinqToCache project. To cache the results of a LINQ query and get active SqlDependency notifications when the data changes, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method, AsCached. This method returns an IEnumerable of the query result. The first invocation will always hit the database and set up a SqlDependency; subsequent invocations will return the cached result as long as it has not been invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries that can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straightforward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. That is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with the appropriate schema if necessary).

But a couple of the restrictions are especially important for us: the statement must not use the TOP expression, and it must not use ranking and windowing functions. LINQ to SQL translates Take() into TOP and Skip() into a ROW_NUMBER() ranking query, so these two restrictions mean the Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ, because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple LINQ to EF query like the following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.

Remote Desktop Manager now available

June 29th, 2010

Folks inside Microsoft have for years been using a little tool called the Remote Desktop Manager. This tool allows you to save connection settings for the machines you frequently remote into. You can group the servers and save the credentials used for each server or for each group. It allows you to tile and monitor multiple remote desktops at once, and overall it is a wonderful tool for anyone using remote desktops frequently.

Fortunately, the tool is now publicly available for download from the Microsoft Download site.

High Volume Contiguous Real Time Audit and ETL

June 11th, 2010

Tomorrow at SQL Saturday #43 in Redmond I’ll be presenting a session documenting the real-time audit and ETL that Microsoft uses to manage the access policies on its network. The presentation goes over some of the challenges that a contiguous, non-stop, high-throughput stream of audit records poses when loading the audit into a data warehouse. I’m posting the slides here; if you plan to attend you can preview them right now and come to the session with an educated set of questions for your specific case.

The puzzle of U locks in deadlock graphs

May 12th, 2010

In a stackoverflow.com question, a user asked how a SELECT statement could own a U mode lock.

S-U-X deadlock graph

The deadlock graph indeed suggests that the deadlock victim, a SELECT statement, owns a U lock on the PK_B index. Why would a SELECT own a U lock? The query had no table hints and was a standalone query, not part of a multi-statement transaction that could have acquired the U lock in previous statements.

Turns out that the SELECT did not actually own any U lock. The deadlock graph files (the *.xdl files) are in fact XML files, and they can be opened as XML and inspected for a little more detail than the graphical deadlock viewer permits. Here is the actual resource list in the deadlock XML:

<resource-list>
   <keylock hobtid="72057594052411392" dbid="10"
         objectname="A" indexname="PK_A" id="lock17ed4040"
        mode="X" associatedObjectId="72057594052411392">
    <owner-list>
     <owner id="process4f5d000" mode="X"/>
    </owner-list>
    <waiter-list>
     <waiter id="processfa3c8e0" mode="S" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594051166208" dbid="10"
        objectname="B" indexname="PK_B" id="lock22ea3940"
        mode="U" associatedObjectId="72057594051166208">
    <owner-list>
     <owner id="processfa3c8e0" mode="S"/>
    </owner-list>
    <waiter-list>
     <waiter id="process4f5d000" mode="X" requestType="convert"/>
    </waiter-list>
   </keylock>
  </resource-list>

As you can see, the resource lock22ea3940 is owned by the process processfa3c8e0 (the SELECT) indeed, but is owned in S mode. The process process4f5d000 (the UPDATE) is requesting this resource for a convert from U to X mode. So the true deadlock is like this:

  • SELECT owns a lock on the row in PK_B in S mode
  • SELECT wants a lock on the row in PK_A in S mode
  • UPDATE owns a lock on the row in PK_A in X mode
  • UPDATE also owns a U lock on the PK_B row. (S and U modes are compatible)
  • UPDATE is requesting a convert of the U lock it has on the row on PK_B to X mode

As you can see, there is no mysterious U lock owned by the SELECT. There is a U lock on the row in PK_B, but it is owned by the UPDATE, which is requesting a convert to X for it. The fact that the resource is shown in the deadlock graph viewer in SSMS as being ‘Owner mode: U’ and pointing to the SELECT is simply an artifact of how SSMS displays the deadlock graph.

The lesson to take home is that the graphical deadlock display is useful only for a cursory glance at the deadlock cycle. The true meat and potatoes are in the XML, which has a lot more information. Not to mention that the information in the XML is actually correct, which helps investigation…
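Since the .xdl file is plain XML, the owner and waiter lists can also be pulled out with a few lines of script instead of being read by eye. The following is a minimal sketch in Python using only the standard library; RESOURCE_LIST is a trimmed copy of the XML shown above, summarize_locks is a name made up for the example, and only keylock resources are handled (a real deadlock file can contain other resource types).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the resource list from the deadlock XML above.
RESOURCE_LIST = """
<resource-list>
  <keylock objectname="A" indexname="PK_A" id="lock17ed4040" mode="X">
    <owner-list><owner id="process4f5d000" mode="X"/></owner-list>
    <waiter-list><waiter id="processfa3c8e0" mode="S" requestType="wait"/></waiter-list>
  </keylock>
  <keylock objectname="B" indexname="PK_B" id="lock22ea3940" mode="U">
    <owner-list><owner id="processfa3c8e0" mode="S"/></owner-list>
    <waiter-list><waiter id="process4f5d000" mode="X" requestType="convert"/></waiter-list>
  </keylock>
</resource-list>
"""


def summarize_locks(xml_text):
    """Return one (index, role, process, mode, request_type) tuple per owner or waiter."""
    rows = []
    root = ET.fromstring(xml_text.strip())
    # iter() walks all descendants, so this also works when the resource
    # list is nested inside a larger document.
    for lock in root.iter("keylock"):
        index = lock.get("indexname")
        for owner in lock.iter("owner"):
            rows.append((index, "owner", owner.get("id"), owner.get("mode"), None))
        for waiter in lock.iter("waiter"):
            rows.append((index, "waiter", waiter.get("id"),
                         waiter.get("mode"), waiter.get("requestType")))
    return rows


if __name__ == "__main__":
    for row in summarize_locks(RESOURCE_LIST):
        print(row)
```

Listed this way, the PK_B row shows the SELECT process as an owner in S mode and the UPDATE process as a waiter with requestType convert, which is the reading the graphical viewer obscures.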

Effective Speakers at Portland #devsat and #sqlsat27

May 11th, 2010

Which is faster: ++i or i++?

In 1998 I was looking to change jobs and I interviewed with a company for a C++ developer position. The interview went well and as we were approaching the end, one of the interviewers asked me this question: which is faster, ++i or i++? I pondered the question for a second, then the other interviewer said that it is probably implementation specific. The first one corrected him: i++ must return the value before the increment, therefore it must make a copy of itself, while ++i returns the value after the increment, therefore it does not need to make a copy of itself, it can return itself. With this my chance to actually answer the question and their chance to see how I approach the problem were gone, but the interview was finished anyway as we were out of time. I got the offer from them, yet I ended up with a different company. But that question lingered in my mind; I thought, what a clever little thing to know. A few months later I got my hands on the Effective C++ book by Scott Meyers, and this whetted my appetite for the follow-up book, More Effective C++. And there it was, item 6 in More Effective C++: Distinguish between prefix and postfix forms of increment and decrement operators.

These two books were tremendously important in forming me as a professional C++ developer. They got me started studying C++ more deeply, beyond what I had to use in my day to day job. I ended up taking a Brainbench C++ test and I scored in the top 10 worldwide, which pretty soon landed me a recruiting email from Microsoft. The rest, as they say, is history.

SQL Saturday #27 is going to be held on May 22 in Portland and will share the venue with Portland CodeCamp. The list of speakers is really impressive, and amongst them, you guessed it, is Scott Meyers presenting CPU Caches and Why You Care. There are many more fine speakers and interesting topics for every taste, and the event is free. It is worth your time if you’re in the area, and well worth a trip to the City of Roses if you’re not.

I myself will be presenting a session on High Volume Real Time Contiguous ETL and Audit.

To register with the CodeCamp and SQL Saturday events go to http://devsat.eventbrite.com

SQL Server 2008 R2 Express database size limit: 10GB

April 28th, 2010

The SQL Server 2008 R2 Express edition has increased the database size limit to 10GB from the previous limit of 4GB. This is great news for many developers, as the 4GB limitation was by far the most difficult barrier preventing Express adoption. With today’s rate of generating data, the 4GB limit was just plain small.

All the other limitations of SQL Server Express stay in place:

  • CPU: SQL Server Express uses only one CPU socket. It will use all cores and any Hyper-Threading logical processors in that socket, though.
  • Memory: SQL Server Express limits the size of the data buffer pool to 1GB.
  • Replication: SQL Server Express can only participate as a subscriber in a replication topology.
  • Service Broker: Two SQL Server Express instances cannot exchange Service Broker messages directly; the messages have to be routed through a higher level SKU.
  • SQL Agent: SQL Server Express does not have an Agent service and as such it cannot run Agent scheduled jobs.

How to change database mirroring encryption with minimal downtime

April 23rd, 2010

SQL Server Database Mirroring uses an encrypted connection to ship the data between the principal and the mirror. By default RC4 encryption is enabled and used. The endpoint can be configured to use AES encryption instead, or no encryption. The overhead of using RC4 encryption is quite small, the overhead of using AES encryption is slightly bigger, but not significant. Under well understood conditions, like inside a secured data center, encryption can be safely turned off for a 5-10% increase in speed in mirroring traffic. Note that even with encryption turned off, the traffic is still cryptographically signed (HMAC). Traffic signing cannot be turned off.

To change the encryption used by an endpoint, run the statement ALTER ENDPOINT … FOR DATABASE_MIRRORING (ENCRYPTION = {DISABLED|SUPPORTED|REQUIRED}). Two endpoints must have compatible encryption settings to be able to communicate. The following table shows the compatibility matrix of encryption settings; a dash means the two endpoints cannot connect:

            DISABLED    SUPPORTED    REQUIRED
DISABLED    CLEAR       CLEAR        -
SUPPORTED   CLEAR       ENCRYPTED    ENCRYPTED
REQUIRED    -           ENCRYPTED    ENCRYPTED

The default setting for an endpoint is ENCRYPTION = REQUIRED, which enforces encryption and refuses to connect to an endpoint that has disabled encryption.
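The compatibility matrix is small enough to write down as a lookup, which makes it easy to check a planned pair of settings before touching a live endpoint. This is just a sketch in Python that transcribes the table above; the names COMPATIBILITY and negotiate are made up for the example and are not part of any SQL Server API.

```python
from typing import Optional

# Outcome for each pair of endpoint ENCRYPTION settings, transcribed from
# the table above. None means the two endpoints cannot connect.
COMPATIBILITY = {
    ("DISABLED", "DISABLED"): "CLEAR",
    ("DISABLED", "SUPPORTED"): "CLEAR",
    ("DISABLED", "REQUIRED"): None,
    ("SUPPORTED", "DISABLED"): "CLEAR",
    ("SUPPORTED", "SUPPORTED"): "ENCRYPTED",
    ("SUPPORTED", "REQUIRED"): "ENCRYPTED",
    ("REQUIRED", "DISABLED"): None,
    ("REQUIRED", "SUPPORTED"): "ENCRYPTED",
    ("REQUIRED", "REQUIRED"): "ENCRYPTED",
}


def negotiate(local: str, remote: str) -> Optional[str]:
    """Return 'CLEAR', 'ENCRYPTED', or None when the settings are incompatible."""
    return COMPATIBILITY[(local.upper(), remote.upper())]
```

For example, negotiate("REQUIRED", "DISABLED") returns None, which is the case the default setting runs into when the other side has encryption turned off.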

Changing encryption settings on an existing endpoint

If you have a running mirroring session and want to change the settings to squeeze out the extra 5-10% you can expect from removing RC4 encryption, then chances are you deployed the endpoint with the default encryption setting, namely REQUIRED. If you don’t know the current endpoint settings you can always check the sys.database_mirroring_endpoints catalog view. When encryption is REQUIRED the encryption_algorithm column is one of 1, 2, 5 or 6. When encryption is SUPPORTED the encryption_algorithm column is one of 3, 4, 7 or 8. When encryption is DISABLED the encryption_algorithm column is 0 and the is_encryption_enabled column changes to 0. To force the traffic to be unencrypted, at least one endpoint has to have ENCRYPTION = DISABLED.
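The numeric codes are easy to mix up, so here is a tiny helper that turns an encryption_algorithm value into the setting it implies. It is a sketch in Python that encodes only the values listed in the paragraph above; the function name is made up, and fetching the value from sys.database_mirroring_endpoints is left to whatever client you already use.

```python
def encryption_setting(encryption_algorithm: int) -> str:
    """Map an encryption_algorithm value to the ENCRYPTION setting it implies."""
    if encryption_algorithm == 0:
        return "DISABLED"
    if encryption_algorithm in (1, 2, 5, 6):
        return "REQUIRED"
    if encryption_algorithm in (3, 4, 7, 8):
        return "SUPPORTED"
    # Anything else is outside the values described in the text.
    raise ValueError(f"unexpected encryption_algorithm: {encryption_algorithm}")
```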

When you run the ALTER ENDPOINT statement and change the encryption settings the endpoint is going to be stopped and restarted during the ALTER statement. All existing connections will be disconnected. A database mirroring session may immediately re-connect and not react to this short disruption in any fashion visible to the user.

The safest way to change an existing mirroring session that uses encryption to no longer encrypt the traffic, when there is no witness, would be like this:

  1. Change the mirror endpoint to SUPPORTED
  2. Change the principal endpoint to DISABLED
  3. Change the mirror endpoint to DISABLED
  4. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

If the mirroring session involves a witness, then it too must have the endpoint set to a compatible encryption setting:

  1. Change the witness endpoint to SUPPORTED
  2. Change the mirror endpoint to SUPPORTED
  3. Change the principal endpoint to DISABLED
  4. Change the witness endpoint to DISABLED
  5. Change the mirror endpoint to DISABLED
  6. Verify that the connections are unencrypted, check encryption_algorithm column in the sys.dm_db_mirroring_connections DMV.

Note that if automatic failover is enabled then at the moment the principal endpoint is changed, it is possible for automatic failover to occur, given that for a brief moment the mirror and the witness will have a quorum.
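The reason for the intermediate SUPPORTED steps is that after every single ALTER ENDPOINT all partners must still be able to talk to each other. The two sequences above can be checked against the compatibility matrix with a short simulation. This is a sketch in Python under simple assumptions: every endpoint starts at REQUIRED, the endpoint labels are just names for the example, and only setting compatibility is modeled (not the endpoint restart or a possible automatic failover).

```python
from itertools import combinations


def compatible(a: str, b: str) -> bool:
    # Per the compatibility matrix, the only pair that cannot connect
    # is DISABLED on one side and REQUIRED on the other.
    return {a, b} != {"DISABLED", "REQUIRED"}


def check_sequence(initial, steps):
    """Apply (endpoint, new_setting) steps in order.

    Returns the 1-based number of the first step that leaves some pair of
    endpoints unable to connect, or None if every step is safe.
    """
    state = dict(initial)
    for number, (endpoint, setting) in enumerate(steps, start=1):
        state[endpoint] = setting
        for left, right in combinations(sorted(state), 2):
            if not compatible(state[left], state[right]):
                return number
    return None


# The two sequences from the text, as (endpoint, new setting) pairs.
NO_WITNESS = [
    ("mirror", "SUPPORTED"),
    ("principal", "DISABLED"),
    ("mirror", "DISABLED"),
]
WITH_WITNESS = [
    ("witness", "SUPPORTED"),
    ("mirror", "SUPPORTED"),
    ("principal", "DISABLED"),
    ("witness", "DISABLED"),
    ("mirror", "DISABLED"),
]
```

Both sequences pass the check, while jumping straight to DISABLED on the principal with the partners still at REQUIRED fails on the very first step.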

How to troubleshoot if something goes wrong

Attach Profiler to all instances involved in the mirroring session and open a trace that listens for the Audit Database Mirroring Login Event Class (on SQL 2005 use the Audit Broker Login Event Class instead; it will trace the DBM sessions). If you made a mistake during the ALTER ENDPOINT changes and ended up with incompatible settings, an event will be generated and will be visible in Profiler. The event Text will contain an error message explaining why the endpoints cannot connect.

What is Remus up to?

April 1st, 2010

Some of you already know this: I am again an FTE with Microsoft. Over the past months I was on a contract, in a vendor position, for a very interesting project with Microsoft IT, and getting back in touch with the cool stuff that goes on inside Microsoft rekindled my passion for bold projects with big impact. Since March 29 I’m working again as a developer with what is officially known as the SQL RDBMS Core Team. I’m no longer involved with Service Broker though; I now work on the Access Methods. Indexes, BTrees, Heaps and let’s not forget DBCC. Fun stuff.

The Bizzaro Guide to SQL Server Performance

March 31st, 2010

Some say performance troubleshooting is a difficult science that blends just the right amount of patience, knowledge and experience. But I say forget all that, a few bullet points can get you a long way in fixing any problem you encounter. It is more important to find a Google SEO-friendly result that gives simplistic advice. Most importantly, good advice never contains the words ‘It depends’. Without further ado, here is my bulletproof SQL Server optimization guide:

  • Always trust your gut feeling. Avoid doing costly and unnecessary measurements. They may lead down the treacherous path of the scientific method. A gut feeling is always easier to explain and this improves communication. Measurements require use of complicated notions not everybody understands, so they lead to conflicts in the team.
  • High CPU utilization is caused by index fragmentation. Because the distance between database pages increases, the processor needs more cycles to reference the pages in the buffer pool.
  • Low CPU utilization is caused by index fragmentation. As the index fragments get smaller, they fit better into the processor L2 cache and this results in fewer cycles needed to access the row slots in the page. Because the data is in the cache the processor idles next cycles, resulting in low CPU utilization.
  • High Avg. Disk Sec. per Transfer is caused by index fragmentation. When indexes are fragmented the disk controller has to reorder the IO scatter-gather requests to put them in descending order. Needless to say, this operation increases the transfer times in geometric progression, because all the commercial disk controllers use bubble sort for this operation.
  • High memory consumption is caused by index fragmentation. This is fairly trivial and well known, but I’ll repeat it here: as the number of index fragments increases more pointers are needed to keep track of each fragment. Pointers are stored in virtual memory and virtual memory is very large, and this causes high memory consumption.
  • Syntax errors are caused by index fragmentation. Because the syntax is verified using the metadata catalogs, high fragmentation in the database can leave gaps in the syntax. This in turn causes the parser to generate syntax errors on perfectly valid statements like SECLET and UPTADE.
  • Covering indexes can lead to index fragmentation. Covering indexes are the indexes used by the query optimizer to cover itself in case the plan has execution faults. Because they are so often read they wear off and start to fragment.
  • Index fragmentation can be resolved by shrinking the database. As the data pages are squeezed tighter during the shrinking, they naturally realign themselves in the correct order.

There you have it, the simplest troubleshooting guide. Since most performance problems are caused by index fragmentation, all you have to do is shrink the database to force the pages to re-align correctly, and this will resolve the performance problem.

Happy April 1st everyone!