SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes irrelevant of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the applications runs a query, gets back the result and stores it in the cache. Whenever the result is changed because data was updated, the application will be notified and it can invalidate the cached result.

Leveraging Query Notifications from the managed clients is very easy due to the dedicated SqlDependency class that takes care of a lot of the details needed to be set up in place in order to be able to receive these notifications. But the MSDN examples and the general community know how with SqlDepenendency is geared toward straight forward usage, by attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.

LinqToCache project

My solution is available as the LinqToCache project. To cache a LINQ query results and get active SqlDependency notifications when the data was changed, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method AsCached. This method returns an IEnumerable of the query result. First invocation will always hit the database and set up a SqlDependency, subsequent invocations will return the cached result as long as it was not invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries than can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straight forward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. that is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with appropriate schema if necessary).

But there are a couple of conditions that are specially important for us: must not use the TOP expression and must not use … ranking and windowing functions.. These two restrictions mean the popular Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple Linq TO EF query like following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.

Remote Desktop Manager now available

June 29th, 2010

Microsoft internal folks have been used for years a little tool called the Remote Desktop Manager. This tool allows you to save connection settings for frequently used machines you remote into. You can group the servers, save the credentials used for each server or for each group. It allows you to tile and monitor multiple remote desks at once, and overall is a wonderful tool for anyone using remote desktops frequently.

Fortunately now the tool is available publicly for download from the Microsoft Download site.

High Volume Contiguos Real Time Audit and ETL

June 11th, 2010

Tomorrow at SQL Saturday #43 in Redmond I’ll be presenting a session documenting the real-time audit and ETL the Microsoft uses to manage the access policies on its network. This presentation goes over some of the challenges posed by a contiguos, non-stop, high trhoughput stream of audit records poses when loading the audit into a data warehouse. I’m posting here the slides I’m going to present, if you plan to attent you can preview them right now, and come to the session with an educated set of questions for your specific case.

The puzzle of U locks in deadlock graphs

May 12th, 2010

In a stackoverflow.com question the user has asked how come a SELECT statement could own a U mode lock?

S-U-X deadlock graph

S-U-X deadlock graph

The deadlock indeed suggests that the deadlock victim, a SELECT statement, is owning an U lock on the PK_B index. Why would a SELECT own an U lock? The query had no table hints and was a standalone query, not part of a multi-statement transaction that could had aquired the U lock in previous staements.

Turns out that the SELECT was actually not owning any U lock. The deadlock graph files (the *.xdl files) are in fact XML files and they can be opened as XML and inspected, for a little more detail than the visual deadlock graph visualizer permits. Here is the actual resource list in the deadlock XML:

<resource-list>
   <keylock hobtid="72057594052411392" dbid="10"
         objectname="A" indexname="PK_A" id="lock17ed4040"
        mode="X" associatedObjectId="72057594052411392">
    <owner-list>
     <owner id="process4f5d000" mode="X"/>
    </owner-list>
    <waiter-list>
     <waiter id="processfa3c8e0" mode="S" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594051166208" dbid="10"
        objectname="B" indexname="PK_B" id="lock22ea3940"
        mode="U" associatedObjectId="72057594051166208">
    <owner-list>
     <owner id="processfa3c8e0" mode="S"/>
    </owner-list>
    <waiter-list>
     <waiter id="process4f5d000" mode="X" requestType="convert"/>
    </waiter-list>
   </keylock>
  </resource-list>

As you can see, the resource lock22ea3940 is owned by the process processfa3c8e0 (the SELECT) indeed, but is owned in S mode. The process process4f5d000 (the UPDATE) is requesting this resource for a convert from U to X mode. So the true deadlock is like this:

  • SELECT owns a lock on the row in PK_B in S mode
  • SELECT wants a lock on the row in PK_A in S mode
  • UPDATE owns a lock on the row in PK_A in X mode
  • UPDATE also owns a U lock on the PK_B row. (S and U modes are compatible)
  • UPDATE is requesting a convert of the U lock it has on the row on PK_B to X mode

As you can see, there is no mysterious U lock owned by the SELECT. There is an U lock on the row in PK_B, but is owned by the UPDATE, which is requesting a convert to X for it. The fact that the resource is showned in the deadlock graph viewer in SSMS as being ‘Owner mode: U’ and pointing to the SELECT is simply an artifact of how SSMS displays the deadlock graph.

The lesson to take home is that the visual graphic deadlock graph display is usefull only to have a cursory glance at the deadlock cycle. The true meat and potatoes are in the XML, which has a lot more information. Not to mention that the information in the XML is actually correct, which helps investigation…

Effective Speakers at Portland #devsat and #sqlsat27

May 11th, 2010

Which is faster: ++i or i++?

In 1998 I was looking to change jobs and I interviewed with a company for a C++ developer position. The interview went well and as we were approaching the end, one of the interviewers asked me this question: which is faster ++i or i++? I pondered the question a second, then the other interviewer said that is probably implementation specific. The first one corrected him that i++ must return the value before the increment therefore it must make a copy of itself, while ++i returns the value after the increment therefore does not need to make a copy of itself, it can return itself. With this my chance to actually answer the question and their chance to see how I approach the problem were gone, but the interview was finished anyway as we were out of time. I got the offer from them, yet I ended up with a different company. But that question lingered in my mind, I though what a clever little thing to know. Few months later I got my hands on the Effective C++ book by Scott Meyers, and this opened my appetite for the follow up book More Effective C++. And there it was, item 6 in More Effective C++: Distinguish between prefix and postfix forms of forms of increment and decrement operators.

These two books were tremendously important in forming me as a professional C++ developer. They got me starting in studying C++ more deeply, beyond what I had to use in my day to day job. I ended up taking a Brainbench C++ test and I scored in the top 10 worldwide, which pretty soon landed me an email from Microsoft recruiting. The rest, as they say, is history.

SQL Saturday #27 is going to be held on May 22 in Portland and will share the venue with Portland CodeCamp. The list of speakers is really impressive, and amongst them, you guessed, is Scott Meyers presenting CPU Caches and Why You Care. There are many more fine speakers and interesting topics for every taste, and the event is free. Is worth your time if you’re in the area, and well worth a trip to the City of Roses if you’re not.

I myself will be presenting a session on High Volume Real Time Contiguous ETL and Audit.

To register with the CodeCamp and SQL Saturday events go to http://devsat.eventbrite.com