Plagiarizing Forum answers

September 28th, 2010

Today I learned about anew form of plagiarism: forum answers without proper attribution. There is an user on MSDN that cowers under the moniker of Learning_SQL. What the weasel does it trolls the MSDN forums for questions, takes the question verbatim and posts it on StackOverflow, then takes the answers from StackOverflow and posts them back as his own answer on MSDN.

See http://stackoverflow.com/questions/3816683/index-to-speed-up-delete and http://social.msdn.microsoft.com/Forums/en/transactsql/thread/00667b94-1fb5-4238-b169-c596a2ae25ed.

Or http://stackoverflow.com/questions/3816023/sys-dm-fts-parser-permission and http://social.msdn.microsoft.com/Forums/en/transactsql/thread/a3ec4602-e0be-48f5-92ad-2eb25a6befc5.

http://stackoverflow.com/questions/3787986/need-an-idea-for-doing-bulk-compare-and-insert and http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/ed85fb0b-55db-474d-a45c-d715d659b0a8/#6157a812-82f8-473a-a689-d4217e1fbc13

http://stackoverflow.com/questions/3815502/stored-procedure-when-to-use-output-parameter-vs-return-variable/3815545#3815545 and http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/6013bdf2-1452-4495-865a-1bfcc8b361d0/#5f17fd95-16d0-40eb-b587-db35e44bdb69

This is not some accidental one copy of a answer, is a consitent behavior done with the sole purpose of acumulating MSDN reputation. Moral consideration aside, Learning_SQL is actually violating the StackOverflow attribution required content license:

  1. Visually indicate that the content is from Stack Overflow, Meta Stack Overflow, Server Fault, or Super User in some way. It doesn’t have to be obnoxious; a discreet text blurb is fine.
  2. Hyperlink directly to the original question on the source site (e.g., http://stackoverflow.com/questions/12345)
  3. Show the author names for every question and answer
  4. Hyperlink each author name directly back to their user profile page on the source site (e.g., http://stackoverflow.com/users/12345/username)

Incidents of plagiarization occur frequently on blogs, but with such a small comunity as the SQL Server one, where pretty much everybody knows everybody, the perpetrator doesn’t get too far. Brent Ozar (blog|Twitter) covered this subject on several articles:

But I guess imagination has no bounds, and now some new players (or same old ones?) are trying new grounds…

SqlDependency based caching of LINQ Queries

August 4th, 2010

Query Notifications is the SQL Server feature that allows a client to subscribe to notifications that are sent when data in the database changes irrelevant of how that change occurs. I have talked before about how Query Notifications works in the article The Mysterious Notification. This feature was designed specifically for client side cache invalidation: the applications runs a query, gets back the result and stores it in the cache. Whenever the result is changed because data was updated, the application will be notified and it can invalidate the cached result.

Leveraging Query Notifications from the managed clients is very easy due to the dedicated SqlDependency class that takes care of a lot of the details needed to be set up in place in order to be able to receive these notifications. But the MSDN examples and the general community know how with SqlDepenendency is geared toward straight forward usage, by attaching it to a SqlCommand object.

Leveraging SqlDependency from LINQ queries

There is no clear guidance from MSDN on how to mix these two technologies: Query Notifications and LINQ. There are a few in the community who have given hints on what has to be done, like this article Using SQLDependency objects with LINQ by Ryan Dunn (blog|twitter).

My goal is to propose an easy to use extension method that can add SqlDependency based caching to any IQueryable<T>. Usage should be as simple as:

var queryTags = from t in ctx.Tags select t;
var tags = queryTags.AsCached("Tags");
foreach (Tag t in tags)
{
  ...
}

The first invocation should run the query and return the result, setting up a SqlDependency notification and also caching the result. Subsequent invocations should return the cached result, without hitting the database. Any change to the Tags table in my example should trigger the SqlDependency and invalidate the cache. Next invocation would again run the query and return the updated result, setting up a new SqlDependency notification and caching the new result.

LinqToCache project

My solution is available as the LinqToCache project. To cache a LINQ query results and get active SqlDependency notifications when the data was changed, simply download the appropriate DLL for your target framework (.Net 3.5 or .Net 4.0) and add it as a reference to your project. Now any LINQ query (any IQueryable) will have a new extension method AsCached. This method returns an IEnumerable of the query result. First invocation will always hit the database and set up a SqlDependency, subsequent invocations will return the cached result as long as it was not invalidated.

Query Notifications restrictions

Not every query can be subscribed for notifications. The gory details of what works and what doesn’t are described in MSDN at Creating a Query for Notification:

  • The projected columns in the SELECT statement must be explicitly stated, and table names must be qualified with two-part names. Notice that this means that all tables referenced in the statement must be in the same database.
  • The statement may not use the asterisk (*) or table_name.* syntax to specify columns.
  • The statement may not use unnamed columns or duplicate column names.
  • The statement must reference a base table.
  • The statement must not reference tables with computed columns.
  • The projected columns in the SELECT statement may not contain aggregate expressions unless the statement uses a GROUP BY expression. When a GROUP BY expression is provided, the select list may contain the aggregate functions COUNT_BIG() or SUM(). However, SUM() may not be specified for a nullable column. The statement may not specify HAVING, CUBE, or ROLLUP.
  • A projected column in the SELECT statement that is used as a simple expression must not appear more than once.
  • The statement must not include PIVOT or UNPIVOT operators.
  • The statement must not include the UNION, INTERSECT, or EXCEPT operators.
  • The statement must not reference a view.
  • The statement must not contain any of the following: DISTINCT, COMPUTE or COMPUTE BY, or INTO.
  • The statement must not reference server global variables (@@variable_name).
  • The statement must not reference derived tables, temporary tables, or table variables.
  • The statement must not reference tables or views from other databases or servers.
  • The statement must not contain subqueries, outer joins, or self-joins.
  • The statement must not reference the large object types: text, ntext, and image.
  • The statement must not use the CONTAINS or FREETEXT full-text predicates.
  • The statement must not use rowset functions, including OPENROWSET and OPENQUERY.
  • The statement must not use any of the following aggregate functions: AVG, COUNT(*), MAX, MIN, STDEV, STDEVP, VAR, or VARP.
  • The statement must not use any nondeterministic functions, including ranking and windowing functions.
  • The statement must not contain user-defined aggregates.
  • The statement must not reference system tables or views, including catalog views and dynamic management views.
  • The statement must not include FOR BROWSE information.
  • The statement must not reference a queue.
  • The statement must not contain conditional statements that cannot change and cannot return results (for example, WHERE 1=0).
  • The statement can not specify READPAST locking hint.
  • The statement must not reference any Service Broker QUEUE.
  • The statement must not reference synonyms.
  • The statement must not have comparison or expression based on double/real data types.
  • The statement must not use the TOP expression.

Although this list of restrictions is pretty severe, there is still room left for plenty of useful queries than can be cached using SqlDependency notifications for invalidation.

Linq to SQL

Straight forward LINQ to SQL queries are valid for Query Notifications, as long as the first restriction listed above is cleared: table names must be qualified with two-part names. In practice, this means simply fully qualifying the table names in the context designer, or in the [Table] attribute on the class. that is, always use ‘dbo.Table’ instead of simply ‘Table’ (of course, replace ‘dbo’ with appropriate schema if necessary).

But there are a couple of conditions that are specially important for us: must not use the TOP expression and must not use … ranking and windowing functions.. These two restrictions mean the popular Skip() and Take() operators are not supported. Unfortunately, these are some of the most popular operators used with LINQ because they are the easiest way to implement paging of results.

LINQ to Entity Framework

My initial goal was to only support LINQ to SQL, given that the overwhelming majority of developers favor it over EF. But the implementation works with any IQueryable, so in theory it should just work with EF as well. Unfortunately, the way EF chooses to formulate the queries makes it incompatible with Query Notifications. Consider a simple Linq TO EF query like following:

var q = from p in ctx.Persons where p.FirstName == "Remus" select p;

This will generate the following SQL:

SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM (SELECT
      [Persons].[PersonId] AS [PersonId],
      [Persons].[FirstName] AS [FirstName],
      [Persons].[LastName] AS [LastName]
      FROM [dbo].[Persons] AS [Persons]) AS [Extent1]
WHERE 'Remus' = [Extent1].[FirstName]

The gratuitous addition of a subquery violates the Query Notifications restrictions and the SqlDependency gets invalidated straight away with a Statement violation.

Download

The LinqToCache DLLs and source code are available at http://code.google.com/p/linqtocache/.

Remote Desktop Manager now available

June 29th, 2010

Microsoft internal folks have been used for years a little tool called the Remote Desktop Manager. This tool allows you to save connection settings for frequently used machines you remote into. You can group the servers, save the credentials used for each server or for each group. It allows you to tile and monitor multiple remote desks at once, and overall is a wonderful tool for anyone using remote desktops frequently.

Fortunately now the tool is available publicly for download from the Microsoft Download site.

High Volume Contiguos Real Time Audit and ETL

June 11th, 2010

Tomorrow at SQL Saturday #43 in Redmond I’ll be presenting a session documenting the real-time audit and ETL the Microsoft uses to manage the access policies on its network. This presentation goes over some of the challenges posed by a contiguos, non-stop, high trhoughput stream of audit records poses when loading the audit into a data warehouse. I’m posting here the slides I’m going to present, if you plan to attent you can preview them right now, and come to the session with an educated set of questions for your specific case.

Effective Speakers at Portland #devsat and #sqlsat27

May 11th, 2010

Which is faster: ++i or i++?

In 1998 I was looking to change jobs and I interviewed with a company for a C++ developer position. The interview went well and as we were approaching the end, one of the interviewers asked me this question: which is faster ++i or i++? I pondered the question a second, then the other interviewer said that is probably implementation specific. The first one corrected him that i++ must return the value before the increment therefore it must make a copy of itself, while ++i returns the value after the increment therefore does not need to make a copy of itself, it can return itself. With this my chance to actually answer the question and their chance to see how I approach the problem were gone, but the interview was finished anyway as we were out of time. I got the offer from them, yet I ended up with a different company. But that question lingered in my mind, I though what a clever little thing to know. Few months later I got my hands on the Effective C++ book by Scott Meyers, and this opened my appetite for the follow up book More Effective C++. And there it was, item 6 in More Effective C++: Distinguish between prefix and postfix forms of forms of increment and decrement operators.

These two books were tremendously important in forming me as a professional C++ developer. They got me starting in studying C++ more deeply, beyond what I had to use in my day to day job. I ended up taking a Brainbench C++ test and I scored in the top 10 worldwide, which pretty soon landed me an email from Microsoft recruiting. The rest, as they say, is history.

SQL Saturday #27 is going to be held on May 22 in Portland and will share the venue with Portland CodeCamp. The list of speakers is really impressive, and amongst them, you guessed, is Scott Meyers presenting CPU Caches and Why You Care. There are many more fine speakers and interesting topics for every taste, and the event is free. Is worth your time if you’re in the area, and well worth a trip to the City of Roses if you’re not.

I myself will be presenting a session on High Volume Real Time Contiguous ETL and Audit.

To register with the CodeCamp and SQL Saturday events go to http://devsat.eventbrite.com

What is Remus up to?

April 1st, 2010

Some of you already know this: I am again FTE with Microsoft. I was in a contract for a very interesting project with Microsoft IT over the past months in a vendor position, and getting back in touch with the cool stuff that goes on inside Microsoft kindled back the passion for bold projects with big impact. Since March 29 I’m working again as a developer with what is officially known as the SQL RDBMS Core Team. I’m no longer involved with the Service Broker though, I now work on the Access Methods. Indexes, BTrees, Heaps and lets not forget DBCC. Fun stuff.

Contract position in Seattle

March 9th, 2010

Do you know anybody that is interested in a 6mo-12mo contract position in Seattle (Redmond) area for a very interesting SQL Server project? Position requires good system architecture and development skills. Your main job will be writing T-SQL code for processing on a 24×7 mission critical system with a high transactions per second throughput. You will also be the escalation contact for operational aspects of monitoring and troubleshooting this project’s +50 SQL Server instances. Project involves Service Broker, Database Mirroring, Transactional Replication, Data Warehousing and ETL. You will need to either know these or be able to quickly come up to speed on these technologies.

If this sounds interesting to you, contact me. No agencies please.

bugcollect.com: better customer support

October 28th, 2009

I am a developer, I write applications for fun and profit, and I’ve been doing this basically my whole professional life. Over the years I’ve learned that it is important to understand the problems my users face. What are the most common issues, how often do they happen, who is most affected. I have tried the approach of logging into a text file and then asking my users to send me the log file. I’ve tried sending mail automatically from my application. It was useful, but my inbox just doesn’t scale to hundreds of messages that may happen after a … stormy release.

This is why I have created for myself an online service for application crash reporting. Applications can submit incident reports online and the service will collect them, aggregate them and do some initial analysis. I have been using this service in my applications over the past year and I think that if I find it so useful, perhaps you will too. So I’ve invested more resources into this, made it into a commercial product and put it out for everyone:http://bugcollect.com.

After an application is ready and published, bugcollect.com offers a private channel for collecting logging and crash reporting information. bugcollect.com analyzes crash reports and aggregates similar problems into buckets, groups incidents reported by the same source, helping the development team to focus on the most frequent crashes and problems. Developers get immediate feedback if a new release has a problem and they don’t have to ask for more information. Developers can also set up a response to an incident bucket, this response will be sent by bugcolect.com to any new incident report that falls into the same bucket. The application can then interpret this response and display feedback to the user, eg. it can instruct him about a new download available that fixes the problem.

bugcollect.com reporting differs from system crash reporting like iPhone crash, Mac ‘send to apple’ or Windows Dr. Watson because it is application initiated. An application can decide to submit a report anytime it wishes, typically in an exception catch block. All reports submitted to bugcollect.com are private and can be viewed only by the account owner, the application development team.

bugcollect.com features a public RESTful XML based API for submitting reports. There are already available client libraries for .Net and Java, as well as appender components for log4net and log4j. More client libraries are under development and an iPhone library will be made available soon.

Asynchronous T-SQL at SQL Saturday #26

September 28th, 2009

The Seattle area PASS group is organizing the SQL Saturday #26 in Redmond on October 2nd. There are many sessions to fill 3 tracks for a full day and all of them look quite interesting. The full schedule is available at http://www.sqlsaturday.com/schedule.aspx. The event is free and you get to hear presentations by such popular SQL persona as Kalen Delaney!

On the 10:15 slot yours truly will be talking about Asynchronous T-SQL processing. See you this Saturday at the new Commons MS campus in Redmond.

MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data

July 26th, 2009

I just found that Microsoft has published a use case about the way MySpace is using Service Broker on their service as the core message delivery system for the Service Dispatcher. We’re talking here 440 SQL Server instances and over 1000 databases. Quote from the use case:

Service Broker has enabled MySpace to perform foreign key management across its 440 database servers, activating and deactivating accounts for its millions of users, with one-touch asynchronous efficiency. MySpace also uses Service Broker administratively to distribute new stored procedures and other updates across all 440 database servers through the Service Dispatcher infrastructure.

That is pretty impressive. I knew about the MySpace SSB adoption since the days when I was with the Service Broker team. You probably all know my mantra I repeat all the time “don’t use fire and forget, is a bad message exchange pattern and there are scenarios when the database may be taken offline”? Guess how I found out those ‘scenarios’… Anyway, I’m really glad that they also made public some performance numbers. Until now I could only quote the 5000 message per second I can push in my own test test environment. Well, looks like MySpace has some beefier hardware:

Stelzmuller: “When we went to the lab we brought our own workloads to ensure the quality of the testing. We needed to see if Service Broker could handle loads of 4,000 messages per second. Our testing found it could handle more than 18,000 messages a second.”