System pagefile size on machines with large RAM

November 22nd, 2009

Regardless of the size of the RAM, you still need a pagefile at least 1.5 times the amount of physical RAM. This is true even on a machine with 1 TB of RAM: you’ll need a 1.5 TB pagefile on disk (it sounds crazy, but it is true).

When a process asks for MEM_COMMIT memory via VirtualAlloc/VirtualAllocEx, the requested size needs to be reserved in the pagefile. This was true in the first Win NT system and is still true today; see Managing Virtual Memory in Win32:

When memory is committed, physical pages of memory are allocated and space is reserved in a pagefile.

Barring some extremely odd cases, SQL Server will always ask for MEM_COMMIT pages. And given that SQL Server uses a Dynamic Memory Management policy that reserves upfront as much buffer pool as possible (reserves and commits in terms of VAS), it will request a huge reservation of space in the pagefile at startup. If the pagefile is not properly sized, errors 801/802 will start showing up in SQL Server’s ERRORLOG file and in operations.

This always causes some confusion, as administrators erroneously assume that a large RAM eliminates the need for a pagefile. In truth the contrary happens: a large RAM increases the need for a pagefile, just because of the inner workings of the Windows NT memory manager. Although the reserved pagefile space is, hopefully, never used, the problem of reserving such a huge pagefile can be quite serious and needs to be accounted for during capacity planning.
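As a rough capacity-planning aid, the query below compares the memory figures the operating system reports with the 1.5 times rule quoted above. It is a minimal sketch that assumes SQL Server 2008 or later (where the sys.dm_os_sys_memory view exists) and VIEW SERVER STATE permission. The column alias names are chosen here for illustration only.

```sql
-- Sketch only: assumes SQL Server 2008 or later and VIEW SERVER STATE permission.
select
    total_physical_memory_kb / 1024 as physical_ram_mb,
    -- total_page_file_kb is the OS commit limit (physical RAM plus pagefiles),
    -- not the size of the pagefile alone
    total_page_file_kb / 1024 as commit_limit_mb,
    -- the sizing rule quoted in this post: 1.5 times the physical RAM
    cast(total_physical_memory_kb * 1.5 / 1024 as bigint) as pagefile_target_mb
from sys.dm_os_sys_memory;
```

Subtracting physical_ram_mb from commit_limit_mb gives an approximation of the currently configured pagefile space, which can then be compared with pagefile_target_mb.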

bugcollect.com: better customer support

October 28th, 2009

I am a developer. I write applications for fun and profit, and I’ve been doing this basically my whole professional life. Over the years I’ve learned that it is important to understand the problems my users face: what are the most common issues, how often do they happen, and who is most affected? I have tried logging into a text file and then asking my users to send me the log file. I’ve tried sending mail automatically from my application. It was useful, but my inbox just doesn’t scale to the hundreds of messages that may arrive after a … stormy release.

This is why I have created for myself an online service for application crash reporting. Applications can submit incident reports online and the service will collect them, aggregate them and do some initial analysis. I have been using this service in my applications over the past year and I think that if I find it so useful, perhaps you will too. So I’ve invested more resources into this, made it into a commercial product and put it out for everyone: http://bugcollect.com.

After an application is ready and published, bugcollect.com offers a private channel for collecting logging and crash reporting information. bugcollect.com analyzes crash reports, aggregates similar problems into buckets and groups incidents reported by the same source, helping the development team to focus on the most frequent crashes and problems. Developers get immediate feedback if a new release has a problem and they don’t have to ask for more information. Developers can also set up a response to an incident bucket; this response will be sent by bugcollect.com to any new incident report that falls into the same bucket. The application can then interpret this response and display feedback to the user, e.g. it can tell them that a new download that fixes the problem is available.

bugcollect.com reporting differs from system crash reporting such as iPhone crash reports, Mac ‘send to apple’ or Windows Dr. Watson because it is application initiated. An application can decide to submit a report any time it wishes, typically in an exception catch block. All reports submitted to bugcollect.com are private and can be viewed only by the account owner, the application development team.

bugcollect.com features a public RESTful XML-based API for submitting reports. Client libraries are already available for .NET and Java, as well as appender components for log4net and log4j. More client libraries are under development and an iPhone library will be made available soon.

select count(*);

October 26th, 2009

Quick trivia: what is the result of running SELECT COUNT(*);?

That’s right, no FROM clause, just COUNT(*). The answer may be a little surprising: it is 1. When you query SELECT 1; the result is, as expected, 1. And SELECT 2; will return 2. So SELECT COUNT(2); returns, as expected, 1; after all, it counts how many rows are in the result set. But SELECT COUNT(*); has a certain smell of voodoo to it. OK, * is the project operator, but project from… what exactly? It feels eerie, like a count is materialized out of the blue.

How about SELECT COUNT(*) [MyTable]? Well, that’s actually just a shortcut for SELECT COUNT(*) AS [MyTable], so it still returns 1 but in a column named MyTable. Now you understand why my heart skipped a beat when I checked how I had initialized a replication subscription and saw that I had forgotten to type in FROM.
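The difference is easy to see side by side. The script below is a small illustration, not taken from the replication scenario; the table MyTable and its three rows are hypothetical and created only for the demo.

```sql
-- Hypothetical table, created only for this illustration.
create table MyTable (id int not null);
insert into MyTable (id) values (1);
insert into MyTable (id) values (2);
insert into MyTable (id) values (3);
go

select count(*) from MyTable;   -- counts the rows of the table: 3
select count(*) MyTable;        -- no FROM: returns 1 in a column named MyTable
select count(*) as MyTable;     -- the same statement with the alias spelled out
go

drop table MyTable;
go
```

Both statements without FROM parse and run without any warning, which is exactly why the missing keyword is so easy to overlook.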

Asynchronous T-SQL at SQL Saturday #26

September 28th, 2009

The Seattle area PASS group is organizing SQL Saturday #26 in Redmond on October 2nd. There are enough sessions to fill 3 tracks for a full day and all of them look quite interesting. The full schedule is available at http://www.sqlsaturday.com/schedule.aspx. The event is free and you get to hear presentations by such popular SQL personalities as Kalen Delaney!

In the 10:15 slot yours truly will be talking about Asynchronous T-SQL processing. See you this Saturday at the new Commons MS campus in Redmond.

On SQL Server boolean operator short-circuit

September 13th, 2009

Recently I had several discussions all circling around the short-circuit of boolean expressions in Transact-SQL queries. Many developers who come from an imperative language background like C rely on boolean short-circuit to occur when SQL queries are executed. Often they take this expectation to the extreme and the correctness of the result actually relies on the short-circuit occurring:

select 'Will not divide by zero!' where 1=1 or 1/0=0

In the SQL snippet above the expression on the right side of the OR operator would cause a division by zero if ever evaluated. Yet the query executes fine and the successful result is seen as proof that operator short-circuit does happen! Well, is that all there is? Of course not. A universal quantification cannot be demonstrated with an example. But it can be proven false with one single counter-example!

Luckily I have two aces up my sleeve: first, I know how the Query Optimizer works. Second, I’ve stayed close enough to the Microsoft CSS front lines for 6 months to see actual cases pouring in from developers bitten by the short-circuit assumption. Here is my counter-example case:


create table eav (
    eav_id int identity(1,1) primary key,
    attribute varchar(50) not null,
    is_numeric bit not null,
    [value] sql_variant null);

create index eav_attribute on eav(attribute) include ([value]);
go

-- Fill the table with random values
set nocount on
declare @i int;
select @i = 0;
while @i < 100000
begin
    declare @attribute varchar(50),
            @is_numeric bit,
            @value sql_variant;
    select @attribute = 'A' + cast(cast(rand()*1000 as int) as varchar(3));
    select @is_numeric = case when rand() > 0.5 then 1 else 0 end;
    if 1=@is_numeric
        select @value = cast(rand() * 100 as int);
    else
        select @value = 'Lorem ipsum';
    insert into eav (attribute, is_numeric, [value])
        values (@attribute, @is_numeric, @value);
    select @i = @i+1;
end
go

-- insert a 'trap'
insert into eav (attribute, is_numeric, [value]) values ('B1', 0, 'Gotch ya');
go

-- select the 'trap' value
select [value] from eav
    where attribute = 'B1'
    and is_numeric = 1
    and cast([value] as int) > 50
go

Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value 'Gotch ya' to data type int.

This happens on SQL Server 2005 SP2. Clearly, the conversion does occur even though the value is marked as ‘not numeric’. What’s going on here? To better understand, let’s insert a known value that can be converted, then run the same query again and look at the execution plan:


insert into eav (attribute, is_numeric, [value]) values ('B2', 1, 65);
go

select [value] from eav
    where attribute = 'B2'
    and is_numeric = 1
    and cast([value] as int) > 50;
go

boolean short-circuit counter example query plan


Looking at the plan we can see how the query is actually evaluated: seek on the non-clustered index for the attribute ‘B2’, project the ‘value’, filter for the value predicate ‘cast([value] as int)>50’, then perform a nested loops join to look up ‘is_numeric’ in the clustered index! So the right side of the AND operator is evaluated first. Q.E.D.

Is this a bug? Of course not. SQL is a declarative language, and the query optimizer is free to choose any execution path that provides the requested result. Boolean operator short-circuit is NOT GUARANTEED. My query has set up a trap for the query optimizer by providing a tempting execution path using the non-clustered index. For my example to work I had to set up a large table and enough distinct values of ‘attribute’ so that the optimizer would see the non-clustered index access followed by a bookmark lookup as a better plan than a clustered scan. And it is, by all means, a better plan. But then I placed my trap: by adding ‘value’ as an included column in the non-clustered index, I gave the optimizer a too-sweet-to-resist opportunity to evaluate the filter predicate on the ‘value’ column before it evaluates the filter predicate on the ‘is_numeric’ column, thus breaking the short-circuit assumption.
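One commonly used way to stop depending on evaluation order is to put the guard and the conversion into a single CASE expression, so the cast is only reached for rows that pass the guard. The query below is a sketch of that idea against the eav table from the example above; it is not part of the original test, and CASE ordering itself has documented exceptions (for instance expressions involving aggregates), so treat it as a mitigation rather than a guarantee.

```sql
-- Sketch only: assumes the eav table and the 'B1' trap row created earlier.
select [value] from eav
    where attribute = 'B1'
    and is_numeric = 1
    -- the cast is reached only when the guard holds; for all other rows
    -- the CASE yields NULL, and NULL > 50 filters the row out
    and case when is_numeric = 1 then cast([value] as int) end > 50;
go
```

With the trap row in place this returns an empty result instead of raising Msg 245, whichever index the optimizer picks to evaluate the predicates.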

Passing Parameters to a Background Procedure

August 18th, 2009

Code on GitHub: rusanu/async_tsql

I have previously posted an example of how to invoke a procedure asynchronously using Service Broker activation. Several readers have asked how to extend this mechanism to add parameters to the procedure launched in the background.

Passing parameters to a single well-known procedure is easy: the parameters are added to the message body and the activated procedure looks them up in the received XML, passing them to the called procedure. But it is significantly more complex to create a generic mechanism that can pass parameters to any procedure. The problem is the type system, because the parameters have unknown types and the activated procedure has to pass properly typed parameters to the invoked procedure.

A generic solution should accept a variety of parameter types and should deal with the peculiarities of Transact-SQL parameter passing, namely the named parameter capabilities. Also the invocation wrapper usp_AsyncExecInvoke should directly accept the parameters for the desired background procedure. After considering several alternatives, I settled on the following approach:

Read the rest of this entry »

Asynchronous procedure execution

August 5th, 2009

Code on GitHub: rusanu/async_tsql

Update: a version of this sample that accepts parameters is available in the post Passing Parameters to a Background Procedure

Recently a user on StackOverflow raised the question Execute a stored procedure from a windows form asynchronously and then disconnect?. This is a known problem: how to invoke a long-running procedure on SQL Server without forcing the client to wait for the procedure execution to finish. Most times I’ve seen this question raised in the context of web applications, where waiting for a result means delaying the response to the client browser. In web apps the time constraint is even more drastic: the developer often wants to launch the procedure and immediately return the page even when the execution lasts only a few seconds. The application will retrieve the execution result later, usually via an Ajax call driven by the returned page script.

Read the rest of this entry »

MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data

July 26th, 2009

I just found that Microsoft has published a use case about the way MySpace is using Service Broker on their service as the core message delivery system for the Service Dispatcher. We’re talking here 440 SQL Server instances and over 1000 databases. Quote from the use case:

Service Broker has enabled MySpace to perform foreign key management across its 440 database servers, activating and deactivating accounts for its millions of users, with one-touch asynchronous efficiency. MySpace also uses Service Broker administratively to distribute new stored procedures and other updates across all 440 database servers through the Service Dispatcher infrastructure.

That is pretty impressive. I have known about the MySpace SSB adoption since the days when I was with the Service Broker team. You probably all know the mantra I repeat all the time: “don’t use fire and forget, it is a bad message exchange pattern and there are scenarios when the database may be taken offline”? Guess how I found out about those ‘scenarios’… Anyway, I’m really glad that they also made public some performance numbers. Until now I could only quote the 5000 messages per second I can push in my own test environment. Well, looks like MySpace has some beefier hardware:

Stelzmuller: “When we went to the lab we brought our own workloads to ensure the quality of the testing. We needed to see if Service Broker could handle loads of 4,000 messages per second. Our testing found it could handle more than 18,000 messages a second.”

Fix slow application startup due to code sign validation

July 24th, 2009

Sometimes you are faced with applications that seem to take ages to start up. Usually they freeze for about 30-40 seconds and then all of a sudden they come to life. This happens for both native and managed applications, and it sometimes manifests as an IIS/ASP/ASP.Net AppPool starting up slowly on the first request. The very first thing I always suspect is code signing verification. When a signed module is checked, the certificate verification engine may consider that the Certificate Revocation List (CRL) it possesses is obsolete and attempt to download a new one. For this it connects to the internet. The problem occurs when the connectivity is either slow or blocked for some reason. By default the verification engine will time out after 15 seconds and resume with the old, obsolete CRL it has. The timeout can occur several times, adding up to start-up times of even minutes. This occurs completely outside the control of the application being started: its modules are not even properly wired up in memory yet, so there is no question of application code running.

The information on this subject is scarce, to say the least. Luckily there is a TechNet article that describes not only the process occurring, but also the controlling parameters: Certificate Revocation and Status Checking. To fix the problem on computers with poor internet connectivity, registry settings have to be modified in the HKLM\SOFTWARE\Microsoft\Cryptography\OID\EncodingType 0\CertDllCreateCertificateChainEngine\Config key:

ChainUrlRetrievalTimeoutMilliseconds
This is the timeout of each individual CRL check call. If it is 0 or not present, the default value of 15 seconds is used. Change this timeout to a reasonable value like 200 milliseconds.
ChainRevAccumulativeUrlRetrievalTimeoutMilliseconds
This is the aggregate CRL retrieval timeout. If set to 0 or not present the default value of 20 seconds is used. Change this timeout to a value like 500 milliseconds.

With these two changes the code signing verification engine will time out the CRL refresh operation in 500 milliseconds. If the connectivity to the certificate authority site is bad, this will dramatically reduce the start-up times of code signed applications.

Inspiration is perishable

July 8th, 2009

I have been following the 37signals blog ever since I kinda randomly stumbled upon their Getting Real book. If you have never heard about them, definitely check out the book; it is a very common-sense approach to managing product development in the age of Internets. In this post, http://www.37signals.com/svn/posts/1798-jasons-talk-at-big-omaha-2009, I was really touched by one remark: Inspiration is perishable. The ideas you have can linger in your head for a long time, but the inspiration for them fades quickly. So don’t postpone: by the time you get to it you’ll only deliver a pale image of the original idea. Do it when you’re pumped up and thrilled by it.

I reckon I’m a procrastinator deluxe, but I have to agree. I know the difference between working at 2 a.m. and not feeling a bit tired because I’m excited about my work on the one hand, and the damp feeling of exhaustion that drags me to watch some stupid TV show at 6 p.m. because I’m bored with the current project on the other.