Event ID 833: I/O requests taking longer than 15 seconds
October 28th, 2008
Error 833 is usually associated with hardware or system driver problems, and the typical recommendation is to replace the hardware or update the drivers and firmware. However, there is a common scenario that produces this error even when your hardware is perfectly sound.
MSDN documents the cause of error 833 as follows: “This problem can be caused system performance issues, hardware errors, firmware errors, device driver problems, or filter driver intervention in the IO process.” When you encounter this error in your ERRORLOG, it usually means it is time to open the hardware vendors' brochures and start looking for a replacement for your I/O subsystem. But before you make a salesman happy, there is one more thing to verify: is your CPU using some sort of clock frequency adjustment technology, like CPU stepping or Cool’n’Quiet? These technologies affect the way SQL Server measures elapsed time and may result in erroneous measurements. The problem is described in KB 931279: http://support.microsoft.com/kb/931279.
I have seen the effects of this behavior on some systems, and the I/O times reported by SQL Server were way, way off the chart. When you look in sys.dm_io_virtual_file_stats and see average I/O write stall times of 245000 milliseconds (over four minutes per write) on a system that works fine, you know something must be wrong. My guess is that the root cause is clock drift between the CPUs: a request submitted on one scheduler is completed on a different one, and the time drift between the two schedulers gets added to the I/O duration. A clear indication that this drift is causing the erroneous I/O results is this message in the ERRORLOG: The time stamp counter of CPU on scheduler id ... is not synchronized with other CPUs
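
As a quick first check, you can search a copy of the ERRORLOG for both messages at once. The Python sketch below is only an illustration: the function name `scan_errorlog` and the result keys are my own, and the patterns match just the message fragments quoted in this post, since the full message text may differ between SQL Server versions.

```python
import re

# Loose patterns built from the message fragments quoted in this post.
# Assumption: the exact wording around these fragments can vary by version.
SLOW_IO = re.compile(r"I/O requests taking longer than \d+ seconds", re.IGNORECASE)
TSC_DRIFT = re.compile(
    r"time stamp counter of CPU on scheduler id (\d+) is not synchronized",
    re.IGNORECASE,
)


def scan_errorlog(lines):
    """Count slow-I/O warnings and collect scheduler ids that report drift."""
    slow_io_warnings = 0
    drifting_schedulers = set()
    for line in lines:
        if SLOW_IO.search(line):
            slow_io_warnings += 1
        match = TSC_DRIFT.search(line)
        if match:
            drifting_schedulers.add(int(match.group(1)))
    return {
        "slow_io_warnings": slow_io_warnings,
        "drifting_schedulers": sorted(drifting_schedulers),
        # Both messages in the same log hint that the reported I/O times
        # may be a measurement artifact rather than a storage problem.
        "timing_suspect": slow_io_warnings > 0 and bool(drifting_schedulers),
    }
```

Pass it any iterable of lines, for example an open file object for a copy of the ERRORLOG (pick the encoding that matches your file). A `timing_suspect` value of `True` does not prove the hardware is healthy; it only tells you the timing issue is worth ruling out first.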
The workaround is fairly simple: force the CPU to always run at maximum frequency. First, make sure you are using an ‘always on’ power scheme:
1. Click Start, click Run, type Powercfg.cpl, and then click OK.
2. In the Power Options Properties dialog box, click Always On in the Power schemes list.
3. Click OK.
In addition, make sure no third-party tools are changing the CPU clock frequency even when the power scheme is set to ‘always on’.
Edit Dec. 17 2008
There are a number of CSS blog articles that also mention this problem:
- http://blogs.msdn.com/psssql/archive/2008/12/16/how-it-works-sql-server-no-longer-uses-rdtsc-for-timings-in-sql-2008-and-sql-2005-service-pack-3-sp3.aspx
- http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx
- http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx