How to Deal with Negative SPIDs in SQL Server

What to do next when you find SPID -2 in SQL Server

Error : Process ID -2 is not a valid process ID

This article applies to:
SQL Server 2000
SQL Server 2005
SQL Server 2008

Youíve been performing an investigation in SQL Server into a performance issue. Maybe some database blocking issue and youíve done the usual things and youíve just run an sp_who or sp_who2. Or perhaps youíve been browsing the error log for potential problems and spotted an odd error message. But fundamentally, what you see is a SPID with a negative number. Specifically, SPID=-2. And itís at the top of a blocking chain with a dozen other transactions behind it trying to acquire resource. Until itís killed, nothing will happen.

Using KILL -2 Should Do It

So you do what you always do in these situations. Wait for it to complete, or more usually, because thereís an issue, you make a call on it and kill the SPID:

KILL -2

And what comes back is:

Msg 6101, Level 16, State 1, Line 1
Process ID -2 is not a valid process ID. Choose a number between 1 and 2048

Oh flip. Or something similar. How do I deal with this? Iíve never seen a negative SPID before, and now, when I have a problem and Iím trying to KILL it, it wonít let me. Iím going to have to re-start SQL Server!!

Give Me the Fix, Now

Iíll give you the fix first, youíre probably desperate for it (and it doesnít involve restarting SQL Server). Then Iíll explain why:

Run the following (youíll need to have a SQL Server login with sysadmin or processadmin privileges to do this):

select req_transactionUOW
from master..syslockinfo
where req_spid = -2

This will return a 32 digit UOW number which looks like a GUID. Something like ĎDEF12078-0199-1212-B810-CD46A73F2498í

Copy this into your query window and run:

KILL ĎDEF12078-0199-1212-B810-CD46A73F2498í

Run sp_who/sp_who2 again and you will probably see that the offending SPID has disappeared. If it hasnít, itís probably still in rollback and youíll have to wait, but by nature this kind of SPID is usually pretty quick to roll back and terminate. Yep, itís THAT simple.

The Explanation

Phew, that was fun, good job you found this page! Right, now if youíre interested, letís have a look at whatís happened and why. The good news is, none of itís rocket science, so read on.

What weíve just killed is a distributed transaction SPID. An orphaned distributed transaction SPID to be totally precise. Something that involves MSDTS, the Microsoft Distributed Transaction Coordinator. (Donít run, youíre past the worst bit).

Whatís happened is that when a transaction involves data which resides on more than one server, such as when a database record is replicated out to two or more servers, MSDTC needs to become involved. SQL Server handles this transparently.

However, occasionally, all does not go as well as it should. A server drops off the network or thereís a power outage at the distributor server. Something messy which computers arenít very good at dealing with. MSDTC usually handles these scenarios very well, ensuring that the rules involving the database ACID properties are adhered to so that everything stays in sync which should, and everyoneís happy about what data is in their tables.

But when MSDTC canít recover from one of these scenarios, the SPID which is handling the distributed transaction on one (or more) servers canít do any more work. The result is an orphaned SPID.

In order to mark this as an orphaned, distributed transaction SPID, SQL Server changes the SPID from a positive number to -2. The only problem is, the SPID may still be holding on to resources (usually table, page or row locks), and blocking other SPIDs which want to access that database object. But because the KILL command canít handle SPIDs with a value of less than 1, you canít use it to kill a negative SPID, such as this. Hence the need to look up the UOW (Unit of Work) ID of the offending SPID before the process can be terminated.

How Can I Stop it Happening Again?

You probably canít. UPS. Redundant hardware. Dual network links. And lots of cash. Even then, itíll happen again sometime, itís just cost you more in trying to prevent it. By their very nature, distributed transactions occasionally fail. Thatís the whole reason why MSDTC exists Ė if we could guarantee 100% committed distributed transactions every time, there would be no need to have a distributed transaction coordinator. But we canít, and the next best thing we can do is to have something that fails gracefully and flags us with the problem so that we can act on it. In this case, flagging a SPID with a -2 so that we know itís happened.

End of Technical Article

SQL Server Club is a free community service from Norb Technologies - Making SQL Server Faster - www.norbtechnologies.com