|
What to do next when you find SPID -2 in SQL Server
Error: Process ID -2 is not a valid process ID
This article applies to:
SQL Server 2000
SQL Server 2005
SQL Server 2008
You’ve been performing an investigation in SQL Server into a performance issue. Maybe some database blocking issue and you’ve done the usual things and
you’ve just run an sp_who or sp_who2. Or perhaps you’ve been browsing the error log for potential problems and spotted an odd error message. But
fundamentally, what you see is a SPID with a negative number. Specifically, SPID=-2. And it’s at the top of a blocking chain with a dozen other
transactions behind it trying to acquire resource. Until it’s killed, nothing will happen.
Using KILL -2 Should Do It
So you do what you always do in these situations. Wait for it to complete, or more usually, because there’s an issue, you make a call on it and kill the SPID:
KILL -2
And what comes back is:
Msg 6101, Level 16, State 1, Line 1
Process ID -2 is not a valid process ID. Choose a number between 1 and 2048
Oh flip. Or something similar. How do I deal with this? I’ve never seen a negative SPID before, and now, when I have a problem and I’m trying to KILL it, it won’t let me.
I’m going to have to re-start SQL Server!!
Give Me the Fix, Now
I’ll give you the fix first, you’re probably desperate for it (and it doesn’t involve restarting SQL Server). Then I’ll explain why:
Run the following (you’ll need to have a SQL Server login with sysadmin or processadmin privileges to do this):
select req_transactionUOW
from master..syslockinfo
where req_spid = -2
This will return a 32 digit UOW number which looks like a GUID. Something like ‘DEF12078-0199-1212-B810-CD46A73F2498’
Copy this into your query window and run:
KILL ‘DEF12078-0199-1212-B810-CD46A73F2498’
Run sp_who/sp_who2 again and you will probably see that the offending SPID has disappeared. If it hasn’t, it’s probably still in rollback and you’ll have to wait,
but by nature this kind of SPID is usually pretty quick to roll back and terminate. Yep, it’s THAT simple.
The Explanation
Phew, that was fun, good job you found this page! Right, now if you’re interested, let’s have a look at what’s happened and why. The good news is, none of it’s
rocket science, so read on.
What we’ve just killed is a distributed transaction SPID. An orphaned distributed transaction SPID to be totally precise. Something that involves MSDTS, the Microsoft
Distributed Transaction Coordinator. (Don’t run, you’re past the worst bit).
What’s happened is that when a transaction involves data which resides on more than one server, such as when a database record is replicated out to two or more servers,
MSDTC needs to become involved. SQL Server handles this transparently.
However, occasionally, all does not go as well as it should. A server drops off the network or there’s a power outage at the distributor server. Something messy which
computers aren’t very good at dealing with. MSDTC usually handles these scenarios very well, ensuring that the rules involving the database ACID properties are adhered
to so that everything stays in sync which should, and everyone’s happy about what data is in their tables.
But when MSDTC can’t recover from one of these scenarios, the SPID which is handling the distributed transaction on one (or more) servers can’t do any more work.
The result is an orphaned SPID.
In order to mark this as an orphaned, distributed transaction SPID, SQL Server changes the SPID from a positive number to -2. The only problem is, the SPID may still be
holding on to resources (usually table, page or row locks), and blocking other SPIDs which want to access that database object. But because the KILL command can’t handle
SPIDs with a value of less than 1, you can’t use it to kill a negative SPID, such as this. Hence the need to look up the UOW (Unit of Work) ID of the offending SPID before
the process can be terminated.
How Can I Stop it Happening Again?
You probably can’t. UPS. Redundant hardware. Dual network links. And lots of cash. Even then, it’ll happen again sometime, it’s just cost you more in trying to prevent it.
By their very nature, distributed transactions occasionally fail. That’s the whole reason why MSDTC exists – if we could guarantee 100% committed distributed transactions
every time, there would be no need to have a distributed transaction coordinator. But we can’t, and the next best thing we can do is to have something that fails gracefully
and flags us with the problem so that we can act on it. In this case, flagging a SPID with a -2 so that we know it’s happened.
|