When Is a Bug Considered “Fixed”?

AKA Relieving Symptoms vs. Solving Problems

So, I’ve had a bug bouncing around my “I should really figure out wtf is going on here” queue for a while. There’s a “fix” checked in, but it doesn’t actually solve the underlying problem. For users of our software, it doesn’t matter… at the moment. But there could be other instances of this same issue lurking in our codebase, just waiting for someone to push buttons in exactly the right sequence to cause a crash.

So what makes this bug so damn… buggy? One word: threads. Threads drag determinism out back kicking and screaming, beat it to death with several blunt objects, then dump the body in shark-infested waters somewhere in the Caribbean. Basically, fuck my life. It gets better, though. We actually crash because we try to get item –1 out of a vector… that’s like trying to steal the –1st cookie from the cookie jar. The C++ runtime puts a stop to this nonsense by throwing an exception… but the default exception handler calls DebugBreak(), which destroys/stops the thread that called the problematic function in the first place. Fuck my life even harder.
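For anyone who wants to see the cookie jar problem in isolation, here is a minimal sketch. It assumes a bounds-checked read through std::vector::at(), which may not be how our code (or the runtime check that fired) actually works, and readThrows is a made-up helper for illustration only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether reading `index` through at() throws.
bool readThrows(const std::vector<int>& jar, int index) {
    try {
        // Converting the int to size_t turns -1 into the largest size_t value,
        // which is far past the end of any real vector.
        (void)jar.at(static_cast<std::size_t>(index));
        return false;
    } catch (const std::out_of_range&) {
        // at() is the bounds-checked accessor; operator[] makes no such promise.
        return true;
    }
}
```

With a three-cookie jar, readThrows(jar, -1) comes back true and readThrows(jar, 0) comes back false. Nothing here involves threads or DebugBreak(), so it only covers the "–1st cookie" half of the story.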

That brings me to the next obstacle: the offending function is a callback. It’s not just any callback, though, oh no… it’s a SAPI callback. That means it involves multiple COM servers, only one of which is ours. Yep, triple-fuck my life. This is quickly drifting over to the wrong side of that line between “a good challenge” and “totally futile.” I’ll probably be able to figure it out eventually, with some help… but until then it will be the bane of my existence.

In the meantime, what I’ve done is put a guard at the top of our misbehaving function that detects a bogus value (i.e. –1 or 2^32-1) and fails gracefully. This solves the immediate, user-visible problem (i.e. flaming crash dialog box of death), but still leaves the mystery of why one of our callbacks is misfiring on a destroyed object.
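The band-aid looks roughly like the sketch below. This is not our actual code: PhraseList, tryGetItem, and the unsigned 32-bit index parameter are all names and types I made up for illustration, and the real function has a different signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the object the callback operates on.
struct PhraseList {
    std::vector<std::string> items;
};

// Hypothetical guarded accessor: returns false instead of crashing when the
// index is bogus. A -1 passed through an unsigned 32-bit parameter shows up
// here as 0xFFFFFFFF, i.e. 2^32-1.
bool tryGetItem(const PhraseList& list, std::uint32_t rawIndex, std::string& out) {
    const std::uint32_t kBogusIndex = 0xFFFFFFFFu;
    if (rawIndex == kBogusIndex) {
        return false;  // the sentinel we keep seeing in the crash
    }
    if (static_cast<std::size_t>(rawIndex) >= list.items.size()) {
        return false;  // any other out-of-range index
    }
    out = list.items[rawIndex];
    return true;
}
```

A caller checks the return value and bails out quietly on false. Note what this does not do: it never asks why the callback fired with a bogus index in the first place, which is exactly the symptom-versus-problem gap.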