The bias term is a function of the position error and the timestep. It is my understanding that the constraint error will decay over time if and only if your baumgarte term satisfies 0 < baumgarte < 2 / dt. Within that range, the error oscillates (flips sign every step) when baumgarte is greater than 1 / dt. It is overdamped (shrinks monotonically without overshooting) when baumgarte is less than 1 / dt.
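To make those ranges concrete, here is a minimal sketch that assumes the simplest possible model. There is a single scalar constraint error C, and the bias drives it as dC/dt = -baumgarte * C. That equation is integrated with one explicit Euler step per frame, so C_next = (1 - baumgarte * dt) * C. The helper names `step_error`, `simulate` and `classify` are hypothetical and exist only for this illustration. A real solver with many coupled constraints won't follow this recurrence exactly. The timestep dt = 1/64 is used only so the boundary cases come out exact in floating point.

```python
def step_error(c, baumgarte, dt):
    """One explicit Euler step of dC/dt = -baumgarte * C."""
    return (1.0 - baumgarte * dt) * c


def simulate(c0, baumgarte, dt, steps):
    """Return the error history [C_0, C_1, ..., C_steps]."""
    history = [c0]
    for _ in range(steps):
        history.append(step_error(history[-1], baumgarte, dt))
    return history


def classify(baumgarte, dt):
    """Classify the scalar recurrence by its per-step factor 1 - baumgarte * dt."""
    factor = 1.0 - baumgarte * dt
    if abs(factor) >= 1.0:
        return "no decay"      # baumgarte <= 0 or baumgarte >= 2 / dt
    if factor > 0.0:
        return "overdamped"    # 0 < baumgarte < 1 / dt: monotone decay
    if factor == 0.0:
        return "one step"      # baumgarte == 1 / dt: error removed at once
    return "oscillating"       # 1 / dt < baumgarte < 2 / dt: sign flips each step


if __name__ == "__main__":
    dt = 1.0 / 64.0
    for b in (0.2 / dt, 1.0 / dt, 1.5 / dt, 2.5 / dt):
        hist = simulate(1.0, b, dt, 4)
        print(f"baumgarte*dt = {b * dt:.1f}: {classify(b, dt)}: "
              + ", ".join(f"{c:+.3f}" for c in hist))
```

Note that the oscillating case still converges, with |C| halving every step at baumgarte * dt = 1.5. It just alternates sign while doing so, which is the kind of convergence the next paragraph argues against.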
You don't want your solution to converge with an oscillation; you want it to converge smoothly, heading straight towards the solution. As for why the bias isn't set to the critically damped value of 1 / dt (which, for a single constraint, would remove the whole error in one step), I imagine it is because of the nature of an iterative solver: each constraint is solved in isolation, and solving it can undo part of the work done on previously solved constraints.
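The "each constraint is solved in isolation" point can be seen in a tiny sequential-impulse style sketch. Everything here is a hypothetical setup for illustration. Three unit-mass particles sit on a line, with two distance constraints C1 = x1 - x0 - L and C2 = x2 - x1 - L that share the middle particle. The solver applies one impulse per constraint so that constraint's velocity equation J v = target holds on its own. The target of -1.0 for each row is a made-up stand-in for something like -baumgarte * C.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve_row(v, jac, inv_mass, target):
    """Apply one impulse along jac so that this row alone satisfies jac . v == target."""
    eff = sum(j * j * w for j, w in zip(jac, inv_mass))  # J M^-1 J^T for this single row
    lam = (target - dot(jac, v)) / eff                    # impulse magnitude
    return [vi + w * j * lam for vi, w, j in zip(v, inv_mass, jac)]


def gauss_seidel(v, rows, inv_mass, targets, iterations):
    """Sweep the rows in order, each solved in isolation, for a number of iterations."""
    for _ in range(iterations):
        for jac, target in zip(rows, targets):
            v = solve_row(v, jac, inv_mass, target)
    return v


if __name__ == "__main__":
    inv_mass = [1.0, 1.0, 1.0]
    rows = [[-1.0, 1.0, 0.0],   # Jacobian of C1 = x1 - x0 - L
            [0.0, -1.0, 1.0]]   # Jacobian of C2 = x2 - x1 - L
    targets = [-1.0, -1.0]      # hypothetical desired J v for each constraint
    v = [0.0, 0.0, 0.0]

    v = solve_row(v, rows[0], inv_mass, targets[0])
    print("after row 1:", [dot(r, v) for r in rows])  # row 1 satisfied
    v = solve_row(v, rows[1], inv_mass, targets[1])
    print("after row 2:", [dot(r, v) for r in rows])  # row 1 disturbed again
    v = gauss_seidel(v, rows, inv_mass, targets, 20)
    print("after 20 more sweeps:", [dot(r, v) for r in rows])
```

After the first pass, row 1 has drifted from -1.0 to -0.25 because solving row 2 moved the shared particle. Only repeated sweeps bring both rows close to their targets.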
As such, I suppose it is "safer" to over-damp the correction in order to avoid the chance of oscillation. This is why you scale the baumgarte term by the timestep frequency (1 / dt) and then go a little lower, i.e. baumgarte = beta / dt with beta somewhat less than 1. This makes sense and explains, at least to me, why Baumgarte stabilization converges with a horrid-looking oscillation once the error gets too large.
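Writing the baumgarte term as beta / dt has a convenient effect in the simplest scalar model, where the error is advanced each frame by one explicit Euler step, C += dt * bias, with bias = -(beta / dt) * C. In that model, the fraction of error left after one step is 1 - beta regardless of dt. So "a little lower than 1 / dt" just means picking beta a bit below 1. The error left after one second, (1 - beta) ** (steps per second), still depends on the frame rate. This is a minimal sketch: `beta`, `baumgarte_bias` and `error_after_steps` are illustrative names, not any particular engine's API, and 0.2 is only an example value.

```python
def baumgarte_bias(c, dt, beta):
    """Velocity bias -(beta / dt) * C that pushes the error C back toward zero."""
    return -(beta / dt) * c


def error_after_steps(c0, dt, beta, steps):
    """Advance the error with C += dt * bias for a number of steps (explicit Euler)."""
    c = c0
    for _ in range(steps):
        c += dt * baumgarte_bias(c, dt, beta)
    return c


if __name__ == "__main__":
    beta = 0.2  # example fraction of 1 / dt, well below 1
    for hz in (30, 60, 120):
        dt = 1.0 / hz
        one_step = error_after_steps(1.0, dt, beta, 1)
        one_second = error_after_steps(1.0, dt, beta, hz)
        print(f"{hz:>3} Hz: after 1 step {one_step:.3f}, after 1 s {one_second:.2e}")
```

The per-step factor staying at 1 - beta is what makes a single tuning value behave the same way (non-oscillating, for any beta between 0 and 1) across different timesteps in this model.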
I say all of this with the disclaimer that I'm not totally sure if this is correct info, just my current understanding.