Congrats on the blender implementation, I'm glad it all came together.
So far, I have just implemented a modified conjugate gradient solver. The preconditioned cg solver is explained really well in: SHEWCHUK, J. 1994. "An introduction to the conjugate gradient method without the agonizing pain"
Wrapping your head around the modified algorithm with preconditioning is a bit trickier. Baraff/Witkin's paper does provide the pseudocode.
------------
This isn't a big issue for me yet, but since people asked about it...
On the subject of cloth responsiveness...
A straightforward cloth implementation is going to require numerous iterations for a tug on one end to propogate across a spring network with a lot of points. Rather than spending cycles updating parts of the cloth that dont need updating, it only makes sense to try and focus the computation where it is needed.
On of the mentioned power point presentations describes a verlet/relaxation implementation that would order the points and bias the iteration based on where the cloth was attached to a moving body. It may be possible to do something similar for implicit integration. Recall that the conjugate gradient starts with an "initial guess" and then iterates from there to the result. Usually the zero vector is used as a starting point. This could be a good place to insert the extra information that we know about the simulation. i.e. make a better "initial guess" for the delta velocity vector (dV). Then after that you just use the same old conjugate gradient method. This strategy should give you the best of both worlds. You take advantage of "additional knowledge" about cloth construction and attached/manipulated vertices, meanwhile your iteration solver still uses the same algorithm (regardless of the cloth configuration) that can be run in PARALLEL - especially important for any non-single-cpu implementations.
So does this work? I did some (limited) experiments initializing based on how the cloth was tugged by the mouse. I did find a downside. Normally, since I only care about aesthetic behavior, I use a rather high epsilon value for the CG solver to keep the iteration count really low. When it always uses zero as the starting vector, this doesn't matter - but with the starting point being dramatically different from frame to frame, there were small but noticable discrepancies in the behavior at points on the cloth.
Note that my initialization of dV was poor - everything was initialized to the "tug". Ideally you would only want to apply that to things at full stretch starting from the attached point. This would be similar to doing a single one of those verlet/relaxation (loop-dependant) iteration type of steps to initialize dV. In essence, for my tests, I was choosing what would have been a worse "starting point" than zero for certain parts of the mesh. In conclusion, I cant really make any conclusions at this point.
-------------------
In response to the request for a demo with cloth on an animated character... Yes, I agree that is something that I need to do. Unless all that "pretty math" can lead to usable applications with good performance, then it doesn't do much good. Trouble is finding time to work on this
Thanks for the comments all