Parallel LCP solving methods?

Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Parallel LCP solving methods?

Post by Erwin Coumans »

In rigid body contact solving, each step of an iterative scheme like Gauss-Seidel requires the result of the previous step, which makes it difficult to parallelize. The Jacobi method, on the other hand, is easier to implement in parallel. A mix between Jacobi and Gauss-Seidel, a kind of block solving process, might enable faster solving on multiprocessor architectures.
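To make the data dependence concrete, here is a minimal projected Gauss-Seidel sketch for the LCP w = Az + b, z >= 0, w >= 0, z.w = 0 (an illustration, not Bullet's actual solver). The read of z[j] in the inner loop is what couples each update to the ones before it; a Jacobi variant would read a snapshot of z from the previous sweep instead.

```cpp
#include <algorithm>
#include <vector>

// Projected Gauss-Seidel for the LCP: w = A z + b, z >= 0, w >= 0, z.w = 0.
// Each update reads the freshest z values (the serial data dependence);
// a Jacobi variant would read all z's from the previous sweep instead.
void pgs(const std::vector<std::vector<double>>& A,
         const std::vector<double>& b,
         std::vector<double>& z, int iterations) {
    const int n = (int)b.size();
    for (int it = 0; it < iterations; ++it) {
        for (int i = 0; i < n; ++i) {
            double r = b[i];
            for (int j = 0; j < n; ++j) r += A[i][j] * z[j];  // residual w_i
            z[i] = std::max(0.0, z[i] - r / A[i][i]);         // project onto z >= 0
        }
    }
}
```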

Does anyone have ideas, experience or references to papers about parallel LCP solver methods that fit a small memory footprint?
Erin Catto
Posts: 316
Joined: Fri Jul 01, 2005 5:29 am
Location: Irvine

Post by Erin Catto »

One way to think about this is to visualize how GS operates on the body-constraint graph. Assume constraints only operate on one or two bodies. GS processes each constraint in turn, adjusting the net force that acts on the two bodies.

Two constraints that affect different pairs of bodies can be solved independently. Imagine a vertical stack of boxes. Two independent GS loops can operate, one from the top down, the other from the bottom up. When they get halfway, the results must be merged.

Perhaps such an approach can be generalized to make an efficient parallel PGS solver.
russ
Posts: 2
Joined: Thu Jul 07, 2005 12:27 am
Location: Mountain View, California

Post by russ »

right: this kind of partitioning (i.e. to minimize the interactions between different parts of a graph) is common in sparse matrix code in high performance computing - both to minimize the amount of fill-in during factorization, and to minimize the amount of interprocessor communication during parallelization. there are many algorithms to generate effective orderings (lots of research results here). some matrix structures (such as tree or mesh structures) have optimal orderings that can be computed easily, but matrices with arbitrary structures (e.g. resulting from collision detection) require fairly complex algorithms. for a realtime GS solver you could easily spend more time computing a good non-interaction ordering than you'd save by parallelization.

i think some testing is in order: try block gauss-seidel with various block sizes and compare the resulting accuracy (i.e. convergence rate). if it's "good enough" then i'd forego the tricky reordering algorithms. you might find, for example, that you could do one extra GS iteration to make up for some reasonable block size.
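for reference, a block gauss-seidel of the kind suggested here could look like the sketch below (untuned, names assumed): gauss-seidel within each block of consecutive indices, jacobi across blocks, so each block could run on its own processor.

```cpp
#include <algorithm>
#include <vector>

// Block Gauss-Seidel for the LCP w = A z + b, z >= 0: plain Gauss-Seidel
// inside each block of consecutive indices, Jacobi across blocks (values
// from the previous sweep), so blocks could be assigned to processors.
void blockPGS(const std::vector<std::vector<double>>& A,
              const std::vector<double>& b,
              std::vector<double>& z, int blockSize, int iterations) {
    const int n = (int)b.size();
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> zOld = z;  // snapshot for cross-block reads
        for (int start = 0; start < n; start += blockSize) {
            int end = std::min(start + blockSize, n);
            for (int i = start; i < end; ++i) {
                double r = b[i];
                for (int j = 0; j < n; ++j) {
                    // fresh values inside the block, stale ones outside it
                    double zj = (j >= start && j < end) ? z[j] : zOld[j];
                    r += A[i][j] * zj;
                }
                z[i] = std::max(0.0, z[i] - r / A[i][i]);
            }
        }
    }
}
```

blockSize = n recovers plain gauss-seidel, blockSize = 1 recovers jacobi, so the accuracy comparison above is just a sweep over this one parameter.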

what other approximations to GS lend themselves to parallelization? i parallelized ODE a while ago (on an SGI altix machine) by unrolling the inner loop of GS and simply not worrying about data dependence or competing updates to the same output elements. the result: accuracy indistinguishable from the serialized version.

russ.
Antonio Martini
Posts: 126
Joined: Wed Jul 27, 2005 10:28 am
Location: SCEE London

Re: Parallel LCP solving methods?

Post by Antonio Martini »

has anybody already looked into this?

"Matrix Multisplitting Methods with Applications to the Linear Complementarity Problems"

http://lsec.cc.ac.cn/~bzz/Public/series.html
mewert
Posts: 52
Joined: Sat Oct 08, 2005 1:16 am
Location: Itinerant

Post by mewert »

For implementation on a Cell processor, which I'm guessing you're most interested in Erwin ;) I'd probably use a buffering scheme, where each processor has a buffer for the deltas of the solution values of the constraints shared between two blocks.

You would order your constraints to solve the non-shared elements first. After updating from the buffer and solving the shared elements, you would kick the results off to the other processors, where they get collected in their buffers. By the time you've finished your next iteration on the non-shared constraints, the results from the other processors should be ready to use. So the shared constraints would lag one iteration behind (is it Jacobi iteration here?).

I'm assuming you're using an LCP solver that is linear in performance per iteration, not a "Big Matrix" solver. One problem I see is that when complementarity fails on a constraint and you have to "undo" the forces applied, this may need some special handling, or it may just work out.

I'm making some assumptions about the Cell's DMA capabilities here, that it can DMA into a buffer while the SPU is chugging away on the current iteration. For a more standard multi-core processor, you would use "restrict" on the non-shared elements and have a separate loop to handle the shared elements which will require reading and writing to memory. Hope the rambling style of my writing isn't too confusing.

- Michael Alexander Ewert