Sweep-and-prune document

Pierre
Posts: 67
Joined: Mon Jul 25, 2005 8:56 am

Sweep-and-prune document

Post by Pierre »

Hi,

Here is a small document about sweep-and-prune. Could you guys double-check it, and tell me if you find obvious mistakes/etc ? The second part is about Multi-SAP and it might provide some useful implementation details. I'll put the file on my website later, right now it has technical issues and I can't upload anything out there. I'll release the source code ASAP but it needs cleaning and I'm in the middle of moving, so, no time for this at the moment.

- Pierre
You do not have the required permissions to view the files attached to this post.
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Sweep-and-prune document

Post by Erwin Coumans »

Hi Pierre,

Thanks for the doc, I only briefly skimmed through it, so expect more comments.

- Could you provide a reference to a forum topic, where we discussed multiple sap broadphases: http://www.bulletphysics.com/Bullet/php ... ?f=4&t=328

- Could you add the 5th method for pair management that Bullet currently uses: a sorted array of overlapping pairs?
Some game company added this to Bullet, originally from Playstation 3 /XBox 360. This eliminates the need for searching pairs, because the pairs get removed during the overlapping pair traversal, using an additional aabb check for each existing overlapping pair. It is nice for SPUs, reducing random access. This also gets rid of duplicates, so it works nice with a Multi-SAP where multiple broadphases report the same pair.

- This sort of the overlapping pair array can also be beneficial when organizing a parallel collision detector, with similar types of object pairs in the same batch (think of SPUs, where code/data sizes are limited).

- Havok also uses the 16-bit or 32-bit quantized integers for the SAP broadphase, not just Bullet. Many games succesfully shipped with this technique, so not everyone shares your 'However this is not a great approach' sentiment :-). Note that the a better fitting world bound is better, and so is a 32-bit accuracy. It also works nice with Multi-SAP, so each broadphase doesn't need to be very large.

- this quantized AABB versus quantized AABB can be made branch-free, which is much faster on XBox 360/PS3. In the optimized BVH (not SAP), some game company reports 3 times speedup.

Code: Select all

	//This block replaces the block below and uses no branches, and replaces the 8 bit return with a 32 bit return for improved performance (~3x on XBox 360)
	inline unsigned testQuantizedAabbAgainstQuantizedAabb(unsigned short int* aabbMin1,unsigned short int* aabbMax1,const unsigned short int* aabbMin2,const unsigned short int* aabbMax2) const
	{		
		return btSelect((unsigned)((aabbMin1[0] <= aabbMax2[0]) & (aabbMax1[0] >= aabbMin2[0])
			& (aabbMin1[2] <= aabbMax2[2]) & (aabbMax1[2] >= aabbMin2[2])
			& (aabbMin1[1] <= aabbMax2[1]) & (aabbMax1[1] >= aabbMin2[1])),
			1, 0);
	}
Thanks,
Erwin
Pierre
Posts: 67
Joined: Mon Jul 25, 2005 8:56 am

Re: Sweep-and-prune document

Post by Pierre »

Hi Erwin,

I'm not going to update it immediately: that's my last day at Ageia, my home PC is in a box, and I will not have any time for this before one or two weeks.
This eliminates the need for searching pairs, because the pairs get removed during the overlapping pair traversal, using an additional aabb check for each existing overlapping pair. It is nice for SPUs, reducing random access. This also gets rid of duplicates, so it works nice with a Multi-SAP where multiple broadphases report the same pair.
Yeah, I know this approach but I don't really like it. Having an extra overlap test for each *existing* pair seems to defeat the purpose of incremental results completely. If I have 20.000 sleeping pairs in my world, I don't want to perform 20.000 extra tests on all of them, that's madness. Never tried it though, it just seems... bad. My gut feeling is that performing a smaller (incremental) number of searches (which is O(1) anyway) is a lot better. But granted, I never profiled the alternative. (except when I profiled Bullet against the other SAPs, and found Bullet was slower. But maybe it's for different reasons, I didn't investigate much.)
Havok also uses the 16-bit or 32-bit quantized integers for the SAP broadphase, not just Bullet. Many games succesfully shipped with this technique, so not everyone shares your 'However this is not a great approach' sentiment . Note that the a better fitting world bound is better, and so is a 32-bit accuracy.
The fact that Havok uses it or that many games shipped used the technique does not make it the best approach in the world. This is not a solid argument, I'm sure you realize this.

As far as I can see it still forces the user to provide world-bounds at start of day, right? As a user myself, I find this super painful, not only because sometimes you don't know the bounds ahead of time, but also because - as you said - "better fitting world bound is better". In other words using poor world bounds gives worse performance, and it's one more option for users to screw up. And if they can screw up... they will.

Transforming the float also gives 32-bit integers, and you don't need the world-bounds, never. So it's quite obviously "better" in my book... I don't know, looks like a clear winner to me.

Code: Select all

this quantized AABB versus quantized AABB can be made branch-free, which is much faster on XBox 360/PS3. In the optimized BVH (not SAP), some game company reports 3 times speedup.
Yeah, the branch-free version might be a win on Xbox/PS3. For a SAP though, I doubt this makes much of a difference, as the overlap tests are not really the bottleneck. Probably worth a note anyway.


Thanks, I'll update the doc ASAP.

- Pierre
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Sweep-and-prune document

Post by Erwin Coumans »

Pierre wrote:My gut feeling is that performing a smaller (incremental) number of searches (which is O(1) anyway) is a lot better. But granted, I never profiled the alternative. (except when I profiled Bullet against the other SAPs, and found Bullet was slower. But maybe it's for different reasons, I didn't investigate much.)
The 'O(1)' search has random memory access, which is a bit of a hassle on modern architectures. If traversing pairs of sleeping objects become a bottleneck, keeping them in a separate array might be better. I haven't done benchmarks/comparisons with other broadphases, so I'm happy to change/improve things based on some detailed profiling.
The fact that Havok uses it or that many games shipped used the technique does not make it the best approach in the world. This is not a solid argument, I'm sure you realize this.

As far as I can see it still forces the user to provide world-bounds at start of day, right? As a user myself, I find this super painful, not only because sometimes you don't know the bounds ahead of time, but also because - as you said - "better fitting world bound is better". In other words using poor world bounds gives worse performance, and it's one more option for users to screw up. And if they can screw up... they will.
I argue that an integer broadphase is superior over floating point precision, because it spreads the precision evenly over the (bounded) space. I find that using single float precision gives the user the false impression that all is fine, whereas very far from the origin the AABB's become larger and larger. So the integer broadphase explicitly lets the user deal with this problem indeed. Bringing up Havok was just to show show that it is not just Bullet who succesfully uses this in practice. I agree it's preferable to not give users methods to screw it up, but if it is beneficial for performance/memory usage, users are often happy to bit the bullet. Combining the integerer broadphase with a multi-sap might relieve the pain a bit: if your initial estimated broadphase size doesn't fit, you just add another broadphase next to it.

Anyhow, I will review my pair management again, and do some more profiling. Did you have some public shared testbed to compare some of those broadphases?

Thanks for the discussion again, and hope you have a good 'last day' at Ageia, please say hi at everyone over there, in particular Adam, and have a good and safe journey.

Erwin
Erin Catto
Posts: 316
Joined: Fri Jul 01, 2005 5:29 am
Location: Irvine

Re: Sweep-and-prune document

Post by Erin Catto »

Pierre, this is a great document and a nice contribution.

I like the discussion of batch add/removal. This is something that has been missing from the literature. I suppose you would buffer all the asynchronous adds/removes and then batch them together at the beginning of collision detection.

You seem to gloss over some of the ordering of end-point updates. For example, on page 5 you have the min-point move up before the max-point. Usually you never want the same box's end-points to swap.

I agree with you that world bounds are a nuisance. Your floating point trick is quite nice. Do you have any suggestion for 16-bit values? float16s? It is a shame that Multi-SAP needs to consider world bounds again.

It would be nice if your box pruning section described how box pruning works. Does it require a radix sort to be fast?

I often need to do asynchronous AABB queries and ray casts. Do you suggest using another data structure for this?

Finally a brief critique of stabbing numbers and markers would be nice.
User avatar
John McCutchan
Posts: 133
Joined: Wed Jul 27, 2005 1:05 pm
Location: Berkeley, CA

Re: Sweep-and-prune document

Post by John McCutchan »

Is there any reason why you don't encode the max endpoint flag inside your integer endpoint position? Like so,

Code: Select all

uint32_t ep= (uint32_t&)p;
uint32_t mask = -int32_t(ep >> 31) | 0x80000000;
ep = ((ep ^ mask) & 0xfffffffe) | (uint32_t)max;
This does (sometimes) move the endpoints but is safe because min points are always <= their true value and max points are always >= their true value. In both cases the change in endpoint positions are only -1 or +1 respectively.
Pierre
Posts: 67
Joined: Mon Jul 25, 2005 8:56 am

Re: Sweep-and-prune document

Post by Pierre »

Is there any reason why you don't encode the max endpoint flag inside your integer endpoint position?
Well, is there any reason to do so? It's already encoded in the box index, equally efficiently.