btCollisionWorld::rayTest is slower than I'd expect

Post Reply
venzon
Posts: 12
Joined: Mon Nov 26, 2007 2:42 am

btCollisionWorld::rayTest is slower than I'd expect

Post by venzon »

First off, let me say that bullet seems to be extremely well-written. I like the way the API works, and I haven't encountered any bugs or inconsistent behavior so far. I prefer it over ODE. The bullet documentation could use some work, but the large assortment of demos helps.

I'm using bullet 2.64 in C++ for collision detection only. So, I use bullet similarly to the CollisionInterfaceDemo. I have a large number (~1500) of static trimesh collision objects (btBvhTriangleMeshShape), and every frame I need to collide those objects with a box and with some (~5) rays. For the box collision I use btCollisionWorld::performDiscreteCollisionDetection() and then look at the manifolds. This seems to perform pretty well. For the ray collision, I use btCollisionWorld::rayTest. Unfortunately, rayTest is veeerrry slow. Here's what a flat profile of my application looks like:

Code: Select all

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 42.59      9.14     9.14                             btCollisionWorld::rayTest(btVector3 const&, btVector3 const&, btCollisionWorld::RayResultCallback&, short)
 27.68     15.08     5.94                             btTriangleMeshShape::getAabb(btTransform const&, btVector3&, btVector3&) const
  2.94     15.71     0.63                             btCollisionWorld::performDiscreteCollisionDetection()
  2.28     16.20     0.49                             btAxisSweep3Internal<unsigned short>::quantize(unsigned short*, btVector3 const&, int) const
  1.86     16.60     0.40                             btAxisSweep3Internal<unsigned short>::sortMaxUp(int, unsigned short, btDispatcher*, bool)
  1.68     16.96     0.36                             btAxisSweep3Internal<unsigned short>::sortMaxDown(int, unsigned short, btDispatcher*, bool)
  1.49     17.28     0.32                             btAxisSweep3Internal<unsigned short>::sortMinUp(int, unsigned short, btDispatcher*, bool)
  1.49     17.60     0.32                             btOverlappingPairCache::removeOverlappingPair(btBroadphaseProxy*, btBroadphaseProxy*, btDispatcher*)
  1.35     17.89     0.29                             btConcaveShape::getMargin() const
  0.98     18.10     0.21                             btAxisSweep3Internal<unsigned short>::sortMinDown(int, unsigned short, btDispatcher*, bool)
  0.89     18.29     0.19                             btOptimizedBvh::unQuantize(unsigned short const*) const
  0.84     18.47     0.18                             btAxisSweep3Internal<unsigned short>::updateHandle(unsigned short, btVector3 const&, btVector3 const&, btDispatcher*)
  0.75     18.63     0.16                             __i686.get_pc_thunk.bx
So, the rayTest is just killing me. I've tested everything with a very small number of objects, and it all *works* as expected. But I've done raytests against this many objects in ODE without any performance issues. Looking at the implementation of rayTest, it does O(n) AABB tests, where n is the number of collision objects. This is too slow! Am I using bullet correctly? In ODE I could set it up to use an oct-tree to vastly reduce the number of ray/AABB tests. Does bullet have something similar? If not, I was thinking it wouldn't be too hard to write my own octTreeRayTest function that then goes off and calls btCollisionWorld::rayTestSingle where necessary.
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA
Contact:

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by Erwin Coumans »

That is a coincidence: the latest subversion has an improved raytest for triangle meshes. Can you check out the subversion repository? It will be available in Bullet 2.65.

We plan on further optimizing the raytest, so your help is welcome. One idea is to add a 'batch' raytest call.
27.68 15.08 5.94 btTriangleMeshShape::getAabb(btTransform const&, btVector3&, btVector3&) const
Can you provide the total number of calls to btTriangleMeshShape::getAabb? Can you try to merge most of the (1500) static triangle meshes into a few large btBvhTriangleMeshShapes?

The performance is likely to be due to ray versus triangle mesh, and not the broadphase (finding potential objects along the ray based on axis aligned bounding box).
If the broadphase is really the bottleneck (which is extremely unlikely), then instead of adding another acceleration structure, I prefer to re-use the existing btAxisSweep3 broadphase to speed things up. But as mentioned before, it is best to reduce the amount of individual static meshes by merging them.

Can you provide new timings, based on the latest SVN sources?
Hope this helps,
Erwin
venzon
Posts: 12
Joined: Mon Nov 26, 2007 2:42 am

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by venzon »

With SVN R917:

Code: Select all

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 39.68      5.90     5.90                             btCollisionWorld::rayTest(btVector3 const&, btVector3 const&, btCollisionWorld::RayResultCallback&, short)
 27.98     10.06     4.16                             btTriangleMeshShape::getAabb(btTransform const&, btVector3&, btVector3&) const
  2.69     10.46     0.40                             btCollisionWorld::performDiscreteCollisionDetection()
  2.29     10.80     0.34                             btAxisSweep3Internal<unsigned short>::quantize(unsigned short*, btVector3 const&, int) const
  2.08     11.11     0.31                             btOverlappingPairCache::removeOverlappingPair(btBroadphaseProxy*, btBroadphaseProxy*, btDispatcher*)
  1.75     11.37     0.26                             btAxisSweep3Internal<unsigned short>::sortMinDown(int, unsigned short, btDispatcher*, bool)
  1.68     11.62     0.25                             btAxisSweep3Internal<unsigned short>::sortMinUp(int, unsigned short, btDispatcher*, bool)
  1.68     11.87     0.25                             btOptimizedBvh::unQuantize(unsigned short const*) const
  1.61     12.11     0.24                             btAxisSweep3Internal<unsigned short>::sortMaxUp(int, unsigned short, btDispatcher*, bool)
  1.14     12.28     0.17                             btAxisSweep3Internal<unsigned short>::sortMaxDown(int, unsigned short, btDispatcher*, bool)
  0.87     12.41     0.13                             btOptimizedBvh::buildTree(int, int)
  0.87     12.54     0.13                             btAxisSweep3Internal<unsigned short>::updateHandle(unsigned short, btVector3 const&, btVector3 const&, btDispatcher*)
FPS is still ~2. No big improvement. Unfortunately the profiler isn't able to determine the number of calls; I think it has to do with the way bullet was compiled. I've never used jam before, and I only know the basics of profiling (I'm using the GNU C++ compiler and gprof).

Based on how much time is being spent with in btTriangleMeshShape::getAabb, I still think the problem is the btCollisionWorld::rayTest brute-force method of checking for AABB intersections with every object. With ~1500 objects and doing 5 ray tests, that works out to 7500 iterations through the loop every frame... which seems like a lot!

I'll see if I can combine all of my 1500 small objects into one big bvhTriangleShape and will report back on the performance.

By the way, is there any chance that you could add a newline at the end of src/BulletCollision/BroadphaseCollision/btOverlappingPairCallback.h? My compiler complains about it. :-) It's easy to turn off the warning, but I thought I'd mention it.
venzon
Posts: 12
Joined: Mon Nov 26, 2007 2:42 am

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by venzon »

Since I'm trying to integrate bullet into an already mostly complete game (vdrift.net), combining all of the static meshes into a much smaller number of objects would be difficult because the game relies on knowing which object was collided with. Instead, I coded up a simple AABB-based splitting plane tree and manually performed the broadphase with that. As a result only a few (~6 or so) objects get passed to btCollisionWorld::rayTestSingle. The performance is vastly improved, even with bullet 2.64 -- I now have playable framerates if I only do ray collisions. If I enable box collisions, now those are the bottleneck... doh! They aren't as slow as the ray tests were, though.
reltham
Posts: 66
Joined: Fri Oct 12, 2007 6:28 pm
Location: San Diego

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by reltham »

I ran into the same performance issue with rayTest. I was able to get a big savings by having btCollisionObject cache it's Aabb, and have rayTest use that cached Aabb rather than calculate it from the btCollisionShape's Aabb and the btCollisionObject's transform. I had btCollisionObject update it's cached Aabb when setWorldTransform() was called on it.

This is probably not a good general solution, but it worked in my current case. The problem with my solution is that if the btCollisionShape changes (scale or ...) the btCollisionObject will have an invalid cached Aabb. It worked for me because my btCollisionShapes don't change during runtime.

In my profile runs prior to my change a full 2/3rds of the time was spent in the shape's getAabb(). So it made a huge difference for me.

Of course, now the biggest chunk of time is in rayAabb() testing the ray against the Aabb's. Which is done in a loop over all objects in btCollisionWorld. I hope there are plans to change this to use some kind of bvh.
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA
Contact:

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by Erwin Coumans »

combining all of the static meshes into a much smaller number of objects would be difficult because the game relies on knowing which object was collided with
Actually you can add each mesh as parts of the same btBvhTriangleMeshShape. During a collision or ray test callback, you will get both partId and triangleId, so you can know which object to collide with. So in general it is best to merge many small meshes into a few bigger ones, but your current solution sounds good too.
Of course, now the biggest chunk of time is in rayAabb() testing the ray against the Aabb's. Which is done in a loop over all objects in btCollisionWorld. I hope there are plans to change this to use some kind of bvh.
The AABBs for each objects are also stored in the broadphase, and we could add some ray versus sweep and prune. Alternatively, deriving a basic tree from the sweep and prune structure (from scratch each frame) is doable as future optimization.

So more optimizations are planned.
Thanks,
Erwin
alinpopescu
Posts: 1
Joined: Wed Aug 03, 2011 7:55 pm

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by alinpopescu »

Hi,

I am using bullet and we have 8 static objects with about 8k triangles each. The objects represent an area of about 1 square Km. I perform a simple raytest with a ray of 150 m length and it takes about 1 ms to complete on a Core 2 Duo at 2 Ghz. Should this test take that long?

Thanks.
LouisD
Posts: 1
Joined: Mon Sep 11, 2017 2:57 pm

Re: btCollisionWorld::rayTest is slower than I'd expect

Post by LouisD »

I'm also having the same kind of performance issues
I'm doing occlusion culling using rayTest in Unity (BulletUnity plugin from the asset store), with very complex structures, containing several millions objects. Depending on the vew axis of the camera, my rayTests can take between a few tenths of a second and a few seconds, which is far too long for what i'm doing.

all my collisionObjects are created with a GImpactMeshShape, and I'm using only a ClosestRayResultCallBack.

Is the complexity of rayTest still O(n) ? in that case so many objects are touched by the ray so it's normal that it takes so much time.

I'am using 2.83 version btw
Post Reply