btCollisionWorld::rayTest is slower than I'd expect
Posted: Sat Dec 01, 2007 4:42 pm
First off, let me say that bullet seems to be extremely well-written. I like the way the API works, and I haven't encountered any bugs or inconsistent behavior so far. I prefer it over ODE. The bullet documentation could use some work, but the large assortment of demos helps.
I'm using bullet 2.64 in C++ for collision detection only, much like the CollisionInterfaceDemo does. I have a large number (~1500) of static trimesh collision objects (btBvhTriangleMeshShape), and every frame I need to collide those objects with a box and with a few (~5) rays. For the box collision I call btCollisionWorld::performDiscreteCollisionDetection() and then look at the manifolds, which seems to perform pretty well. For the ray collision I use btCollisionWorld::rayTest. Unfortunately, rayTest is veeerrry slow, as the flat profile of my application further down shows.
So, rayTest is just killing me: in the profile, rayTest and btTriangleMeshShape::getAabb together account for about 70% of the run time. I've tested everything with a very small number of objects, and it all *works* as expected. But I've done ray tests against this many objects in ODE without any performance issues. From reading the implementation, rayTest does O(n) AABB tests per ray, where n is the number of collision objects, and that presumably is also where the getAabb calls come from. This is too slow! Am I using bullet correctly? In ODE I could set it up to use an octree to vastly reduce the number of ray/AABB tests. Does bullet have something similar? If not, I was thinking it wouldn't be too hard to write my own octTreeRayTest function that then goes off and calls btCollisionWorld::rayTestSingle only for the objects whose AABB the ray actually touches.
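To make the idea concrete, here is a rough sketch of that kind of culling. It is not Bullet code and makes several assumptions: Vec3, Aabb, segmentHitsAabb and StaticAabbTree are made-up names, the world-space AABB of every static object is computed once at startup instead of every frame, and it uses a binary AABB tree rather than an octree simply because that is shorter to write. The query only returns candidate object indices; the exact test against each candidate (for example via btCollisionWorld::rayTestSingle) would still have to follow.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for btVector3 and a cached world-space AABB.
struct Vec3 { float v[3]; };
struct Aabb { Vec3 min, max; };

// Slab test: true if the segment from -> to touches the box.
bool segmentHitsAabb(const Vec3& from, const Vec3& to, const Aabb& box) {
    float tMin = 0.0f, tMax = 1.0f;
    for (int i = 0; i < 3; ++i) {
        const float d = to.v[i] - from.v[i];
        if (d == 0.0f) {  // segment is parallel to this slab
            if (from.v[i] < box.min.v[i] || from.v[i] > box.max.v[i]) return false;
            continue;
        }
        float t0 = (box.min.v[i] - from.v[i]) / d;
        float t1 = (box.max.v[i] - from.v[i]) / d;
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;
    }
    return true;
}

// Binary tree over the cached AABBs of static objects, built once.
class StaticAabbTree {
public:
    // boxes[i] is the world-space AABB of object i.
    explicit StaticAabbTree(const std::vector<Aabb>& boxes) : boxes_(boxes) {
        std::vector<int> ids(boxes_.size());
        for (std::size_t i = 0; i < ids.size(); ++i) ids[i] = static_cast<int>(i);
        if (!ids.empty()) build(ids, 0, ids.size());
    }

    // Appends the indices of all objects whose cached AABB the segment touches.
    void query(const Vec3& from, const Vec3& to, std::vector<int>& out) const {
        if (!nodes_.empty()) visit(0, from, to, out);
    }

private:
    struct Node { Aabb bounds; int left, right, object; };  // object >= 0 marks a leaf

    int build(std::vector<int>& ids, std::size_t begin, std::size_t end) {
        Node node;
        node.bounds = boxes_[ids[begin]];
        node.left = node.right = node.object = -1;
        for (std::size_t i = begin + 1; i < end; ++i)
            for (int a = 0; a < 3; ++a) {
                node.bounds.min.v[a] = std::min(node.bounds.min.v[a], boxes_[ids[i]].min.v[a]);
                node.bounds.max.v[a] = std::max(node.bounds.max.v[a], boxes_[ids[i]].max.v[a]);
            }
        const int index = static_cast<int>(nodes_.size());
        nodes_.push_back(node);
        if (end - begin == 1) {
            nodes_[index].object = ids[begin];
            return index;
        }
        // Split at the median box centre along the longest axis of this node.
        int axis = 0;
        for (int a = 1; a < 3; ++a)
            if (node.bounds.max.v[a] - node.bounds.min.v[a] >
                node.bounds.max.v[axis] - node.bounds.min.v[axis]) axis = a;
        const std::size_t mid = begin + (end - begin) / 2;
        const std::vector<Aabb>& boxes = boxes_;
        std::nth_element(ids.begin() + begin, ids.begin() + mid, ids.begin() + end,
            [&boxes, axis](int l, int r) {
                return boxes[l].min.v[axis] + boxes[l].max.v[axis] <
                       boxes[r].min.v[axis] + boxes[r].max.v[axis];
            });
        const int left = build(ids, begin, mid);
        const int right = build(ids, mid, end);
        nodes_[index].left = left;
        nodes_[index].right = right;
        return index;
    }

    void visit(int n, const Vec3& from, const Vec3& to, std::vector<int>& out) const {
        const Node& node = nodes_[n];
        if (!segmentHitsAabb(from, to, node.bounds)) return;  // prune whole subtree
        if (node.object >= 0) { out.push_back(node.object); return; }
        visit(node.left, from, to, out);
        visit(node.right, from, to, out);
    }

    std::vector<Aabb> boxes_;
    std::vector<Node> nodes_;
};
```

With something like this, each ray should only need on the order of log n box tests plus the exact tests for the few candidates it returns, instead of n getAabb calls and n box tests per ray. Whether that is enough to fix the profile is untested.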
Flat profile:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 42.59       9.14     9.14                             btCollisionWorld::rayTest(btVector3 const&, btVector3 const&, btCollisionWorld::RayResultCallback&, short)
 27.68      15.08     5.94                             btTriangleMeshShape::getAabb(btTransform const&, btVector3&, btVector3&) const
  2.94      15.71     0.63                             btCollisionWorld::performDiscreteCollisionDetection()
  2.28      16.20     0.49                             btAxisSweep3Internal<unsigned short>::quantize(unsigned short*, btVector3 const&, int) const
  1.86      16.60     0.40                             btAxisSweep3Internal<unsigned short>::sortMaxUp(int, unsigned short, btDispatcher*, bool)
  1.68      16.96     0.36                             btAxisSweep3Internal<unsigned short>::sortMaxDown(int, unsigned short, btDispatcher*, bool)
  1.49      17.28     0.32                             btAxisSweep3Internal<unsigned short>::sortMinUp(int, unsigned short, btDispatcher*, bool)
  1.49      17.60     0.32                             btOverlappingPairCache::removeOverlappingPair(btBroadphaseProxy*, btBroadphaseProxy*, btDispatcher*)
  1.35      17.89     0.29                             btConcaveShape::getMargin() const
  0.98      18.10     0.21                             btAxisSweep3Internal<unsigned short>::sortMinDown(int, unsigned short, btDispatcher*, bool)
  0.89      18.29     0.19                             btOptimizedBvh::unQuantize(unsigned short const*) const
  0.84      18.47     0.18                             btAxisSweep3Internal<unsigned short>::updateHandle(unsigned short, btVector3 const&, btVector3 const&, btDispatcher*)
  0.75      18.63     0.16                             __i686.get_pc_thunk.bx