Making bullet faster.

GilsonLaurent · Post by **GilsonLaurent** » Mon Jun 16, 2008 4:34 pm

Hi,

I'm evaluating several engines for a bigger project. It's about simulating a large number of small convex trimesh objects colliding. I do not expect real-time.

But Bullet is way slower than any other engine. I think I missed something: triggered a debug mode or forgot to initialise some speedups. It is compiled in release mode. This is the initialisation:

    int maxProxies = 32766;
    btDefaultCollisionConfiguration* m_collisionConfiguration = new btDefaultCollisionConfiguration();
    
    btCollisionDispatcher* m_dispatcher = new btCollisionDispatcher(m_collisionConfiguration);
    
    btVector3 worldAabbMin(-50, -50, -50);
    btVector3 worldAabbMax(50, 50, 50);
    btAxisSweep3* m_overlappingPairCache = new btAxisSweep3(worldAabbMin, worldAabbMax, maxProxies);
    
    btSequentialImpulseConstraintSolver* sol = new btSequentialImpulseConstraintSolver;
    
    world = new btDiscreteDynamicsWorld(m_dispatcher, m_overlappingPairCache, sol, m_collisionConfiguration);
    
    world->setGravity(btVector3(0, -9.81, 0));

And this creates an object:

    float mass = 1.f;
    btTransform startTransform;
    startTransform.setIdentity();
    startTransform.setOrigin(btVector3(pos_x, pos_y, pos_z));
    
    btTriangleMesh* mesh = new btTriangleMesh();
    
    float* tri = m_type->getVertex(); // the triangles
    int number_of_tris = m_type->getSize(); // number of triangles
    for (int i = 0; i < number_of_tris; i++) {
        mesh->addTriangle(btVector3(tri[i*9], tri[(i*9)+1], tri[(i*9)+2]),
                btVector3(tri[(i*9)+3], tri[(i*9)+4], tri[(i*9)+5]),
                btVector3(tri[(i*9)+6], tri[(i*9)+7], tri[(i*9)+8]));
    }
    btCollisionShape* shape = new btConvexTriangleMeshShape(mesh);
    
    btVector3 localInertia(0, 0, 0);
    localInertia[0] = m_type->inertia[0][0];
    localInertia[1] = m_type->inertia[1][1];
    localInertia[2] = m_type->inertia[2][2];
    
    btDefaultMotionState* myMotionState = new btDefaultMotionState(startTransform);
    m_body = new btRigidBody(btRigidBody::btRigidBodyConstructionInfo(mass, myMotionState, shape, localInertia));

I'm on Linux with version 2.67. It cannot be the missing pthreads. I see ~30 s/step for Bullet vs. 0.1-0.2 s/step for the other candidates.

Any ideas?

Thanks

sparkprime · Post by **sparkprime** » Mon Jun 16, 2008 5:13 pm

This may be a silly question, but what do you consider one step to be? Are you letting Bullet run steps internally, or explicitly doing each step yourself?

If you ask for 1 second and Bullet does 60 steps, then that may explain the large difference in time.

GilsonLaurent · Post by **GilsonLaurent** » Mon Jun 16, 2008 5:34 pm

One step is 0.01 s, run by:

    world->stepSimulation((btScalar)0.01, 0);

I hope that makes one step of 10 ms?

chunky · Post by **chunky** » Mon Jun 16, 2008 7:48 pm

maxSubSteps = 0 is not your friend.

Your stepSimulation would be better posted as stepSimulation((btScalar)0.01, 1, (btScalar)0.01);

I don't know if that fixes your specific case, but it's something to be aware of.

Hm. Question for someone else, though: How *would* you do single steps of the simulation as described above? It seems like you're likely to run into issues of float accuracy in the time accumulator leading [over a long enough period] to individual extra steps.

Gary (-;
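
For readers following the stepSimulation discussion, here is a minimal sketch of the kind of time accumulator being described. It is a simplified model written for illustration, not Bullet's source code: `StepClock` and `advance` are hypothetical names, and the branch for maxSubSteps = 0 (a single step of variable size) is an assumption about the documented behaviour rather than something stated in this thread.

```cpp
// Simplified model of a fixed-timestep accumulator. NOT Bullet's source code;
// StepClock and advance() are hypothetical names used only for illustration.
struct StepClock {
    double localTime = 0.0;  // simulated time banked but not yet stepped

    // Returns how many internal steps one call would run; *stepSize receives
    // the size of each of those steps.
    int advance(double timeStep, int maxSubSteps, double fixedTimeStep, double* stepSize) {
        int numSteps = 0;
        if (maxSubSteps > 0) {
            // Fixed-step mode: bank the elapsed time, spend it in whole fixed steps.
            localTime += timeStep;
            if (localTime >= fixedTimeStep) {
                numSteps = static_cast<int>(localTime / fixedTimeStep);
                localTime -= numSteps * fixedTimeStep;
            }
            if (numSteps > maxSubSteps) {
                numSteps = maxSubSteps;  // surplus banked time is simply dropped
            }
            *stepSize = fixedTimeStep;
        } else {
            // Variable-step mode (assumed): one internal step of exactly timeStep.
            localTime = timeStep;
            numSteps = (timeStep > 0.0) ? 1 : 0;
            *stepSize = timeStep;
        }
        return numSteps;
    }
};
```

In this model, `advance(1.0, 10, 0.25, &h)` yields 4 internal steps of 0.25 s, while `advance(1.0, 1, 0.25, &h)` runs only 1 and discards the rest. When timeStep and fixedTimeStep are the same value, the remainder is exactly zero after every call, so the accumulator drift asked about above should not build up in that particular case.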

sparkprime · Post by **sparkprime** » Mon Jun 16, 2008 7:52 pm

Try a variety of stepSimulation settings, and also get some profiling done.

On Linux, I recommend "google performance tools" and/or "valgrind + callgrind", neither of which requires invasive recompilation. Edit: although debug symbols would obviously be required.

GilsonLaurent · Post by **GilsonLaurent** » Wed Jun 18, 2008 5:00 pm

The different stepSimulation calls make little to no difference. callgrind shows most of the time is spent in getLocalSupportingVertex. That makes sense: my models are very complex convex hulls with 1300 vertices per hull (don't ask, that number is not going to change). I switched from convex triangle meshes to convex hulls, which helped a bit.

Is it faster to split the convex hulls into smaller subparts and use compound shapes? Does the compound shape prevent getLocalSupportingVertex calls if the subshape is out of reach?

Thanks
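
To illustrate why a support-vertex query can dominate the profile for hulls of this size, here is a minimal sketch of a brute-force support mapping. It is a generic linear scan, not Bullet's implementation; `Vec3`, `dot` and `supportVertexIndex` are hypothetical names. GJK-style convex collision algorithms typically issue several such queries per colliding pair per step, so a cost that is linear in the vertex count adds up quickly at around 1300 vertices per hull.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type, standing in for the engine's own.
struct Vec3 { double x, y, z; };

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Brute-force support mapping: index of the vertex farthest along dir.
// One dot product per vertex, so every query is linear in the vertex count.
// Assumes vertices is non-empty.
std::size_t supportVertexIndex(const std::vector<Vec3>& vertices, const Vec3& dir) {
    std::size_t best = 0;
    double bestDot = dot(vertices[0], dir);
    for (std::size_t i = 1; i < vertices.size(); ++i) {
        const double d = dot(vertices[i], dir);
        if (d > bestDot) {
            bestDot = d;
            best = i;
        }
    }
    return best;
}
```

Under this model, the work per query shrinks in proportion to the number of vertices scanned, which is the quantity any simplification or splitting of the hull would have to reduce.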

sparkprime · Post by **sparkprime** » Wed Jun 18, 2008 5:23 pm

Do you mean localGetSupportingVertex?

sparkprime · Post by **sparkprime** » Wed Jun 18, 2008 5:30 pm

Also I forgot to mention kcachegrind is a nice tool for visualising the callgrind output, if you haven't already found it.

Callgrind's "time" is not real CPU time but just the number of instructions executed, I think, so the results may be skewed a bit. Google performance tools is a bit nicer in this respect, as it runs on the real CPU rather than a simulation. I usually use both, but I prefer callgrind, mainly because of kcachegrind.
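
Since instruction counts and real time can differ, a quick wall-clock measurement is a useful cross-check. The following is a minimal sketch using only the C++ standard library; `meanSecondsPerCall` is a hypothetical helper, not part of Bullet or of any profiler mentioned here.

```cpp
#include <chrono>

// Runs step() `repetitions` times and returns the mean wall-clock time per
// call in seconds. Assumes repetitions > 0.
template <typename Step>
double meanSecondsPerCall(Step step, int repetitions) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        step();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count() / repetitions;
}
```

It could be wrapped around the call from this thread, for example `meanSecondsPerCall([&] { world->stepSimulation((btScalar)0.01, 1, (btScalar)0.01); }, 100)`, to get a seconds-per-step figure comparable to the numbers quoted in the first post.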

Real-Time Physics Simulation Forum
