Performance with entirely static scene
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Performance with entirely static scene
Hi
I am currently doing some profiling and am surprised how much time Bullet takes.
My scene is entirely static. That means 1 huge triangle mesh (world / buildings), about 5 smaller triangle meshes (special buildings) and a few thousand capsules (other entities).
I mainly use Bullet for ray-casting (for AI purposes). Therefore I don't have a single rigid body in the scene. The only dynamic object I use is a kinematic character controller for camera movement.
When I profile Bullet's update step, it always takes at least 5 milliseconds, but regularly jumps to 10 milliseconds and rarely spikes up to 20 milliseconds.
Now this is all still in the real-time range, but I am very surprised to see such high numbers. I would expect that Bullet knows that there is only a single dynamic object (which is kinematic too) and does not need to do much updating at all.
So my question is this: Does Bullet really take that long? If yes, what the heck is it doing? If (usually) no, what might I be doing wrong? Could it be that I use some component, e.g. for the broad-phase, where Bullet iterates over all objects in every update, even the static ones?
Any hints are greatly appreciated.
Jan.
-
- Posts: 508
- Joined: Fri May 30, 2008 2:51 am
- Location: Ossining, New York
Re: Performance with entirely static scene
I've got a scene with a lot of static data and a small number of dynamic rigid bodies and I haven't had this problem. You should use Bullet's internal profiling feature or use an external profiler to get some data.
-
- Posts: 229
- Joined: Sun Sep 30, 2007 7:58 am
Re: Performance with entirely static scene
This should solve your problem:
bltWorld->setForceUpdateAllAabbs(false);
Without that switch, Bullet iterates over the AABBs of all bodies every frame.
Some time back this switch was FALSE by default, but that broke backward compatibility, so Erwin set it to TRUE by default.
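The effect of the switch can be pictured with a small standalone model. This is only a sketch and does not use Bullet: ToyBody, countAabbUpdates and makeMostlyStaticScene are made-up names, and the assumption (not confirmed in this thread) is that the AABB update loop recomputes a body's box whenever the switch is on or the body is active.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a collision object; this is not a Bullet type.
struct ToyBody {
    bool active;  // static or sleeping bodies are modelled as inactive
};

// Returns how many AABBs a single step would recompute.
// forceUpdateAll plays the role of the setForceUpdateAllAabbs switch (assumption).
std::size_t countAabbUpdates(const std::vector<ToyBody>& bodies, bool forceUpdateAll) {
    std::size_t updates = 0;
    for (const ToyBody& body : bodies) {
        if (forceUpdateAll || body.active) {
            ++updates;  // a real engine would recompute the bounding box here
        }
    }
    return updates;
}

// Builds a scene like the one described in this thread:
// many static bodies plus one moving (kinematic) body.
std::vector<ToyBody> makeMostlyStaticScene(std::size_t staticCount) {
    assert(staticCount > 0);
    std::vector<ToyBody> bodies(staticCount, ToyBody{false});
    bodies.push_back(ToyBody{true});  // the single kinematic character
    return bodies;
}
```

With 30000 static bodies and one active body, the model recomputes 30001 boxes per step with the switch on and 1 with it off. Note that the loop itself still visits every body, so some per-object cost remains even with the switch off.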
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
Great thanks!
That switch indeed dropped the time taken to 2-3 ms per update. Still quite high for "nothing to be done" IMO, but considerably better than before.
Jan.
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
Some additional tests show that the time Bullet takes clearly grows linearly with the number of objects that I add.
I add several thousand static boxes and cylinders. Without those, Bullet takes ~10 microseconds (the 5 huge static meshes are still loaded).
So it still seems to be updating something for every object at every update. I would like to get rid of that too. Any other switches I could be missing? I found "setSynchronizeAllMotionStates", but that seems to be off by default already.
Thanks,
Jan.
-
- Posts: 229
- Joined: Sun Sep 30, 2007 7:58 am
Re: Performance with entirely static scene
Hi Jan,
JanK wrote:
    Some additional tests show that Bullet clearly takes linearly more time with the amount of objects that i add.
    I add several thousand static boxes and cylinders. Without those Bullet takes ~10 microseconds (still 5 huge static meshes are loaded).
    So it still seems to be updating something for every object at every update. I would like to get rid of that too. Any other switches i could be missing? I found "setSynchronizeAllMotionStates", but that seems to be off by default already.
    Thanks,
    Jan.
Maybe you should simply step through the Step() function. Try it with only a few static bodies, so you can spot the problem easily.
-
- Site Admin
- Posts: 4221
- Joined: Sun Jun 26, 2005 6:43 pm
- Location: California, USA
Re: Performance with entirely static scene
Can you provide us with detailed timings, using the following line right after 'stepSimulation'?
Code: Select all
CProfileManager::dumpAll();
Thanks,
Erwin
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
----------------------------------
Profiling: Root (total running time: 1.333 ms) ---
0 -- stepSimulation (99.85 %) :: 1.331 ms / frame (1 calls)
Unaccounted: (0.150 %) :: 0.002 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 1.331 ms) ---
...0 -- synchronizeMotionStates (0.00 %) :: 0.000 ms / frame (1 calls)
...1 -- internalSingleStepSimulation (79.56 %) :: 1.059 ms / frame (1 calls)
...Unaccounted: (20.436 %) :: 0.272 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 1.059 ms) ---
......0 -- updateActivationState (0.00 %) :: 0.000 ms / frame (1 calls)
......1 -- updateActions (1.04 %) :: 0.011 ms / frame (1 calls)
......2 -- integrateTransforms (0.00 %) :: 0.000 ms / frame (1 calls)
......3 -- solveConstraints (41.64 %) :: 0.441 ms / frame (1 calls)
......4 -- calculateSimulationIslands (39.19 %) :: 0.415 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (17.94 %) :: 0.190 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (0.00 %) :: 0.000 ms / frame (1 calls)
......Unaccounted: (0.189 %) :: 0.002 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.441 ms) ---
.........0 -- solveGroup (0.68 %) :: 0.003 ms / frame (1 calls)
.........1 -- processIslands (30.16 %) :: 0.133 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (68.93 %) :: 0.304 ms / frame (1 calls)
.........Unaccounted: (0.227 %) :: 0.001 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.003 ms) ---
............0 -- solveGroupCacheFriendlyIterations (33.33 %) :: 0.001 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (33.33 %) :: 0.001 ms / frame (1 calls)
............Unaccounted: (33.333 %) :: 0.001 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 0.190 ms) ---
.........0 -- dispatchAllCollisionPairs (2.11 %) :: 0.004 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (78.42 %) :: 0.149 ms / frame (1 calls)
.........2 -- updateAabbs (19.47 %) :: 0.037 ms / frame (1 calls)
.........Unaccounted: (0.000 %) :: 0.000 ms
Just to mention it once again: it all runs in real time, and I'm not unhappy with the performance. I am simply surprised that it seems to be doing so much work at all, although only a single kinematic object exists (and about 30000 static ones).
Jan.
-
- Site Admin
- Posts: 4221
- Joined: Sun Jun 26, 2005 6:43 pm
- Location: California, USA
Re: Performance with entirely static scene
Ah, so the total cost is just above 1 millisecond. You don't have a frame that takes > 10ms (as you mention earlier)?
    ......4 -- calculateSimulationIslands (39.19 %) :: 0.415 ms / frame (1 calls)
    .........2 -- islandUnionFindAndQuickSort (68.93 %) :: 0.304 ms / frame (1 calls)
It seems that the island generation/union find became the bottleneck for many static objects (>0.7ms). Perhaps you can share and attach a .bullet file, so we can look at optimizing it?
Thanks,
Erwin
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
My profiling data is acquired across several seconds, and there I saw a jump above 10ms every once in a while, but that was BEFORE I disabled the AABB updates (which decreased the time taken considerably).
Now this dump is only from a single frame, but from my other profiling data 1.2 ms seems to be the average at the moment. It does spike to 1.5 ms every once in a while, but that should be negligible.
If you tell me how I can quickly generate a .bullet file, I can give that data to you.
Thanks,
Jan.
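For readers who want to track an average and occasional spikes over several seconds, as described in this post, a small accumulator is enough. This is a hypothetical helper, not part of Bullet and not the code used in this thread; StepTimeStats and its member names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Accumulates per-frame step times (in milliseconds) over many frames.
class StepTimeStats {
public:
    // Record the measured duration of one update step.
    void addSample(double ms) {
        assert(ms >= 0.0);
        total_ += ms;
        max_ = std::max(max_, ms);
        ++count_;
    }

    // Mean step time over all recorded frames (0 if nothing was recorded).
    double averageMs() const {
        return count_ == 0 ? 0.0 : total_ / static_cast<double>(count_);
    }

    // Largest single step time seen so far, i.e. the worst spike.
    double maxMs() const { return max_; }

    std::size_t count() const { return count_; }

private:
    double total_ = 0.0;
    double max_ = 0.0;
    std::size_t count_ = 0;
};
```

One would feed it the wall-clock time measured around each physics update (for example with std::chrono::steady_clock) and print averageMs() and maxMs() every few seconds, which separates the typical cost from rare spikes.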
-
- Site Admin
- Posts: 4221
- Joined: Sun Jun 26, 2005 6:43 pm
- Location: California, USA
Re: Performance with entirely static scene
The link in my previous posting has the details, I'll copy it for your convenience:
Code: Select all
#include "LinearMath/btSerializer.h"
And after you created the world and all objects:
Code: Select all
//create a large enough buffer. There is no method to pre-calculate the buffer size yet.
int maxSerializeBufferSize = 1024*1024*5;
btDefaultSerializer* serializer = new btDefaultSerializer(maxSerializeBufferSize);
dynamicsWorld->serialize(serializer);
FILE* file = fopen("testFile.bullet","wb");
fwrite(serializer->getBufferPointer(),serializer->getCurrentBufferSize(),1, file);
fclose(file);
Thanks,
Erwin
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
Sorry, i didn't see that you put a link in the post.
Anyways... you can get the data here: *admin removed url after downloading it *. It's quite big, so please don't download it too often. I'll take it down in a few hours again.
Jan.
-
- Posts: 17
- Joined: Fri Jan 08, 2010 11:40 am
Re: Performance with entirely static scene
Argh, during my profiling I had disabled some entities (which is why it apparently got faster), so about 6000 boxes are missing from the data. But for your purposes it should still be valid.
Jan.
-
- Posts: 5
- Joined: Wed Mar 17, 2010 4:09 pm
Re: Performance with entirely static scene
Hi everyone,
We had the same issue with our collision world containing a lot of Static and Kinematic Objects (actually around 3000 and still growing).
I've tried to remove all Static and Kinematic Objects from the calculateSimulationIslands process and it seemed to work pretty well.
Only two source files need modifications: btSimulationIslandManager.cpp and, to avoid incorrect index patching in buildIslands, btUnionFind.cpp. The changes are listed below.
The profiler output further down shows the performance gain we had on a PPU PS3 version with our world containing only static and kinematic objects, before and after the change. There is a significant gain in solveConstraints and performDiscreteCollisionDetection (~4ms).
in btSimulationIslandManager.cpp :
Code: Select all
void btSimulationIslandManager::updateActivationState(btCollisionWorld* colWorld,btDispatcher* dispatcher)
{
    // put the index into m_controllers into m_tag
    int index = 0;
    {
        int i;
        for (i=0;i<colWorld->getCollisionObjectArray().size(); i++)
        {
            btCollisionObject* collisionObject= colWorld->getCollisionObjectArray()[i];
            //Adding filtering here
            if (!collisionObject->isStaticOrKinematicObject())
            {
                collisionObject->setIslandTag(index++);
            }
            collisionObject->setCompanionId(-1);
            collisionObject->setHitFraction(btScalar(1.));
        }
    }
    // do the union find
    initUnionFind( index );
    findUnions(dispatcher,colWorld);
}
Code: Select all
void btSimulationIslandManager::storeIslandActivationState(btCollisionWorld* colWorld)
{
    // put the islandId ('find' value) into m_tag
    {
        int index = 0;
        int i;
        for (i=0;i<colWorld->getCollisionObjectArray().size();i++)
        {
            btCollisionObject* collisionObject= colWorld->getCollisionObjectArray()[i];
            if (!collisionObject->isStaticOrKinematicObject())
            {
                collisionObject->setIslandTag( m_unionFind.find(index) );
                //Set the correct object offset in Collision Object Array
                m_unionFind.getElement(index).m_sz = i;
                collisionObject->setCompanionId(-1);
                index++;
            } else
            {
                collisionObject->setIslandTag(-1);
                collisionObject->setCompanionId(-2);
            }
        }
    }
}
in btUnionFind.cpp :
Code: Select all
void btUnionFind::sortIslands()
{
    //first store the original body index, and islandId
    int numElements = m_elements.size();
    for (int i=0;i<numElements;i++)
    {
        m_elements[i].m_id = find(i);
        //The only change here is removing this line: m_elements[i].m_sz = i;
    }
    // Sort the vector using predicate and std::sort
    //std::sort(m_elements.begin(), m_elements.end(), btUnionFindElementSortPredicate);
    m_elements.quickSort(btUnionFindElementSortPredicate());
}
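The idea behind these changes can be shown in isolation. The following is a simplified sketch, not Bullet code: ToyObject, ToyUnionFind and assignIslandIndices are invented names, and the union-find here is a generic textbook version rather than btUnionFind. It shows how giving indices only to dynamic objects shrinks the union-find from the total object count to the number of dynamic objects.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Toy object; this is not a Bullet type.
struct ToyObject {
    bool isStaticOrKinematic;
    int islandTag;  // -1 means "not part of any island"
};

// Minimal union-find with path halving.
class ToyUnionFind {
public:
    explicit ToyUnionFind(int n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), 0);  // every element is its own root
    }
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // point x at its grandparent
            x = parent_[x];
        }
        return x;
    }
    void unite(int a, int b) { parent_[find(a)] = find(b); }
    int size() const { return static_cast<int>(parent_.size()); }

private:
    std::vector<int> parent_;
};

// Gives only dynamic objects a union-find index; static and kinematic
// objects get -1. Returns the number of union-find elements needed.
int assignIslandIndices(std::vector<ToyObject>& objects) {
    int index = 0;
    for (ToyObject& obj : objects) {
        obj.islandTag = obj.isStaticOrKinematic ? -1 : index++;
    }
    return index;
}
```

With 3000 static objects and two dynamic ones, assignIslandIndices returns 2, so the structure that gets initialised and sorted each step has two elements instead of 3002. Because the dense index no longer equals the position in the object array, a real implementation has to remember that mapping separately, which is what the m_sz assignment in the patch above is for.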
Before :
Code: Select all
info : ----------------------------------
info : Profiling: Root (total running time: 14.447 ms) ---
info : 0 -- convexSweepTest (0.00 %) :: 0.000 ms / frame (0 calls)
info : 1 -- stepSimulation (60.01 %) :: 8.670 ms / frame (1 calls)
info : 2 -- rayTest (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (39.988 %) :: 5.777 ms
info : ----------------------------------
info : Profiling: stepSimulation (total running time: 8.670 ms) ---
info : 0 -- synchronizeMotionStates (0.03 %) :: 0.003 ms / frame (3 calls)
info : 1 -- updateSoftBodies (0.03 %) :: 0.003 ms / frame (3 calls)
info : 2 -- solveSoftConstraints (0.05 %) :: 0.004 ms / frame (3 calls)
info : 3 -- internalSingleStepSimulation (95.54 %) :: 8.283 ms / frame (3 calls)
Unaccounted: (4.348 %) :: 0.377 ms
info : ----------------------------------
info : Profiling: internalSingleStepSimulation (total running time: 8.283 ms) ---
info : 0 -- updateActivationState (0.05 %) :: 0.004 ms / frame (3 calls)
info : 1 -- updateActions (0.04 %) :: 0.003 ms / frame (3 calls)
info : 2 -- integrateTransforms (0.04 %) :: 0.003 ms / frame (3 calls)
info : 3 -- solveConstraints (41.05 %) :: 3.400 ms / frame (3 calls)
info : 4 -- calculateSimulationIslands (22.76 %) :: 1.885 ms / frame (3 calls)
info : 5 -- performDiscreteCollisionDetection (35.54 %) :: 2.944 ms / frame (3 calls)
info : 6 -- predictUnconstraintMotion (0.04 %) :: 0.003 ms / frame (3 calls)
Unaccounted: (0.495 %) :: 0.041 ms
info : ----------------------------------
info : Profiling: solveConstraints (total running time: 3.400 ms) ---
info : 0 -- processIslands (42.97 %) :: 1.461 ms / frame (3 calls)
info : 1 -- islandUnionFindAndQuickSort (56.38 %) :: 1.917 ms / frame (3 calls)
Unaccounted: (0.647 %) :: 0.022 ms
info : ----------------------------------
info : Profiling: performDiscreteCollisionDetection (total running time: 2.944 ms) ---
info : 0 -- dispatchAllCollisionPairs (70.21 %) :: 2.067 ms / frame (3 calls)
info : 1 -- calculateOverlappingPairs (0.20 %) :: 0.006 ms / frame (3 calls)
info : 2 -- updateAabbs (29.08 %) :: 0.856 ms / frame (3 calls)
Unaccounted: (0.510 %) :: 0.015 ms
After :
Code: Select all
info : ----------------------------------
info : Profiling: Root (total running time: 10.644 ms) ---
info : 0 -- convexSweepTest (0.00 %) :: 0.000 ms / frame (0 calls)
info : 1 -- stepSimulation (38.89 %) :: 4.139 ms / frame (1 calls)
info : 2 -- rayTest (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (61.114 %) :: 6.505 ms
info : ----------------------------------
info : Profiling: stepSimulation (total running time: 4.139 ms) ---
info : 0 -- synchronizeMotionStates (0.05 %) :: 0.002 ms / frame (3 calls)
info : 1 -- updateSoftBodies (0.05 %) :: 0.002 ms / frame (3 calls)
info : 2 -- solveSoftConstraints (0.02 %) :: 0.001 ms / frame (3 calls)
info : 3 -- internalSingleStepSimulation (90.79 %) :: 3.758 ms / frame (3 calls)
Unaccounted: (9.084 %) :: 0.376 ms
info : ----------------------------------
info : Profiling: internalSingleStepSimulation (total running time: 3.758 ms) ---
info : 0 -- updateActivationState (0.11 %) :: 0.004 ms / frame (3 calls)
info : 1 -- updateActions (0.05 %) :: 0.002 ms / frame (3 calls)
info : 2 -- integrateTransforms (0.05 %) :: 0.002 ms / frame (3 calls)
info : 3 -- solveConstraints (0.45 %) :: 0.017 ms / frame (3 calls)
info : 4 -- calculateSimulationIslands (48.99 %) :: 1.841 ms / frame (3 calls)
info : 5 -- performDiscreteCollisionDetection (49.60 %) :: 1.864 ms / frame (3 calls)
info : 6 -- predictUnconstraintMotion (0.03 %) :: 0.001 ms / frame (3 calls)
Unaccounted: (0.718 %) :: 0.027 ms
info : ----------------------------------
info : Profiling: solveConstraints (total running time: 0.017 ms) ---
info : 0 -- processIslands (11.76 %) :: 0.002 ms / frame (3 calls)
info : 1 -- islandUnionFindAndQuickSort (11.76 %) :: 0.002 ms / frame (3 calls)
Unaccounted: (76.471 %) :: 0.013 ms
info : ----------------------------------
info : Profiling: performDiscreteCollisionDetection (total running time: 1.864 ms) ---
info : 0 -- dispatchAllCollisionPairs (75.43 %) :: 1.406 ms / frame (3 calls)
info : 1 -- calculateOverlappingPairs (0.27 %) :: 0.005 ms / frame (3 calls)
info : 2 -- updateAabbs (23.82 %) :: 0.444 ms / frame (3 calls)
Unaccounted: (0.483 %) :: 0.009 ms
I've tried almost all Samples without noticing any difference in behavior.
Hope this could help.
Boris.