Performance with entirely static scene

JanK
Posts: 17
Joined: Fri Jan 08, 2010 11:40 am

Post by JanK »

Hi

I am currently doing some profiling and am surprised how much time Bullet takes.

My scene is entirely static. That means 1 huge triangle mesh (world / buildings), about 5 smaller triangle meshes (special buildings) and a few thousand capsules (other entities).

I mainly use Bullet for ray-casting (for AI purposes). Therefore i don't have a single rigid-body in the scene. The only dynamic object i use is a kinematic character controller for camera movement.

When I profile Bullet's update step, it always takes at least 5 milliseconds, regularly jumps to 10 milliseconds, and rarely spikes up to 20 milliseconds.

Now this is all still in the real-time range, but i am very surprised to see such high numbers. I would expect that Bullet knows that there is only a single dynamic object (which is kinematic too) and does not need to do much updating at all.

So my question is this: Does Bullet really take that long? If yes, what the heck is it doing? If (usually) no, what might i be doing wrong? Could it be that i use some component, e.g. for the broad-phase, where Bullet iterates over all objects in every update, even the static ones?

Any hints are greatly appreciated.
Jan.
sparkprime
Posts: 508
Joined: Fri May 30, 2008 2:51 am
Location: Ossining, New York

Post by sparkprime »

I've got a scene with a lot of static data and a small number of dynamic rigid bodies and I haven't had this problem. You should use Bullet's internal profiling feature or use an external profiler to get some data.
pico
Posts: 229
Joined: Sun Sep 30, 2007 7:58 am

Post by pico »

This should solve your problem:
bltWorld->setForceUpdateAllAabbs(false);

Without that switch, Bullet recomputes the AABB of every body each frame, including static ones.
Some time ago this switch defaulted to false, but that broke backward compatibility, so Erwin set it to true by default.
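
For illustration, here is a simplified, self-contained model of what the switch changes. It is not Bullet's source: `Body`, `countAabbUpdates` and `makeMostlyStaticScene` are hypothetical names, and the model assumes that static bodies are flagged inactive, so that with the switch off only active bodies get an AABB refresh.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a collision object; not a Bullet type.
struct Body
{
	bool active; // assumption: static bodies are flagged inactive
};

// Counts how many AABB refreshes one step would perform.
// forceUpdateAll models the setForceUpdateAllAabbs switch.
std::size_t countAabbUpdates(const std::vector<Body>& bodies, bool forceUpdateAll)
{
	std::size_t updates = 0;
	for (std::size_t i = 0; i < bodies.size(); ++i)
	{
		// The loop still visits every body; only the refresh itself is skipped.
		if (forceUpdateAll || bodies[i].active)
		{
			++updates;
		}
	}
	assert(updates <= bodies.size()); // never more refreshes than bodies
	return updates;
}

// Builds a scene like the one in this thread: many static bodies, one moving one.
std::vector<Body> makeMostlyStaticScene(std::size_t staticCount)
{
	std::vector<Body> bodies(staticCount, Body{false});
	bodies.push_back(Body{true}); // e.g. the kinematic character controller
	return bodies;
}
```

With 30000 static bodies and one active body, this model performs 30001 refreshes with the switch on and 1 with it off, although the loop itself still visits every body.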

Post by JanK »

Great, thanks!

That switch indeed dropped the time taken to 2-3 ms per update. Still quite high for "nothing to be done" IMO, but considerably better than before.

Jan.

Post by JanK »

Some additional tests show that the time Bullet takes clearly grows linearly with the number of objects I add.

I add several thousand static boxes and cylinders. Without them, Bullet takes ~10 microseconds per update, even with the 5 huge static meshes still loaded.

So it still seems to be updating something for every object at every update. I would like to get rid of that too. Any other switches i could be missing? I found "setSynchronizeAllMotionStates", but that seems to be off by default already.

Thanks,
Jan.

Post by pico »

Hi Jan,

Maybe you should simply step through the Step() function. Try it with only a few static bodies so you can spot the problem easily.
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

Can you provide us with detailed timings, using the following line right after 'stepSimulation'?

Code: Select all

CProfileManager::dumpAll();
Thanks,
Erwin

Post by JanK »

----------------------------------
Profiling: Root (total running time: 1.333 ms) ---
0 -- stepSimulation (99.85 %) :: 1.331 ms / frame (1 calls)
Unaccounted: (0.150 %) :: 0.002 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 1.331 ms) ---
...0 -- synchronizeMotionStates (0.00 %) :: 0.000 ms / frame (1 calls)
...1 -- internalSingleStepSimulation (79.56 %) :: 1.059 ms / frame (1 calls)
...Unaccounted: (20.436 %) :: 0.272 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 1.059 ms) ---

......0 -- updateActivationState (0.00 %) :: 0.000 ms / frame (1 calls)
......1 -- updateActions (1.04 %) :: 0.011 ms / frame (1 calls)
......2 -- integrateTransforms (0.00 %) :: 0.000 ms / frame (1 calls)
......3 -- solveConstraints (41.64 %) :: 0.441 ms / frame (1 calls)
......4 -- calculateSimulationIslands (39.19 %) :: 0.415 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (17.94 %) :: 0.190 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (0.00 %) :: 0.000 ms / frame (1 calls)
......Unaccounted: (0.189 %) :: 0.002 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.441 ms) ---
.........0 -- solveGroup (0.68 %) :: 0.003 ms / frame (1 calls)
.........1 -- processIslands (30.16 %) :: 0.133 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (68.93 %) :: 0.304 ms / frame (1 calls)
.........Unaccounted: (0.227 %) :: 0.001 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.003 ms) ---
............0 -- solveGroupCacheFriendlyIterations (33.33 %) :: 0.001 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (33.33 %) :: 0.001 ms / frame (1 calls)
............Unaccounted: (33.333 %) :: 0.001 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 0.190 ms) ---
.........0 -- dispatchAllCollisionPairs (2.11 %) :: 0.004 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (78.42 %) :: 0.149 ms / frame (1 calls)
.........2 -- updateAabbs (19.47 %) :: 0.037 ms / frame (1 calls)
.........Unaccounted: (0.000 %) :: 0.000 ms



Just to mention it once again: it all runs in real time, and I'm not unhappy with the performance. I am simply surprised that it seems to be doing so much work at all when only a single kinematic object exists (alongside about 30000 static ones).

Jan.

Post by Erwin Coumans »

Ah, so the total cost is just above 1 millisecond. You don't have a frame that takes > 10 ms (as you mentioned earlier)?
......4 -- calculateSimulationIslands (39.19 %) :: 0.415 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (68.93 %) :: 0.304 ms / frame (1 calls)
It seems that the island generation/union find became the bottleneck for many static objects (>0.7ms). Perhaps you can share and attach a .bullet file, so we can look at optimizing it?
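
To make the arithmetic explicit: the figure above is the sum of the two quoted entries, and dividing it by the 1.331 ms reported for stepSimulation gives its share of the whole step. A small sketch, where `shareOfParent` and `islandCostMs` are helper names invented for this example and the inputs are the rounded values from the dump:

```cpp
#include <cassert>

// Returns part as a percentage of whole.
// Hypothetical helper for reading the profile dump; not part of Bullet.
double shareOfParent(double partMs, double wholeMs)
{
	assert(wholeMs > 0.0); // guard against dividing by zero
	return 100.0 * partMs / wholeMs;
}

// Rounded values copied from the single-frame dump in this thread.
const double kStepSimulationMs = 1.331;
const double kCalculateSimulationIslandsMs = 0.415;
const double kIslandUnionFindAndQuickSortMs = 0.304;

// Total time attributed to island generation and union find.
double islandCostMs()
{
	return kCalculateSimulationIslandsMs + kIslandUnionFindAndQuickSortMs;
}
```

With these inputs, islandCostMs() comes to about 0.719 ms, a little over half of the step.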
Thanks,
Erwin

Post by JanK »

My profiling data is acquired across several seconds and there i saw a jump above 10ms every once in a while, but that was BEFORE i disabled the AABB updates (which decreased the time taken considerably).

Now this dump is only from a single frame, but from my other profiling data 1.2 ms seems to be the average at the moment. It does spike to 1.5 ms every once in a while, but that should be negligible.
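
One way to collect this kind of average and worst-case figure is a small accumulator fed with one timing per frame. The sketch below is not the profiling code used here; `FrameStats` and `timeStepMs` are hypothetical names, and the callable passed to `timeStepMs` stands in for whatever wraps the physics update.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical per-frame timing accumulator; not part of Bullet.
struct FrameStats
{
	double totalMs = 0.0;
	double maxMs = 0.0;
	std::size_t frames = 0;

	// Records the duration of one update step in milliseconds.
	void add(double frameMs)
	{
		assert(frameMs >= 0.0);
		totalMs += frameMs;
		maxMs = std::max(maxMs, frameMs);
		++frames;
	}

	// Mean step time over all recorded frames (0 if none were recorded).
	double averageMs() const
	{
		return frames == 0 ? 0.0 : totalMs / static_cast<double>(frames);
	}
};

// Runs the given callable once and returns its wall-clock duration in milliseconds.
template <typename Step>
double timeStepMs(Step step)
{
	const auto start = std::chrono::steady_clock::now();
	step();
	const auto end = std::chrono::steady_clock::now();
	return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Feeding `stats.add(timeStepMs(...))` once per frame keeps both the average and the largest spike, so an occasional slow frame is not hidden by the mean.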

If you tell me how i can quickly generate a .bullet file, i can give that data to you.

Thanks,
Jan.

Post by Erwin Coumans »

The link in my previous posting has the details, I'll copy it for your convenience:

Code: Select all

#include "LinearMath/btSerializer.h"
And after you created the world and all objects:

Code: Select all

	// Create a large enough buffer. There is no method to pre-calculate the buffer size yet.
	int maxSerializeBufferSize = 1024*1024*5;

	btDefaultSerializer* serializer = new btDefaultSerializer(maxSerializeBufferSize);
	dynamicsWorld->serialize(serializer);

	// Requires <stdio.h>. Only write if the file could actually be opened.
	FILE* file = fopen("testFile.bullet","wb");
	if (file)
	{
		fwrite(serializer->getBufferPointer(),serializer->getCurrentBufferSize(),1, file);
		fclose(file);
	}
	delete serializer;
Thanks,
Erwin

Post by JanK »

Sorry, i didn't see that you put a link in the post.

Anyways... you can get the data here: *admin removed url after downloading it *. It's quite big, so please don't download it too often. I'll take it down in a few hours again.

Jan.

Post by JanK »

Argh, during my profiling I had disabled some entities (which is why it apparently got faster), so about 6000 boxes are missing from the data. But for your purposes it should still be valid.

Jan.
Vroonsh
Posts: 5
Joined: Wed Mar 17, 2010 4:09 pm

Post by Vroonsh »

Hi everyone,

We had the same issue with our collision world containing a lot of Static and Kinematic Objects (actually around 3000 and still growing).

I've tried to remove all Static and Kinematic Objects from the calculateSimulationIslands process and it seemed to work pretty well.

Only two source files need modification.

In btSimulationIslandManager.cpp:

Code: Select all

void	btSimulationIslandManager::updateActivationState(btCollisionWorld* colWorld,btDispatcher* dispatcher)
{
	
	// put the index into m_controllers into m_tag	
	int index = 0;
	{
	
		int i;
		for (i=0;i<colWorld->getCollisionObjectArray().size(); i++)
		{
			btCollisionObject*	collisionObject= colWorld->getCollisionObjectArray()[i];
			//Adding filtering here
			if (!collisionObject->isStaticOrKinematicObject())
			{
				collisionObject->setIslandTag(index++);
			}
			collisionObject->setCompanionId(-1);
			collisionObject->setHitFraction(btScalar(1.));
		}
	}
	// do the union find
	initUnionFind( index );

	findUnions(dispatcher,colWorld);
}
and

Code: Select all

void	btSimulationIslandManager::storeIslandActivationState(btCollisionWorld* colWorld)
{
	// put the islandId ('find' value) into m_tag	
	{
		int index = 0;
		int i;
		for (i=0;i<colWorld->getCollisionObjectArray().size();i++)
		{
			btCollisionObject* collisionObject= colWorld->getCollisionObjectArray()[i];
			if (!collisionObject->isStaticOrKinematicObject())
			{
				collisionObject->setIslandTag( m_unionFind.find(index) );
				//Set the correct object offset in Collision Object Array
				m_unionFind.getElement(index).m_sz = i;
				collisionObject->setCompanionId(-1);
				index++;
			} else
			{
				collisionObject->setIslandTag(-1);
				collisionObject->setCompanionId(-2);
			}
		}
	}
}
and, to avoid incorrect index patching in buildIslands, in btUnionFind.cpp:

Code: Select all

void	btUnionFind::sortIslands()
{

	//first store the original body index, and islandId
	int numElements = m_elements.size();
	
	for (int i=0;i<numElements;i++)
	{
		m_elements[i].m_id = find(i);
		// Only this line is removed: m_elements[i].m_sz = i;
	}
	
	 // Sort the vector using predicate and std::sort
	  //std::sort(m_elements.begin(), m_elements.end(), btUnionFindElementSortPredicate);
	  m_elements.quickSort(btUnionFindElementSortPredicate());

}
Here is the performance gain we measured on the PS3 (PPU) version, with our world containing only static and kinematic objects.

Before :

Code: Select all

info : ----------------------------------
info : Profiling: Root (total running time: 14.447 ms) ---
info : 0 -- convexSweepTest (0.00 %) :: 0.000 ms / frame (0 calls)
info : 1 -- stepSimulation (60.01 %) :: 8.670 ms / frame (1 calls)
info : 2 -- rayTest (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (39.988 %) :: 5.777 ms
info : 	----------------------------------
info : 	Profiling: stepSimulation (total running time: 8.670 ms) ---
info : 	0 -- synchronizeMotionStates (0.03 %) :: 0.003 ms / frame (3 calls)
info : 	1 -- updateSoftBodies (0.03 %) :: 0.003 ms / frame (3 calls)
info : 	2 -- solveSoftConstraints (0.05 %) :: 0.004 ms / frame (3 calls)
info : 	3 -- internalSingleStepSimulation (95.54 %) :: 8.283 ms / frame (3 calls)
Unaccounted:	 (4.348 %) :: 0.377 ms
info : 		----------------------------------
info : 		Profiling: internalSingleStepSimulation (total running time: 8.283 ms) ---
info : 		0 -- updateActivationState (0.05 %) :: 0.004 ms / frame (3 calls)
info : 		1 -- updateActions (0.04 %) :: 0.003 ms / frame (3 calls)
info : 		2 -- integrateTransforms (0.04 %) :: 0.003 ms / frame (3 calls)
info : 		3 -- solveConstraints (41.05 %) :: 3.400 ms / frame (3 calls)
info : 		4 -- calculateSimulationIslands (22.76 %) :: 1.885 ms / frame (3 calls)
info : 		5 -- performDiscreteCollisionDetection (35.54 %) :: 2.944 ms / frame (3 calls)
info : 		6 -- predictUnconstraintMotion (0.04 %) :: 0.003 ms / frame (3 calls)
Unaccounted:		 (0.495 %) :: 0.041 ms
info : 			----------------------------------
info : 			Profiling: solveConstraints (total running time: 3.400 ms) ---
info : 			0 -- processIslands (42.97 %) :: 1.461 ms / frame (3 calls)
info : 			1 -- islandUnionFindAndQuickSort (56.38 %) :: 1.917 ms / frame (3 calls)
Unaccounted:			 (0.647 %) :: 0.022 ms
info : 			----------------------------------
info : 			Profiling: performDiscreteCollisionDetection (total running time: 2.944 ms) ---
info : 			0 -- dispatchAllCollisionPairs (70.21 %) :: 2.067 ms / frame (3 calls)
info : 			1 -- calculateOverlappingPairs (0.20 %) :: 0.006 ms / frame (3 calls)
info : 			2 -- updateAabbs (29.08 %) :: 0.856 ms / frame (3 calls)
Unaccounted:			 (0.510 %) :: 0.015 ms
After :

Code: Select all

info : ----------------------------------
info : Profiling: Root (total running time: 10.644 ms) ---
info : 0 -- convexSweepTest (0.00 %) :: 0.000 ms / frame (0 calls)
info : 1 -- stepSimulation (38.89 %) :: 4.139 ms / frame (1 calls)
info : 2 -- rayTest (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (61.114 %) :: 6.505 ms
info : 	----------------------------------
info : 	Profiling: stepSimulation (total running time: 4.139 ms) ---
info : 	0 -- synchronizeMotionStates (0.05 %) :: 0.002 ms / frame (3 calls)
info : 	1 -- updateSoftBodies (0.05 %) :: 0.002 ms / frame (3 calls)
info : 	2 -- solveSoftConstraints (0.02 %) :: 0.001 ms / frame (3 calls)
info : 	3 -- internalSingleStepSimulation (90.79 %) :: 3.758 ms / frame (3 calls)
Unaccounted:	 (9.084 %) :: 0.376 ms
info : 		----------------------------------
info : 		Profiling: internalSingleStepSimulation (total running time: 3.758 ms) ---
info : 		0 -- updateActivationState (0.11 %) :: 0.004 ms / frame (3 calls)
info : 		1 -- updateActions (0.05 %) :: 0.002 ms / frame (3 calls)
info : 		2 -- integrateTransforms (0.05 %) :: 0.002 ms / frame (3 calls)
info : 		3 -- solveConstraints (0.45 %) :: 0.017 ms / frame (3 calls)
info : 		4 -- calculateSimulationIslands (48.99 %) :: 1.841 ms / frame (3 calls)
info : 		5 -- performDiscreteCollisionDetection (49.60 %) :: 1.864 ms / frame (3 calls)
info : 		6 -- predictUnconstraintMotion (0.03 %) :: 0.001 ms / frame (3 calls)
Unaccounted:		 (0.718 %) :: 0.027 ms
info : 			----------------------------------
info : 			Profiling: solveConstraints (total running time: 0.017 ms) ---
info : 			0 -- processIslands (11.76 %) :: 0.002 ms / frame (3 calls)
info : 			1 -- islandUnionFindAndQuickSort (11.76 %) :: 0.002 ms / frame (3 calls)
Unaccounted:			 (76.471 %) :: 0.013 ms
info : 			----------------------------------
info : 			Profiling: performDiscreteCollisionDetection (total running time: 1.864 ms) ---
info : 			0 -- dispatchAllCollisionPairs (75.43 %) :: 1.406 ms / frame (3 calls)
info : 			1 -- calculateOverlappingPairs (0.27 %) :: 0.005 ms / frame (3 calls)
info : 			2 -- updateAabbs (23.82 %) :: 0.444 ms / frame (3 calls)
Unaccounted:			 (0.483 %) :: 0.009 ms
There is a significant gain in solveConstraints (3.400 ms to 0.017 ms) and performDiscreteCollisionDetection (2.944 ms to 1.864 ms), roughly 4.5 ms in total.
I've tried almost all Samples without noticing any difference in behavior.
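
To illustrate the idea behind the patch outside of Bullet, here is a minimal, self-contained sketch. `UnionFind`, `Object` and `assignIslandTags` are simplified stand-ins invented for this example, not Bullet's classes. The point is only that static and kinematic objects get tag -1 and never occupy a union-find slot, so the structure that is built and sorted is sized by the dynamic objects alone.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal union-find with path halving; a simplified stand-in for btUnionFind.
class UnionFind
{
public:
	explicit UnionFind(std::size_t count) : m_parent(count)
	{
		std::iota(m_parent.begin(), m_parent.end(), 0); // each element is its own root
	}

	int find(int x)
	{
		assert(x >= 0 && static_cast<std::size_t>(x) < m_parent.size());
		while (m_parent[x] != x)
		{
			m_parent[x] = m_parent[m_parent[x]]; // path halving
			x = m_parent[x];
		}
		return x;
	}

	void unite(int a, int b)
	{
		const int rootA = find(a);
		const int rootB = find(b);
		m_parent[rootA] = rootB;
	}

	std::size_t size() const { return m_parent.size(); }

private:
	std::vector<int> m_parent;
};

// Hypothetical collision object; an islandTag of -1 means "not in any island".
struct Object
{
	bool staticOrKinematic;
	int islandTag;
};

// Tags only dynamic objects and returns how many union-find slots are needed.
std::size_t assignIslandTags(std::vector<Object>& objects)
{
	int index = 0;
	for (std::size_t i = 0; i < objects.size(); ++i)
	{
		objects[i].islandTag = objects[i].staticOrKinematic ? -1 : index++;
	}
	return static_cast<std::size_t>(index);
}
```

In a world with 3000 static objects and two dynamic ones, `assignIslandTags` returns 2, so the union-find holds two elements instead of 3002.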

Hope this could help.


Boris.