32/64 bit issue

Post by **unapiedra** » Sat Jan 17, 2009 9:52 pm

"It doesn't work!"

Sorry, let me explain. We are writing a simulator (called arena) for robots in the RoboCup Junior competition. It uses Bullet Physics, Irrlicht and Blender, and I installed all of these on my machine. The simulator currently works perfectly well on four systems, but it doesn't run on mine. Well, it does, in a way.

It compiles without problems, but when I start the simulator the field appears, and shortly afterwards the robot is initialized. The program positions the robot above the field and lets it fall a short distance. From here on, the behaviour differs between machines:

- Normally, the robot drops onto the field and stays there. Everything works like a charm; one can, for example, start the program that runs on the robot, and the robot moves forward, sideways, everything...
- On my machine the robot falls, but just before it reaches the field it disappears (more precisely, its coordinates become NaN).

The lead programmer in our team (who is not me) doesn't know of a solution either.

However, we did some further analysis:
The coordinates turn NaN when the robot is close to the field. It doesn't matter whether the field is positioned at a global height of 0 or -1 (meters).

```
X = 0.000000, Y = 1.752025, Z = 0.250000
X = 0.000000, Y = 1.752025, Z = 0.250000
X = 0.000000, Y = 1.752025, Z = 0.250000
X = 0.000000, Y = 1.752025, Z = 0.250000
X = 0.000000, Y = 1.752025, Z = 0.250000
X = 0.000000, Y = 1.713875, Z = 0.250000
X = 0.000000, Y = 1.713875, Z = 0.250000
X = 0.000000, Y = 1.713875, Z = 0.250000
X = 0.000000, Y = 1.713875, Z = 0.250000
...
X = 0.000000, Y = -0.945725, Z = 0.250000
X = 0.000000, Y = -0.945725, Z = 0.250000
X = 0.000000, Y = -0.945725, Z = 0.250000
X = 0.000000, Y = -0.945725, Z = 0.250000
X = nan, Y = nan, Z = nan
X = nan, Y = nan, Z = nan
X = nan, Y = nan, Z = nan
```

(We are printing this in the main-loop.)

The setups on the machines differ as follows:

Working machines:
- Kubuntu 8.10, 32-bit, with KDE 4 (4.1 and 4.2), on three different computers
- Ubuntu on a MacBook with wmii, also 32-bit

Not working machine (mine):
- Kubuntu 8.04, 64-bit

We are all using the same Bullet version (the newest one).

Could anyone help resolve this?

Cheers,
Chris

Post by **sparkprime** » Sat Jan 17, 2009 10:14 pm

Can you check all the data being passed into Bullet (triangle meshes, impulses, positions, quats) and make sure it is sane? The last time I had a problem like this, it was because I had a division somewhere else in my code that, in very unfortunate cases, would divide by 0, and the NaN would then propagate everywhere. I'm not saying you are dividing by zero, but there could be a 32/64-bit problem outside of Bullet that ends up manifesting as a Bullet problem.
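A minimal sketch of such a sanity check, written as plain C++ so it compiles without Bullet. `Vec3`, `isSane` and `checkInput` are hypothetical stand-ins (in a real program the values would be `btVector3`s, quaternions and so on); the idea is only to reject NaN or infinite inputs at the call site, before they can propagate through the solver.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for btVector3 so the sketch compiles without Bullet.
struct Vec3 {
    float x, y, z;
};

// True if every component is a finite number (not NaN, not +/-inf).
inline bool isSane(const Vec3& v) {
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

// Check a value before handing it to the physics engine and report which
// input was bad, so the first NaN is caught where it originates rather than
// after it has spread through the simulation.
inline bool checkInput(const char* what, const Vec3& v) {
    if (!isSane(v)) {
        std::fprintf(stderr, "insane %s: (%f, %f, %f)\n", what, v.x, v.y, v.z);
        return false;
    }
    return true;
}
```

A typical use is to wrap every call that feeds data into the engine, e.g. `if (checkInput("force", f)) { /* apply it */ }`, and to break on the `fprintf` in a debugger.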

Post by **x-quadraht** » Sat Jan 17, 2009 10:33 pm

Hi,

I'm the "lead programmer" mentioned above.

The collision happens between a static btCompoundShape (the field), consisting of btBoxShapes with hard-coded dimensions (so no NaN here), and a dynamic btCylinderShape (the robot), also with hard-coded dimensions.
No triangle meshes or anything of the sort are passed to Bullet.
The orientation and position are set once at initialisation and never touched again.
We apply the motor forces of the robot using btRigidBody::applyForce(), but even with that commented out, the NaN keeps popping up close to the field.

I guess we have no choice but to step into Bullet with gdb and see where the NaN appears.
Has anyone seen a similar issue before?

Max

Post by **sparkprime** » Sat Jan 17, 2009 11:59 pm

When I experienced my NaN problem it actually triggered an assert in the quaternion normalize function while integrating the transforms.

Post by **x-quadraht** » Sun Jan 18, 2009 7:42 pm

Okay, we got it fixed.

The problem was that we were calling collisionShape->calculateLocalInertia() on a static object (the field).
Stepping through manually pointed to the constraint solver, but that turned out to be only a downstream symptom.
If anyone _ever_ runs into weird NaN bugs again, here is how we located the error:

1) Enable floating-point exception trapping with feenableexception() (see its man page), one exception type at a time.
2) Tell gdb to stop on the resulting SIGFPE signals with `handle SIGFPE stop nopass`.

The NaN was generated in these lines (btCompoundShape.cpp:195):

```cpp
inertia[0] = mass/(btScalar(12.0)) * (ly*ly + lz*lz);
inertia[1] = mass/(btScalar(12.0)) * (lx*lx + lz*lz);
inertia[2] = mass/(btScalar(12.0)) * (lx*lx + ly*ly);
```

lx, ly and lz were all set to 2.00003*10^30, so squaring them (and summing the squares) overflowed, or something of the sort.

We fixed it in our case by avoiding calculateLocalInertia() when the mass is zero, but I think the m_halfExtents field of the btCompoundShape is also set incorrectly (our shapes are all in the range 1-3 units). This could be a bullet bug (?).
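A minimal sketch of that fix, combined with the assert suggested later in the thread, using a hypothetical `localInertia` helper that mirrors the formula quoted above (lx, ly, lz are full extents). In Bullet itself the usual pattern is to start with `btVector3 localInertia(0, 0, 0)` and call `shape->calculateLocalInertia(mass, localInertia)` only when `mass != 0`.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper mirroring the box-style inertia formula above.
// Static bodies have mass 0 and need no inertia, so return zero without
// evaluating the formula at all (with huge extents it can produce NaN).
inline std::array<float, 3> localInertia(float mass, float lx, float ly, float lz) {
    assert(mass >= 0.0f && "mass must be non-negative");
    if (mass == 0.0f) {
        return {0.0f, 0.0f, 0.0f};  // static object: skip the calculation
    }
    const float k = mass / 12.0f;
    return {k * (ly * ly + lz * lz),
            k * (lx * lx + lz * lz),
            k * (lx * lx + ly * ly)};
}
```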
[EDIT] Interesting: the overflow occurred only on the 64-bit system. I will run the unfixed program on my computer (32-bit) to evaluate this further... [/EDIT]

Max

Post by **x-quadraht** » Sun Jan 18, 2009 7:57 pm

Okay, here is the 32-bit vs. 64-bit comparison:

- 32-bit: `0/12.0*((2*10^30)^2 + (2*10^30)^2) == 0`
- 64-bit: `0/12.0*((2*10^30)^2 + (2*10^30)^2) == -nan`

Strange.
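A likely explanation for the 32/64-bit difference, assuming `btScalar` is `float` (Bullet's default single-precision build): 64-bit x86 code does float arithmetic with SSE in single precision, where (2e30)^2 ≈ 4e60 overflows FLT_MAX (about 3.4e38) to infinity, and IEEE 754 defines 0 × ∞ as an invalid operation whose result is NaN. GCC's 32-bit x86 code uses the x87 unit by default, which keeps intermediates in 80-bit extended precision, so the sum stays finite and 0 × finite is 0. The x86 default NaN also has its sign bit set, which is why it prints as `-nan`. The sketch below reproduces both behaviours, using `double` as a stand-in for x87's wider intermediates; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// The inertia expression with everything in single precision.
// (2e30)^2 exceeds FLT_MAX, the sum becomes +inf, and 0 * inf is NaN.
inline float inertiaTermFloat(float mass, float l) {
    volatile float m = mass, v = l;  // volatile: prevent constant folding
    return m / 12.0f * (v * v + v * v);
}

// The same expression with wider intermediates. 4e60 fits easily in double
// (max about 1.8e308), so the result is 0 * finite == 0.
inline double inertiaTermWide(float mass, float l) {
    volatile double m = mass, v = l;
    return m / 12.0 * (v * v + v * v);
}
```

With `mass == 0` and `l == 2.00003e30f`, the float version yields NaN and the wide version yields 0, mirroring the 64-bit and 32-bit results above.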

Post by **sparkprime** » Mon Jan 19, 2009 5:33 am

That is weird. Assuming you are using 32-bit floats, (2E30)^2 is already infinity. But 0 * inf should return 0 afaik.

It's obviously not a good idea to calculate an inertia for something which has a mass of 0 but an assert would probably be helpful in that function.

That technique you mention for locating FP exceptions would have made my life a lot easier last time, had I known about it.

It may be worth enabling it anyway to check for FP errors I don't know about, as they usually indicate a bug.

Post by **Erwin Coumans** » Mon Jan 19, 2009 5:48 am

> We fixed it in our case by avoiding calculateLocalInertia() when the mass is zero

Indeed, never calculate an inertia for static objects with zero mass.

> but I think the m_halfExtents field of the btCompoundShape is also set incorrectly

The half extents for a btCompoundShape are updated after adding each child shape. Can you put a breakpoint and step through the addChildShape code to see why the half extents are not properly updated?
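One way a compound's bounds can be kept up to date as children are added, sketched with hypothetical types rather than Bullet's actual code: start from an inverted box with large sentinel values and merge each child's box into it. If the half extents are read before any child has been merged (or if the update is skipped), the result has a magnitude around 1e30, the same order as the lx, ly and lz values reported above. Whether that is what happened here is not established in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for btVector3.
struct Vec3 {
    float x, y, z;
};

// Hypothetical running bounding box for a compound of axis-aligned child boxes.
// It starts "inverted" at +/-1e30 sentinels so the first child overwrites it.
struct CompoundAabb {
    Vec3 lo{1e30f, 1e30f, 1e30f};
    Vec3 hi{-1e30f, -1e30f, -1e30f};

    // Merge one child box, given by its centre and half extents.
    void addChildBox(const Vec3& c, const Vec3& h) {
        lo.x = std::min(lo.x, c.x - h.x);
        lo.y = std::min(lo.y, c.y - h.y);
        lo.z = std::min(lo.z, c.z - h.z);
        hi.x = std::max(hi.x, c.x + h.x);
        hi.y = std::max(hi.y, c.y + h.y);
        hi.z = std::max(hi.z, c.z + h.z);
    }

    // Half extents of the whole compound. With no children merged, this
    // silently returns values of magnitude about 1e30.
    Vec3 halfExtents() const {
        return {(hi.x - lo.x) * 0.5f, (hi.y - lo.y) * 0.5f, (hi.z - lo.z) * 0.5f};
    }
};
```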

Thanks,
Erwin

Real-Time Physics Simulation Forum
