Bullet MSVC Release build has SSE2 enabled by default

reptor
Posts: 17
Joined: Mon Jan 05, 2009 3:44 pm

Bullet MSVC Release build has SSE2 enabled by default

Post by reptor »

Hello

It seems that the "Release" build configuration, at least for Microsoft Visual Studio, has SSE2 enabled by default in the IDE's project settings.

I was puzzled for a while trying to figure out why my program crashed even before entering the main function. It said "illegal instruction".

The solution was to either disable SSE completely in Visual Studio's project configuration dialog for the Bullet projects, or set it to SSE instead of SSE2. My computer obviously does not support SSE2, but it does support SSE.

Question: does the Bullet physics library detect at runtime whether the computer supports SSE/SSE2/SSE3 et cetera, or is it purely a compile-time decision?

As an example, the Ogre3D graphics library does runtime detection of these processor features and selects the appropriate functions accordingly. This would be ideal behaviour for the Bullet physics library as well.

If Bullet does not do runtime detection, then I will have to disable SSE in it completely, as it would otherwise likely not work on the hardware I want to support.
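
For reference, a minimal sketch of what such runtime detection can look like on MSVC, using the __cpuid intrinsic (the struct and function names are just for illustration; the feature bits are the documented CPUID leaf 1 flags):

#include <intrin.h>
#include <cstdio>

// CPUID leaf 1: EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 0 = SSE3.
struct SimdCaps { bool sse, sse2, sse3; };

static SimdCaps detectSimdCaps()
{
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 0);                  // info[0] = highest supported leaf
    SimdCaps caps = {false, false, false};
    if (info[0] >= 1)
    {
        __cpuid(info, 1);              // leaf 1: feature flags in ECX (info[2]) / EDX (info[3])
        caps.sse  = (info[3] & (1 << 25)) != 0;
        caps.sse2 = (info[3] & (1 << 26)) != 0;
        caps.sse3 = (info[2] & (1 <<  0)) != 0;
    }
    return caps;
}

int main()
{
    SimdCaps caps = detectSimdCaps();
    std::printf("SSE:%d SSE2:%d SSE3:%d\n", caps.sse, caps.sse2, caps.sse3);
    return 0;
}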
User avatar
projectileman
Posts: 109
Joined: Thu Dec 14, 2006 4:27 pm
Location: Colombia

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by projectileman »

Selecting SSE capabilities dynamically could seriously slow down Bullet, since most numeric routines in Bullet are inlined. Are you suggesting function pointers to do that? Conditional selection of strategies through if-else instructions? That causes performance losses, so you couldn't get any benefit from the SSE features.

You have to compile a different binary executable for each target platform (from the 486 to the latest Pentium 4 / Core 2 Duo / AMD Athlon 64 / Phenom).
reptor
Posts: 17
Joined: Mon Jan 05, 2009 3:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by reptor »

projectileman wrote:Are you suggesting function pointers to do that?
I believe this is what Ogre3D does. I cannot comment on its performance, or whether there is a noticeable penalty.


projectileman wrote:You have to compile a different binary executable for each target platform (from the 486 to the latest Pentium 4 / Core 2 Duo / AMD Athlon 64 / Phenom).
I do understand that it is not possible to have one and the same executable for many kinds of platforms.

However, I think that even if we exclude old-ish hardware completely and require the users to have something like "AMD Athlon XP or newer, or equivalent", we are still making quite an assumption about SSE and trusting the hardware manufacturers quite a lot (too much imho). I think it would be better to have a more robust way to make sure that the program doesn't just crash in the user's face if the processor happens to be missing SSE for whatever reason. Runtime detection of CPU capabilities would be exactly that.

I do not think that telling the users "SSE is required to run this program" is an option - well, if we know that the users are all computer programmers then it is an option, but otherwise it is not.


About the performance: my guess is that even with runtime detection of SSE capabilities, the performance would be better than not having SSE enabled at all.

Maybe it would be worthwhile to provide an option in Bullet to either have runtime detection of SSE, or let people exclude the runtime detection completely with a compile-time setting?

One reason why I posted about this here was to point out to people that if your program crashes and you are at a loss as to why, then make sure you check the /arch setting. In Visual C++ 2008 it can be found under Configuration Properties - C/C++ - Code Generation - Enable Enhanced Instruction Set.
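
For reference, that IDE setting corresponds to the compiler's /arch switch; as far as I know, for 32-bit Visual C++ 2008 the options map out roughly like this:

/arch:SSE2     - the compiler may emit SSE2 instructions anywhere, so the binary crashes with "illegal instruction" on CPUs without SSE2
/arch:SSE      - the compiler may emit SSE (but not SSE2) instructions; it needs an SSE-capable CPU
no /arch flag  - plain x87 floating-point code that also runs on x86 CPUs without SSE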
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by Erwin Coumans »

Runtime detection of SSE only works for hand-written SIMD (SSE/SSE2) code where there is an alternative non-SIMD branch. We have written some SIMD constraint solver code, which can be toggled at run-time (dynamicsWorld->getSolverInfo().m_solverMode). We could indeed add some automatic detection for this part.
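
As a rough sketch of that run-time toggle (the SOLVER_SIMD flag name here is assumed; check btContactSolverInfo.h for the exact flags in your Bullet version):

#include <btBulletDynamicsCommon.h>

// Sketch: switch the hand-written SIMD constraint solver path on or off at
// run-time, assuming a SOLVER_SIMD bit in the btSolverMode flags.
void setSimdSolver(btDiscreteDynamicsWorld* dynamicsWorld, bool useSimd)
{
    btContactSolverInfo& info = dynamicsWorld->getSolverInfo();
    if (useSimd)
        info.m_solverMode |= SOLVER_SIMD;    // take the SSE-optimized branch
    else
        info.m_solverMode &= ~SOLVER_SIMD;   // fall back to the plain C++ branch
}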

But we did indeed enable SSE2 auto-vectorization in the MSVC projectfiles (Configuration Properties - C/C++ - Code Generation - Enable Enhanced Instruction Set). This auto-vectorized SIMD can't be disabled at run-time.

Should we set the auto-vectorization to use SSE instead of SSE2?
Thanks for the feedback,
Erwin
reptor
Posts: 17
Joined: Mon Jan 05, 2009 3:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by reptor »

I don't think it matters much which setting is in the project file, whether it is SSE or SSE2, because both have the same problem: for all I know, a user of an application might have a modern-ish x86 processor but no SSE at all. At least I would not assume that they have it - so I would either have the runtime detection, or if that's not possible, I would have SSE disabled completely.

This boils down to whether we can expect that everyone with a computer that we categorise as "supported" (let's say, in my case, AMD Athlon XP and newer, or "equivalent" models from their competitors) has SSE. If everyone has it, then there is no problem. I just think I cannot make such an assumption.

I would be interested in hearing from experienced commercial game developers how they approach this. Do they just enable SSE in all of their products, or do they have a more careful approach, perhaps with runtime detection?

Of course this depends on how old the hardware you want to support is - if you say your application requires an Intel Core 2 Duo or newer, then it would make sense to just enable SSE. There would still be some level of uncertainty (can you trust that all new processor models will have it, for example?), but much less than with older hardware from just a few years back.
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by Erwin Coumans »

SSE2 might be too new, but isn't SSE already over 10 years old?
reptor wrote:a user of an application might have a modern-ish x86 processor but no SSE at all for all I know
Which modern-ish PC CPU doesn't support SSE?
Thanks,
Erwin
reptor
Posts: 17
Joined: Mon Jan 05, 2009 3:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by reptor »

Well, I did say "for all I know". My target is indeed to support processors from this century :) but I was thinking it could be dangerous to assume that SSE is always available.

I asked the question at gamedev.net http://www.gamedev.net/community/forums ... _id=534132 in the hope of getting opinions from more people than is possible here.


It could be a good default setting to use SSE instead of SSE2. People can go and change it if they want (maybe make a note of it in the documentation?). This way, people with processors like the Athlon XP would not be left wondering why their Bullet programs crash when compiled in the Release configuration...

Was it, by the way, intentional to leave the SSE setting out of the ReleaseDll configuration? I was wondering why it would be enabled for the static version but not for the DLL version. It made me think that perhaps the whole thing was enabled by accident in the one configuration.
User avatar
projectileman
Posts: 109
Joined: Thu Dec 14, 2006 4:27 pm
Location: Colombia

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by projectileman »

Hi reptor.

I think you're making this issue more complicated than it needs to be.

Just develop an application launcher that detects which kind of hardware profile the user possesses, and have it launch the executable that has been compiled for that target.

Managing SSE extensions is not the same as managing OpenGL extensions. Branches and function-pointer dispatch can take more CPU time than the core math routines themselves on modern hardware. Erwin could explain this better.
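
A bare-bones sketch of that launcher idea (the executable names and the SSE2 check are purely illustrative):

#include <intrin.h>
#include <cstdlib>

// Hypothetical launcher: pick the binary that matches the CPU's capabilities.
static bool hasSSE2()
{
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 1);                      // CPUID leaf 1: feature flags
    return (info[3] & (1 << 26)) != 0;     // EDX bit 26 = SSE2
}

int main()
{
    // "game_sse2.exe" and "game_x87.exe" are placeholder names for two builds
    // of the same application, one compiled with /arch:SSE2 and one without /arch.
    return std::system(hasSSE2() ? "game_sse2.exe" : "game_x87.exe");
}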
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by Erwin Coumans »

projectileman wrote:Just develop an application launcher that detects which kind of hardware profile the user possesses, and have it launch the executable that has been compiled for that target.

Managing SSE extensions is not the same as managing OpenGL extensions. Branches and function-pointer dispatch can take more CPU time than the core math routines themselves on modern hardware. Erwin could explain this better.
No, don't mix up the two different ways of using SIMD: auto-vectorization cannot be disabled at run-time, while the hand-written SSE-optimized constraint solver code can already be switched off at run-time.

1) We will adjust the MSVC projectfiles to use SSE instead of SSE2 for automatic vectorization.

2) A detection mechanism can be added to select the SSE constraint solver when SSE is available.

Either way, SSE is likely available on most modern Win32 machines.
Thanks,
Erwin
User avatar
projectileman
Posts: 109
Joined: Thu Dec 14, 2006 4:27 pm
Location: Colombia

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by projectileman »

Thanks, Erwin, for your correction. It's a good thing that Bullet can select the best approach for solving constraints at runtime, either the SSE version or the standard one. That reflects good software design.

However, I have to mention that selecting between different code paths at runtime to take advantage of SIMD hardware, if available, is convenient ONLY at a high level of abstraction.

But doing so for atomic operations like vector operations, or simple geometric queries, could seriously harm performance. Many people believe that function pointers have no runtime penalty, but in some cases it matters a lot. For example, the Bullet vector library has a lot of inlined functions for convenience. In a big routine like GJK narrowphase collision, vector operations are called many times inside a loop. If those operations were not inlined and were called through pointer indirection, that routine could take 2x or 3x longer.


To illustrate this issue better, a guy on GameDev named Jan Wassenberg wrote:
Frob wrote:
The function pointers are no different than any other function to use, and there is no runtime performance penalty.
That's a dangerously misleading statement to make. Indirect function calls are actually much more expensive than direct calls unless you're lucky enough to have a 100% branch-target buffer hit rate (pretty much impossible in practice due to aliasing). The runtime-dispatch methods that have more of a claim to "no runtime performance penalty" have to do insane things like patch all call sites (in which case the code still isn't inlined) or copy the inlined function code (for which the code sizes better match, else you're executing lots of NOPs). IMO, the only way to correctly claim *negligible* overhead is if the runtime dispatch is on a very high level, and in that case it doesn't matter if you have function pointers or branching.
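
To make that concrete, here is a toy sketch contrasting per-operation dispatch through a function pointer with the inlined calls Bullet's vector library relies on (function names are illustrative only):

#include "LinearMath/btVector3.h"

// Per-operation dispatch: every dot product goes through a function pointer,
// e.g. sumDotsIndirect(a, b, n, &dotScalar).
typedef btScalar (*DotFunc)(const btVector3&, const btVector3&);

static btScalar dotScalar(const btVector3& a, const btVector3& b) { return a.dot(b); }
// A dotSSE() variant would be the second entry of a real dispatch table.

btScalar sumDotsIndirect(const btVector3* a, const btVector3* b, int n, DotFunc dot)
{
    btScalar sum = btScalar(0);
    for (int i = 0; i < n; ++i)
        sum += dot(a[i], b[i]);   // indirect call per element: no inlining,
                                  // and possible branch-target-buffer misses
    return sum;
}

// The current approach: btVector3::dot() is inlined straight into the loop.
btScalar sumDotsInlined(const btVector3* a, const btVector3* b, int n)
{
    btScalar sum = btScalar(0);
    for (int i = 0; i < n; ++i)
        sum += a[i].dot(b[i]);
    return sum;
}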
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by Erwin Coumans »

Yes. SIMD works best if you optimize an entire inner loop that takes a considerable amount of time, preferably on the order of thousands to hundreds of thousands of cycles. That way, a conditional SIMD test is only evaluated a few times per frame, so it won't impact performance.

We plan to optimize GJK, the Voronoi simplex solver and the support map, and once we have done this, we will allow switching between the SIMD and non-SIMD versions at run-time, but only at the high level. Luckily, the GJK/Voronoi/support-map code is very compact and computationally intensive, so it is a good candidate for SIMD.

You are right, adding a run-time switch, or function pointers, could harm performance. In general, adding SIMD support without paying attention to the rest of the code (such as inserting SIMD into the vector library) can easily harm performance, especially when conversions happen between SIMD and FPU registers.
So for the non-hotspots we will rely on SSE auto-vectorization.
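
In other words, the run-time check wraps an entire batch of work rather than each vector operation, roughly like this (a sketch only, with placeholder types and function names):

#include <intrin.h>

struct PairBatch { /* placeholder for a frame's worth of overlapping pairs */ };

static bool cpuSupportsSSE()
{
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 1);                      // CPUID leaf 1: feature flags
    return (info[3] & (1 << 25)) != 0;     // EDX bit 25 = SSE
}

static void processPairsSIMD(PairBatch&)    { /* hand-written SSE inner loop */ }
static void processPairsGeneric(PairBatch&) { /* portable scalar inner loop  */ }

// The check guards thousands of cycles of work, so its cost is negligible,
// unlike a test or indirect call inside the inner loop itself.
void processOverlappingPairs(PairBatch& batch)
{
    static const bool useSimd = cpuSupportsSSE();  // evaluated once
    if (useSimd)
        processPairsSIMD(batch);
    else
        processPairsGeneric(batch);
}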

By the way, the latest trunk MSVC projectfiles now have SSE enabled instead of SSE2.
Thanks!
Erwin
reptor
Posts: 17
Joined: Mon Jan 05, 2009 3:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by reptor »

I think it is a good change - now there is less chance of new Bullet users coming here to complain that their Bullet programs just crash.

And people who know about it can just change the setting if they want.
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by cbuchner1 »

Thank you for enabling SSE over SSE2 by default.

Most Release demos now work on my dual Athlon MP system (technology of 2001/2002).

However, the ReleaseAllBulletDemos.exe binary is still crashing for me.

Christian
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by Erwin Coumans »

Strange, have you tried rebuilding everything?

Does it only happen in release mode, or also in debug?

Alternatively, please use CMake to generate MSVC projectfiles: just download and install CMake from http://cmake.org and then, in the Bullet root folder, run
cmake . -G "Visual Studio 8 2005"
And open the generated MSVC projectfile.

Hope this helps,
Erwin
cbuchner1
Posts: 17
Joined: Fri Apr 10, 2009 6:44 pm

Re: Bullet MSVC Release build has SSE2 enabled by default

Post by cbuchner1 »

Erwin Coumans wrote:Strange, have you tried rebuilding everything?
Erwin
Tried it. This fixed it. Now I am feeling stupid.