Generating data for 3D object recognition

Post by hiddenmarkov »

First, a disclaimer: if your immediate response to this post is "This should not be posted here," I apologize. The scope of this problem is such that I am not sure exactly where to go looking for answers to these technical questions. If you can think of a forum better suited to them, that information would be invaluable in and of itself.

Now on to the problem.

I need to generate some data for testing a 3D object recognition engine. The data should be a ~5 cm sampling of the visible surfaces of an urban landscape ~5 km across. The sampling regime need not be entirely uniform or orthonormal. Each sample should have the form <x,y,z,r,g,b,obj_type>, where <x,y,z> is a point on the surface of an object, <r,g,b> is the point's color, and <obj_type> is the type of object the point belongs to. The total number of samples should be ~1e9, for a total data size of ~1 TB.
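To pin the format down, here is a minimal sketch of what I imagine one record looking like (the field widths are my own assumption, not a requirement):

[code]
#include <cstdint>

// One surface sample. Field widths are my assumption: float gives
// sub-millimeter resolution over a 5 km extent, and the object type
// is an index into a separate table of label strings ("car",
// "building", "street_sign", ...).
struct SurfaceSample {
    float    x, y, z;   // point on the object's surface, in meters
    uint8_t  r, g, b;   // sampled color at that point
    uint16_t obj_type;  // index into the label table
};
// Packed, this is ~20 bytes per sample, so 1e9 samples fits
// comfortably within the ~1 TB budget.
[/code]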

It seems that there are two possible approaches to generating this data:

1) Drive a LIDAR/EO sensor around a city and register <x,y,z,r,g,b> for 1e9 points. Then have a human go through and label each point's "object type," either by inspecting the data or by going out and looking at the object the point was sampled from.

2) Get a bunch of freely available 3D models of the kinds of things you find in an urban environment. Label them according to what kind of object they are. Build a virtual urban landscape by throwing together multiple copies of the labeled objects. Port the landscape to an interactive 3D rendering engine. Build a virtual sensor that can sample 1e9 points from the model. Find some way to instrument the engine's back end so that it samples not only <x,y,z> and <r,g,b>, but also the object label that we applied earlier. (A sketch of the kind of virtual sensor I have in mind follows below.)
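To make #2 concrete, the "virtual sensor" I picture is basically just a ray generator: walk a simulated vehicle along a street and emit a fan of rays at each step, the way a spinning LIDAR head does. A throwaway sketch (all the numbers are placeholders):

[code]
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Emit a LIDAR-like fan of rays from sensor positions spaced along a
// straight path. Path length, step size, and angular resolution are
// placeholder numbers; a real run would follow the street network.
std::vector<Ray> makeScanRays(float pathLen, float step,
                              int raysPerRev, int rings) {
    std::vector<Ray> rays;
    const float pi = 3.14159265f;
    for (float s = 0.0f; s < pathLen; s += step) {
        Vec3 origin = { s, 0.0f, 2.0f };  // sensor mounted ~2 m up
        for (int ring = 0; ring < rings; ++ring) {
            // elevation sweeps roughly -30 to +30 degrees
            float elev = -pi / 6.0f + ring * (pi / 3.0f) / rings;
            for (int i = 0; i < raysPerRev; ++i) {
                float az = 2.0f * pi * i / raysPerRev;
                Vec3 d = { std::cos(elev) * std::cos(az),
                           std::cos(elev) * std::sin(az),
                           std::sin(elev) };
                rays.push_back({ origin, d });
            }
        }
    }
    return rays;
}

int main() {
    // 100 m of path, one scan per meter, 360 rays x 16 rings per scan.
    std::vector<Ray> rays = makeScanRays(100.0f, 1.0f, 360, 16);
    std::printf("%zu rays generated\n", rays.size());
    return 0;
}
[/code]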

Both of these approaches are costly to implement, but both are doable. The advantage of #2 is that once we have an implementation in hand, the cost of generating a new data set is relatively low. So my plan is to go forward with #2. However, I am not an expert in 3D modeling or virtual environments. Now that I have motivated the problem a little, here are the issues I am looking for advice on:

A) What is the best tool for rapidly modeling a 3D urban landscape, given that I have access to lots of 3D models of cars, buildings, street signs, etc.?

B) Which virtual environment engine should I use to render and interact with the model?

My feeling is that (A) can be satisfied by a number of existing tools. The only real discriminator is that the tool I end up using must provide some facility for associating an arbitrary string (the "object type" label) with each object that I import into the environment. It must also support an output format that preserves this label and matches the input format for whatever answers (B).

The crux of the problem is simulating a sensor in (B) that can get hold of not only the <x,y,z,r,g,b> values of surface points of the model, but also the label associated with each. My intuition tells me that it is straightforward (but not easy) to script, say, the Unreal engine in a way that would allow me to do the <x,y,z,r,g,b> sampling. The difficulty is that such engines are built to render surfaces, and probably do not expose much facility for querying the back end for additional properties of collision points. This is where I may find a huge flaw in my plan. Anybody have any advice before I go off and do something ridiculous? If you were trying to solve this problem, what would you do?
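Since I am posting on a physics-engine board anyway, here is a rough sketch of the mechanism I am hoping exists, written against Bullet's collision-only API: attach the label to each object via setUserPointer(), ray-test, and read both the hit point and the label back out of the callback. This is a sketch of the idea rather than tested code, and color would still have to come from the rendering side:

[code]
#include <btBulletCollisionCommon.h>
#include <cstdio>

int main() {
    // Collision-only Bullet setup: no dynamics needed for sampling.
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btCollisionWorld world(&dispatcher, &broadphase, &config);

    // One labeled "building": a box with its object-type label hung
    // off the user pointer. A real scene would load triangle meshes.
    btBoxShape box(btVector3(10, 10, 20));
    btCollisionObject building;
    building.setCollisionShape(&box);
    building.getWorldTransform().setOrigin(btVector3(0, 0, 20));
    static const char* kLabel = "building";
    building.setUserPointer(const_cast<char*>(kLabel));
    world.addCollisionObject(&building);

    // One sensor ray; a real run would loop over ~1e9 of these.
    btVector3 from(-50, 0, 2), to(50, 0, 2);
    btCollisionWorld::ClosestRayResultCallback cb(from, to);
    world.rayTest(from, to, cb);

    if (cb.hasHit()) {
        const btVector3& p = cb.m_hitPointWorld;
        const char* label =
            static_cast<const char*>(cb.m_collisionObject->getUserPointer());
        std::printf("hit <%.2f, %.2f, %.2f> label=%s\n",
                    p.x(), p.y(), p.z(), label);
    }

    world.removeCollisionObject(&building);
    return 0;
}
[/code]

If engine back ends really do hide this information, one fallback might be to do the geometric sampling in a standalone collision world like the one above and use the renderer only to look up color.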

Requests for clarification or additional information will be answered as quickly as possible.

Thanks in advance!