Since the Kinect gives a dense depth map of a scene, a lot of information can be extracted from a single image. This is very convenient because it avoids the need to move the object or the camera to get a complete set of viewpoints. Of course, since only a small part of the object is visible in a single image, this approach is suited only to situations where we have some a priori knowledge about the object geometry and where a precise mesh is not required.
The first issue when modeling is to determine the object of interest in the scene. For example, in Figure 9-1, the object could be the table, the floor, a foot of the table, or any of the items lying on the table. This differentiation is called segmentation and consists of separating the objects in the scene (foreground) from the rest (background). When using a Kinect, a very popular way of doing this is to assume that the objects are lying on a flat table, similar to our sample scene in Figure 9-1. As you will see, a flat 3-D plane can be extracted quite precisely and robustly using depth data.
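To make the table-plane assumption concrete, here is a minimal sketch of one common way to find a dominant plane in a set of 3-D points: RANSAC plane fitting. It illustrates the general idea and is not necessarily the method used later in this chapter. The function names, the 1 cm threshold, and the iteration count are all assumptions chosen for the example, and the points are assumed to be already expressed in meters as (x, y, z) tuples.

```python
import random

def plane_from_points(p, q, r):
    """Return ((nx, ny, nz), d) for the plane through three points,
    with unit normal and n.x + d = 0, or None if the points are collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The normal is the cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-9:
        return None
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    d = -(nx * p[0] + ny * p[1] + nz * p[2])
    return (nx, ny, nz), d

def distance_to_plane(point, plane):
    """Unsigned distance from a point to a plane with unit normal."""
    (nx, ny, nz), d = plane
    return abs(nx * point[0] + ny * point[1] + nz * point[2] + d)

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return the plane supported by the most points, and the inlier indices.
    threshold is the maximum point-to-plane distance (assumed meters)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        inliers = [i for i, pt in enumerate(points)
                   if distance_to_plane(pt, plane) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

def segment_foreground(points, threshold=0.01, **kwargs):
    """Treat the dominant plane as background; everything else is foreground."""
    plane, inliers = ransac_plane(points, threshold, **kwargs)
    inlier_set = set(inliers)
    foreground = [pt for i, pt in enumerate(points) if i not in inlier_set]
    return plane, foreground
```

Calling segment_foreground on a point list returns the estimated plane and the points that do not lie on it. This sketch labels everything off the plane as foreground, including the floor or walls if they are visible, so a real pipeline would normally also restrict the result to points located above the table surface.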