Converting to Real-World Distances

But what about these pixels’ actual depth? We know that the color
of each pixel in the depth image is supposed to relate to the distance of that part of the scene, but how far away is 96
or 115 or 224? How do we translate from these brightness values to a
distance measurement in some kind of useful units like inches or
centimeters?

Well, let’s start with some basic proportions. We know from our
experiments earlier in this chapter that the Kinect can detect objects
within a range of about 20 inches to 25 feet. And we know that the
pixels in our depth image have brightness values that range between 255
and 0. So, logically, it must be the case that pixels in the depth image
that have a brightness of 255 must correspond to objects that are 20
inches away, pixels with a value of 0 must be at least 25 feet away, and
values in between must represent the range in between, getting farther
away as their values descend toward 0.

So, how far away is the back of our chair with the depth pixel
reading of 96? Given the logic just described, we can make an estimate.
Doing a bit of back-of-the-envelope calculation tells me that 96 is
approximately 37% of the way between 0 and 255. Hence the simplest
possible distance calculation would hold that the chair back is 37% of
the way from 25 feet down to 20 inches (25 feet minus 37% of the roughly 23-foot span), or a little over 16 feet. This
is clearly not right. I’m sitting in front of the chair now and it’s no
more than 8 or 10 feet away from the Kinect.

Did we go wrong somewhere in our calculation? Is our Kinect
broken? No. The answer is much simpler. The relationship between the
brightness value of the depth pixel and the real-world distance it
represents is more complicated than a simple linear ratio. There are
really good reasons for this to be the case.

First of all, consider the two ranges of values being covered. As
we’ve thoroughly discussed, the depth pixels represent grayscale values between 0 and 255.
The real-world distance covered by the Kinect, on the other hand, ranges
between 20 inches and 25 feet. Further, the Kinect’s depth readings have
millimeter precision. That means they need to report not just a number
ranging from 0 to a few hundred inches, but a number ranging from 0 to
around 8,000 millimeters to cover the entirety of the distance the
Kinect can see. That’s obviously a much larger range than can fit in the
bit depth of the pixels in our depth image.

Now, given these differences in range, we could still think of
some simple ways of converting between the two. For example, it might
have occurred to you to use Processing’s map function, which scales a variable from an
expected range of input values to a different set of output values.
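
To make that rescaling concrete, here is a minimal sketch in plain Java, the language underneath Processing. The `map` method below is a stand-in that mirrors the linear formula just described, not the library's own code, and the 0 to 8,000 millimeter output range is simply the figure used in this section. It stretches each of the 256 brightness values across that range and counts how many distinct millimeter values come out.

```java
import java.util.HashSet;
import java.util.Set;

class MapHoles {
    // Stand-in for Processing's map(): linearly rescales value from the
    // input range [inMin, inMax] to the output range [outMin, outMax].
    static float map(float value, float inMin, float inMax,
                     float outMin, float outMax) {
        return outMin + (outMax - outMin) * ((value - inMin) / (inMax - inMin));
    }

    // Counts how many different whole-millimeter distances can be produced
    // from the 256 possible brightness values (0 through 255).
    static int distinctOutputs() {
        Set<Integer> reached = new HashSet<>();
        for (int brightness = 0; brightness <= 255; brightness++) {
            reached.add(Math.round(map(brightness, 0, 255, 0, 8000)));
        }
        return reached.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctOutputs()
            + " distinct distances out of 8001 possible millimeter values");
    }
}
```

Neighboring brightness values land roughly 31 millimeters apart, so the 256 inputs can only ever reach 256 of the possible distances.
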
However, using map is a little bit
like pulling on a piece of stretchy cloth to get it to cover a larger
area: you’re not actually creating more cloth, you’re just pulling the
individual strands of the cloth apart, putting space between them to
increase the area covered by the whole. If there were a pattern on the
surface of the cloth, it would get bigger, but you’d also start to see
spaces within it where the strands separated. Processing’s map function does something similar. When you
use it to convert from a small range of input values to a larger output,
it can’t create new values out of thin air—it just stretches out the
existing values so that they cover the new range. Your highest input
values will get stretched up to your highest output value and your
lowest input to your lowest output, but just like with the piece of
cloth, there won’t be enough material to cover all the intermediate
values. There will be holes. And in the case of mapping our depth pixels
from their brightness range of 0 to 255 to the physical range of 0 to
8,000 millimeters, there will be a lot of holes. Those 256 brightness
values will only cover a small minority of the 8,000 possible distance
values.

To cover all of those distance values without holes, we’d need to
access the depth data from the Kinect in some higher resolution form. As
I mentioned in the introduction, the Kinect actually captures the depth
information at a resolution of 11 bits per pixel. This means that these
raw depth readings have a range of 0 to 2,047—much better than the 0 to
255 available in the depth pixels we’ve looked at so far. And the SimpleOpenNI library gives us a way to access these
raw depth values.

But wait! Have we been cheated? Why don’t the depth image pixels
have this full range of values? Why haven’t we been working with them
all along?

Remember the discussion of images and pixels at the start of this
chapter? Back then, I explained that the bigger the number we use to
store each pixel in an image, the more memory it takes to store that
image and the slower any code runs that has to work with it. Also, it’s
very hard for people to visually distinguish between more than 256 shades
of gray anyway. Think back to our investigations of the depth image.
There were areas in the back of the scene that looked like flat expanses
of black pixels, but when we clicked around on them to investigate, it
turned out that even they had different brightness values, simply with
differences that were too small for us to see. For all of these reasons,
all images in Processing have pixels that are 8 bits in depth, i.e.,
whose values only range from 0 to 255. The PImage class we’ve used in our examples
to load images from the Kinect library and display them on the screen enforces the use of
these smaller pixels.

What it really comes down to is this: when we’re displaying
depth information on the screen as images, we make a set
of trade-offs. We use a rougher representation of the data because it’s
easier to work with and we can’t really see the differences anyway. But
now that we want to use the Kinect to make precise distance
measurements, we want to make a different set of trade-offs. We need the
full-resolution data in order to make more accurate measurements, but
since that’s more unwieldy to work with, we’re going to use it more
sparingly.

Let’s make a new version of our Processing sketch that uses the
higher-resolution depth data to turn our Kinect into a wireless tape measure. With this higher-resolution data,
we actually have enough information to calculate the precise distance
between our Kinect and any object in its field of view and to display it
in real-world units like inches and millimeters.
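
The lookup and unit conversion behind such a tape measure can be sketched in isolation. This is plain Java rather than a complete Processing sketch, and it rests on some assumptions: that the higher-resolution readings arrive as a flat array of integers, one per pixel, stored row by row; that each reading is a distance in millimeters; and that the image is 640 pixels wide. The array contents and the helper names are placeholders for illustration, not part of the library.

```java
class DepthTapeMeasure {
    // Assumed image width; adjust to match your depth image.
    static final int WIDTH = 640;
    static final int HEIGHT = 480;
    static final double MM_PER_INCH = 25.4;

    // Converts an (x, y) pixel position into an index in a flat,
    // row-by-row array: skip y full rows, then move x pixels across.
    static int indexOf(int x, int y, int width) {
        return x + y * width;
    }

    // Converts a distance in millimeters to inches.
    static double millimetersToInches(int millimeters) {
        return millimeters / MM_PER_INCH;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the readings the Kinect
        // library would supply (assumed here to be millimeters).
        int[] depthValues = new int[WIDTH * HEIGHT];
        depthValues[indexOf(320, 240, WIDTH)] = 2540;

        // Look up the reading under a hypothetical click at (320, 240).
        int millimeters = depthValues[indexOf(320, 240, WIDTH)];
        System.out.println("mm: " + millimeters
            + "  in: " + millimetersToInches(millimeters));
    }
}
```

In a real sketch, the array would come from the Kinect library and the x and y values from the mouse position, so the same two helpers would report the distance to whatever you click on.
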