Project 4: A Wireless Tape Measure

The code in this section is going to introduce a new programming
concept that I haven’t mentioned before and that might not be completely
familiar to you from your previous work in Processing: accessing
arrays of pixels. Specifically, we’ll be learning how to
translate between a one-dimensional array of pixels and the
two-dimensional image that it represents. I’ll explain how to do all of
the calculations necessary to access the entry in the array that
corresponds to any location in the image. And, more important, I’ll
explain how to think about the relationship between the image and the
array so that these calculations are intuitive and easy to
remember.

At first, this discussion of pixels and arrays may feel like a
diversion from our core focus on working with the Kinect. But the Kinect
is, first and foremost, a camera, so much of our work with it will be
based on processing its pixels.

Up to this point, whenever we’ve wanted to access the Kinect’s
pixels, we’ve first displayed them on the screen in the form of images.
However, as we just discussed, this becomes impractical when we want to
access the Kinect’s data at a higher resolution or when we want to
access more of it than a single pixel’s worth. Therefore, we need to
access the data in a manner that doesn’t first require us to display it
on the screen. In Processing (and most other graphics programming
environments) this image data is stored as arrays of pixels behind the scenes. Accessing these arrays
of pixels directly (while they are still off stage) will let our
programs run fast enough to do some really interesting things like
working with the higher resolution data and processing more than one
pixel at a time.

Even though Processing stores images as flat arrays of pixels, we
still want the ability to think of them two-dimensionally. We want to be
able to figure out which pixel a user clicked on or draw something on
the screen where we found a particular depth value. In this section,
I’ll teach you how to make these kinds of translations. We’ll learn how
to convert between the array the pixels are stored in and their position
on the screen.

In this section, I’ll introduce this conversion technique by
showing you how to access an array of higher resolution values from the
Kinect. Then we’ll use that to turn the Kinect into a wireless tape measure, converting these depth values into accurate real-world units. Once we’ve
got this down, we’ll be ready to take things one step further and start
working with all of the pixels coming from the Kinect.

We’ll start our tape measure with a new version of our Processing
sketch. This version will be along the same lines as the sketch we’ve
been working with but with a few important differences. The basic
structure will be the same. We’ll still display images from the Kinect
and then output information about them when we click, but we’ll change
what we display, both on the screen and in the output when we
click.

First of all, let’s forget about the color image from the Kinect.
We’ve learned all that it has to teach us for now and so we’re banishing
it to focus more on the depth image. Second, we’ll rewrite our mousePressed function to access and display
the higher resolution depth data from the Kinect. I’ll explain how this
works in some detail, but first take a look at the full code, noticing
the changes to setup and draw that come from eliminating the color
image:

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup()
    {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
    }

    void draw()
    {
      kinect.update();
      PImage depthImage = kinect.depthImage();
      image(depthImage, 0, 0);
    }

    void mousePressed() {
      // one raw depth value per pixel, stored row by row
      int[] depthValues = kinect.depthMap();
      // convert the click's x-y position into an array index
      int clickPosition = mouseX + (mouseY * 640);
      int clickedDepth = depthValues[clickPosition];
      println(clickedDepth);
    }

The changes to setup and
draw are minimal: I’m no longer
accessing the RGB image from the Kinect and no longer displaying it. And
since we’re now only displaying one image, I made the whole sketch
smaller, because we don’t need all that horizontal real estate just to
show the depth image.

Now, let’s talk about the real substantial difference here: the
changes I’ve made to mousePressed.
First of all, mousePressed calls a
new function on our kinect object
that we haven’t seen before: depthMap. This is one of a few functions that
SimpleOpenNI provides that give us access to the higher-resolution
depth data, and it is the simplest one. It returns all of
the higher-resolution depth values unmodified, neither converted nor
processed.

In what form does kinect.depthMap return these depth values? Up
until now, all the depth data we’ve seen has reached us in the form of
images. We know that the higher-resolution values that kinect.depthMap returns can’t be stored as the
pixels of an image. So, then, in what form are they stored? The answer
is: as an array of integers. We have one integer for each depth value
that the Kinect recorded, and they’re all stored in one array. That’s
why the variable we use to save the results of kinect.depthMap is declared like this: int[] depthValues. That int[] means that our depthValues variable will store an array of
integers. If you have a hard time remembering how array declarations
like this one work in Processing (as I often do), you can think of the
square brackets as being a box that will contain all the values of the
array and the int that comes before
it as a label telling us that everything that goes in this box must be
an integer.

So, we have an array of integers. How can this box full of numbers
store the same kind of information we’ve so far seen in the pixels of an
image? The Kinect is, after all, a camera. The data that comes from it
is two-dimensional, representing all the depth values in its rectangular
field of view, whereas an array is one-dimensional: it can only store a
single stack of numbers. How do you represent an image as a box full of
numbers?

Here’s how. Start with the pixel in the top-leftmost corner of the
image. Put it in the box. Then, moving to the right along the top row of
pixels, put each pixel into the box on top of the previous ones. When
you get to the end of the row, jump back to the left side of the image, move
down one row, and repeat the procedure, continuing to stick the pixels
from the second row on top of the ever-growing stack you began in the
first row. Continue this procedure for each row of pixels in the image
until you reach the very last pixel in the bottom right. Now, instead of
a rectangular image, you’ll have a single stack of pixels: a
one-dimensional array. All the pixels from each row will be stacked
together, and the last pixel from each row will be right in front of the
first pixel from the next row, as Figure 2-12 shows.

Figure 2-12. Pixels in a two-dimensional image get stored as a flat array.
Understanding how to split this array back into rows is key to
processing images.

This is exactly how the array returned by kinect.depthMap is structured. It has one
high-resolution depth value for each pixel in the depth image. Remember
that the depth image’s resolution is 640 by 480 pixels. That means that
it has 480 rows of pixels, each of which is 640 pixels across. So, from
the logic above, we know that the array kinect.depthMap returns contains 307,200 (or
640 times 480) integers arranged in a single linear stack. The first
integer in this stack corresponds to the top left pixel in the image.
Each following value corresponds to the next pixel across each row until
the last value finally corresponds to the last pixel in the bottom
right.

But how do we access the values of this array? More specifically,
how do we pull out the integer value that corresponds to the part of the
image that the user actually clicked on? This is the mousePressed event, after all, and so all we
have available to us is the position of the mouse at the time that the
user clicked. As we’ve seen, that position is expressed as an x-y
coordinate in the variables mouseX
and mouseY. In past versions of
the sketch, we used these coordinates to access the color value of a
given pixel in our sketch using get,
which specifically accepted x-y coordinates as its arguments. However,
now we have a stack of integers in an array instead of a set of pixels
arranged into a rectangle. Put another way, instead of having a set of
x-y coordinates in two axes, we only have a single axis: the integers in
our single array. To access data from the array, we need not a pair of
x-y coordinates, but an index: a number that tells
us the position in the array of the value we’re looking for. How do we
translate from the two axes in which the depth image is displayed and
the user interacts with to the single axis of our integer array? In
other words, how do we convert mouseX
and mouseY into the single position
in the array that corresponds to the user’s click?

To accomplish this, we’ll have to do something that takes into
account how the values were put into the array in the first place. In
filling the array, we started at the top-left corner of the image, moved
across each row pixel by pixel to the end, adding values, and then jumped
back to the beginning of the next row when we reached the edge of the
image. Imagine that you were counting values as we did this, adding one
to your count with each pixel that got converted into a value and added
to the array. What would your count look like as we progressed through
the image?

For the first row, it’s pretty obvious. You’d start your count at
0 (programmers always start counting at 0) and work your way up as you
go across the first row. When you reach the last pixel in the first row,
your count will be 639 (there are 640 pixels in the row and you started
counting at 0 for the first pixel). Then, you’d jump back to the left
side of the image to continue on the second row and keep counting. So
pixel one on row two would be 640, pixel two would be 641, and so on
until you reach the end of row two. At the last pixel of row two, you’d
be up to 1279, which means that the first pixel in row three would be
1280. If you continue for another row, you’d finish row three at 1919,
and the first pixel of row four would be 1920.

Notice how the number for the first pixel of every row is always a multiple of
640? If I asked what the number would be for the first pixel in the row
20 rows down from the top of the image, instead of counting, you could just multiply: 640
times 20 is 12,800. In other words, the number for the first pixel in
each row is the width of the image (i.e., 640) multiplied by how many rows
lie above it (i.e., how far down we are from the top of the image).

Let’s come back to our mousePressed function for a second. In that
function, we happen to have a variable that’s always set to exactly how
far down the mouse is from the top of the image: mouseY. Our goal is to translate from mouseX and mouseY to the number in our count
corresponding to the pixel the mouse is over. With mouseY and the observation we just made, we’re
now halfway there. We can translate our calculation of the first pixel
of each row to use mouseY: mouseY times 640 (the width of the row) will
always get us the index in the array corresponding to the first pixel in
the row the mouse is over.

But what about all the other pixels? Now that we’ve figured out
what row a pixel is in, how can we figure out how far to the left or
right that pixel is in the row? We need to take mouseX into account.

Pick out a pixel in the middle of a row, say the row 12 rows down from the top (where mouseY is 12). Imagine that
you clicked the mouse on a pixel somewhere in this row. We know that the
pixel’s position in the array must be greater than the first pixel in
that row. Since we count up as we move across rows, this pixel’s
position must be the position of the first pixel in its row plus the
number of pixels between the start of the row and this pixel. Well, we
happen to know the position of the first pixel in this row. It’s
just 12 times 640, the row’s distance from the top times the number of pixels in
each row. But what about the number of pixels to the left of the pixel
we’re looking at? Well, in mousePressed, we have a variable that tells us
exactly how far the mouse is from the left side of the sketch: mouseX. All we have to do is add mouseX to the value at the start of the row:
mouseY times 640.

And, lo and behold, we now have our answer. The position in the
array of a given pixel will be mouseX + (mouseY
* 640). If at any point in this circuitous discussion you
happened to peek at the next line in mousePressed, you would have ruined the
surprise, because that line performs this exact
calculation:

    int clickPosition = mouseX + (mouseY * 640);

And then the line after that uses its result to access the array
of depthValues to pull out the value
at the point where the user clicked. That line uses clickPosition, the result of our calculation,
as an index to access the array. Just like int[] depthValues declared depthValues as an array (a box into which we
could put a lot of integers), depthValues[clickPosition] reaches into that
box and pulls out a particular integer. The value of clickPosition tells us how far to reach into
the box and which integer to pull out.

Higher-Resolution Depth Data

That integer we found in the box is one of our new
higher-resolution depth values. As we’ve been working toward all this
time, it’s exactly the value that corresponds to the position in the
image where the user clicked. Once we’ve accessed it, we store it in
another variable, clickedDepth, and
use that to print it to Processing’s output window.

If you haven’t already, run this sketch and click around on
various parts of the depth image. You’ll see values printing out to
the Processing output area much like they did in all of our previous
examples, only this time they’ll cover a different range. When I run
the sketch, I see values around 450 for the brightest parts of the
image (i.e., the closest parts of the scene) and around 8000 for the
darkest (i.e., farthest) parts. The parts of the image that are within
the Kinect’s minimum range or hidden in the shadows of closer objects
give back readings of 0. That’s the Kinect’s way of saying that there
is no data available for those points.

This is obviously a higher range than the pixel values of 0 to
255 we’d previously seen. In fact, it’s actually spookily close to the
0 to 8000 range we were hoping to see to cover the Kinect’s full
25-foot physical range at millimeter precision. This is extremely
promising for our overall project of trying to convert the Kinect’s
depth readings to accurate real-world measurements. In fact, it sounds
an awful lot like the values we’re pulling out of kinect.depthMap are the accurate distance
measurements in millimeters. In other words, each integer in our new
depth readings might actually correspond to a single millimeter of
physical distance.

With a few alterations to our mousePressed function (and the use of a
handy tape measure) we can test out this hypothesis. Here’s
the new version of the code:

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup()
    {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
    }

    void draw()
    {
      kinect.update();
      PImage depthImage = kinect.depthImage();
      image(depthImage, 0, 0);
    }

    void mousePressed() {
      int[] depthValues = kinect.depthMap();
      int clickPosition = mouseX + (mouseY * 640);
      // our hypothesis: each raw depth value is a distance in millimeters
      int millimeters = depthValues[clickPosition];
      // there are 25.4 millimeters in an inch
      float inches = millimeters / 25.4;
      println("mm: " + millimeters + " in: " + inches);
    }

First of all, I renamed our clickedDepth variable to millimeters since our theory is that it
actually represents the distance from the Kinect to the object clicked
as measured in millimeters. Second, I went ahead and wrote another
line of code to convert our millimeter reading to inches. Being
American, I think in inches, so it helps me to have these units on
hand as well. A few seconds Googling taught me that to convert from
millimeters to inches, all you have to do is divide your value by
25.4. Finally, I updated the println statement to output both the
millimeter and inch versions of our measurement.

Once I had this new code in place, I grabbed my tape measure. I put one end of it under the Kinect and
extended it toward myself, as you can see in Figure 2-13.

Figure 2-13. I held up a tape measure in front of my Kinect to check our
depth measurements against the real world.

The tape shows up as a black line because most of it is inside
of the Kinect’s minimum range and because all of it is reflective.
Once I had the tape measure extended, I locked it down at 32 inches
(or about 810 millimeters). Then I could use my free hand to click on
the depth image to print out measurements to the Processing output
area. It was a little bit hard to distinguish between my hand and the
tape measure itself, so I just clicked in that general vicinity. When
I did that, Processing printed out: mm: 806
in: 31.732285. Dead on! Taking into account the sag in the
measuring tape as well as my poorly aimed clicking, this is an
extremely accurate result. And more clicking around at different
distances confirmed it: our distance calculations lined up with the
tape measure every time. We’ve now turned our Kinect into an accurate
digital “tapeless” measuring tape!

Try it out yourself. Get out a tape measure, run this sketch,
and double-check my results. Then, once you’ve convinced yourself that
it’s accurate, use the Kinect to take some measurements of your room,
your furniture, your pets, whatever you have handy.

In this section, you learned two fundamental skills: how to
access the Kinect’s data as an array of values and how to translate
between that array and the position of a particular value in the
image.

We’re now going to extend those skills to let us work with all
of the depth data coming from the Kinect. Instead of
translating from a single x-y coordinate to a single array index,
we’re going to loop through all of the values in the array in order to
make general conclusions about the depth data by comparing all of the
values. This technique will let us do things like finding and
tracking the closest part of the image. At that point,
we’ll be ready to use the Kinect to build our first real user
interfaces. We’ll be able to start doing something more interesting
than printing numbers in Processing’s output area.

This chapter will conclude with a couple of projects that
explore some of the possibilities that are opened up by our ability to
track the closest point. We’ll write a sketch that lets us draw a line
by waving our hands and other body parts around. Then we’ll go even
further and make a sketch that lets us lay out photos by dragging them
around in midair Minority Report–style.
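
As a recap of the index arithmetic from this section, here is a minimal sketch in plain Java, written so that it can run without a Kinect attached. The class name DepthIndex and all of its helper methods are hypothetical names invented for illustration; they are not part of SimpleOpenNI or Processing, and only the arithmetic comes from the text. The sketch also shows the reverse conversion, from an array index back to an x-y position, and a small loop that finds the closest nonzero reading, which is one way the whole-array pass described above might look.

```java
// Illustrative helpers only: none of these names come from SimpleOpenNI.
class DepthIndex {
    // Width of the depth image in pixels, as used throughout this section.
    static final int WIDTH = 640;

    // Convert an x-y position in the image to an index in the flat array.
    static int toIndex(int x, int y) {
        return x + (y * WIDTH);
    }

    // Recover the column: the remainder left after removing all full rows.
    static int toX(int index) {
        return index % WIDTH;
    }

    // Recover the row: integer division counts the full rows before the index.
    static int toY(int index) {
        return index / WIDTH;
    }

    // Convert a depth reading in millimeters to inches (25.4 mm per inch).
    static float toInches(int millimeters) {
        return millimeters / 25.4f;
    }

    // Return the index of the smallest nonzero reading, or -1 if every
    // reading is 0 (the value that means "no data available").
    static int closestIndex(int[] depthValues) {
        int closest = -1;
        for (int i = 0; i < depthValues.length; i++) {
            int depth = depthValues[i];
            if (depth > 0 && (closest == -1 || depth < depthValues[closest])) {
                closest = i;
            }
        }
        return closest;
    }

    public static void main(String[] args) {
        int index = toIndex(10, 12);   // 10 pixels in, 12 rows down
        System.out.println("index: " + index);
        System.out.println("x: " + toX(index) + " y: " + toY(index));
        System.out.println("inches: " + toInches(806));
    }
}
```

Inside a Processing sketch the same method bodies should work as ordinary functions. Note that plain Java needs the f suffix on 25.4f, whereas Processing treats 25.4 as a float already.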