
Project 4: A Wireless Tape Measure

The code in this section is going to introduce a new programming concept that I haven’t mentioned before and that might not be completely familiar to you from your previous work in Processing: accessing arrays of pixels. Specifically, we’ll be learning how to translate between a one-dimensional array of pixels and the two-dimensional image that it represents. I’ll explain how to do all of the calculations necessary to access the entry in the array that corresponds to any location in the image. And, more important, I’ll explain how to think about the relationship between the image and the array so that these calculations are intuitive and easy to remember.

At first, this discussion of pixels and arrays may feel like a diversion from our core focus on working with the Kinect. But the Kinect is, first and foremost, a camera, so much of our work with it will be based on processing its pixels.

Up to this point, whenever we’ve wanted to access the Kinect’s pixels, we’ve first displayed them on the screen in the form of images. However, as we just discussed, this becomes impractical when we want to access the Kinect’s data at a higher resolution or when we want to access more of it than a single pixel’s worth. Therefore, we need to access the data in a manner that doesn’t first require us to display it on the screen. In Processing (and most other graphics programming environments) this image data is stored as arrays of pixels behind the scenes. Accessing these arrays of pixels directly (while they are still off stage) will let our programs run fast enough to do some really interesting things like working with the higher-resolution data and processing more than one pixel at a time.

Even though Processing stores images as flat arrays of pixels, we still want the ability to think of them two-dimensionally. We want to be able to figure out which pixel a user clicked on or draw something on the screen where we found a particular depth value.
In this section, I’ll teach you how to make these kinds of translations. We’ll learn how to convert between the array the pixels are stored in and their position on the screen.

I’ll introduce this conversion technique by showing you how to access an array of higher-resolution values from the Kinect. Then we’ll use that to turn the Kinect into a wireless tape measure, converting these depth values into accurate real-world units. Once we’ve got this down, we’ll be ready to take things one step further and start working with all of the pixels coming from the Kinect.

We’ll start our tape measure with a new version of our Processing sketch. This version will be along the same lines as the sketch we’ve been working with but with a few important differences. The basic structure will be the same. We’ll still display images from the Kinect and then output information about them when we click, but we’ll change what we display, both on the screen and in the output when we click.

First of all, let’s forget about the color image from the Kinect. We’ve learned all that it has to teach us for now, so we’re banishing it to focus more on the depth image. Second, we’ll rewrite our mousePressed function to access and display the higher-resolution depth data from the Kinect.
I’ll explain how this works in some detail, but first take a look at the full code, noticing the changes to setup and draw that come from eliminating the color image:

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup() {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
    }

    void draw() {
      kinect.update();
      PImage depthImage = kinect.depthImage();
      image(depthImage, 0, 0);
    }

    void mousePressed(){
      int[] depthValues = kinect.depthMap();
      int clickPosition = mouseX + (mouseY * 640);
      int clickedDepth = depthValues[clickPosition];
      float inches = clickedDepth / 25.4;
      println("inches: " + inches);
    }

The changes to setup and draw are minimal: I’m no longer accessing the RGB image from the Kinect and no longer displaying it. And since we’re now only displaying one image, I made the whole sketch smaller, because we don’t need all that horizontal real estate just to show the depth image.

Now, let’s talk about the real substantial difference here: the changes I’ve made to mousePressed. First of all, mousePressed calls a new function on our kinect object that we haven’t seen before: depthMap. This is one of a few functions that SimpleOpenNI provides that give us access to the higher-resolution depth data. This is the simplest one. It returns all of the higher-resolution depth values unmodified, neither converted nor processed.

In what form does kinect.depthMap return these depth values? Up until now, all the depth data we’ve seen has reached us in the form of images. We know that the higher-resolution values that kinect.depthMap returns can’t be stored as the pixels of an image. So, then, in what form are they stored? The answer is: as an array of integers. We have one integer for each depth value that the Kinect recorded, and they’re all stored in one array. That’s why the variable we use to save the results of kinect.depthMap is declared thusly: int[] depthValues. That int[] means that our depthValues variable will store an array of integers.
If you have a hard time remembering how array declarations like this one work in Processing (as I often do), you can think of the square brackets as being a box that will contain all the values of the array and the int that comes before it as a label telling us that everything that goes in this box must be an integer.

So, we have an array of integers. How can this box full of numbers store the same kind of information we’ve so far seen in the pixels of an image? The Kinect is, after all, a camera. The data that comes from it is two-dimensional, representing all the depth values in its rectangular field of view, whereas an array is one-dimensional: it can only store a single stack of numbers. How do you represent an image as a box full of numbers?

Here’s how. Start with the pixel in the top-left corner of the image. Put it in the box. Then, moving to the right along the top row of pixels, put each pixel into the box on top of the previous ones. When you get to the end of the row, jump back to the left side of the image, move down one row, and repeat the procedure, continuing to stick the pixels from the second row on top of the ever-growing stack you began in the first row. Continue this procedure for each row of pixels in the image until you reach the very last pixel in the bottom right. Now, instead of a rectangular image, you’ll have a single stack of pixels: a one-dimensional array. All the pixels from each row will be stacked together, and the last pixel from each row will be right in front of the first pixel from the next row, as Figure 2-12 shows.

Figure 2-12. Pixels in a two-dimensional image get stored as a flat array. Understanding how to split this array back into rows is key to processing images.

This is exactly how the array returned by kinect.depthMap is structured. It has one high-resolution depth value for each pixel in the depth image. Remember that the depth image’s resolution is 640 by 480 pixels.
That means that it has 480 rows of pixels, each of which is 640 pixels across. So, from the logic above, we know that the array kinect.depthMap returns contains 307,200 (or 640 times 480) integers arranged in a single linear stack. The first integer in this stack corresponds to the top-left pixel in the image. Each following value corresponds to the next pixel across each row until the last value finally corresponds to the last pixel in the bottom right.

But how do we access the values of this array? More specifically, how do we pull out the integer value that corresponds to the part of the image that the user actually clicked on? This is the mousePressed event, after all, and so all we have available to us is the position of the mouse at the time that the user clicked. As we’ve seen, that position is expressed as an x-y coordinate in the variables mouseX and mouseY. In past versions of the sketch, we used these coordinates to access the color value of a given pixel in our sketch using get, which specifically accepted x-y coordinates as its arguments. However, now we have a stack of integers in an array instead of a set of pixels arranged into a rectangle. Put another way, instead of having a set of x-y coordinates in two axes, we only have a single axis: the integers in our single array. To access data from the array, we need not a pair of x-y coordinates, but an index: a number that tells us the position in the array of the value we’re looking for. How do we translate from the two axes of the depth image, where the user clicks, to the single axis of our integer array? In other words, how do we convert mouseX and mouseY into the single position in the array that corresponds to the user’s click?

To accomplish this, we’ll have to do something that takes into account how the values were put into the array in the first place.
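
The row-by-row filling procedure can be tried out on a tiny grid without a Kinect attached. What follows is a minimal sketch in plain Java rather than code from the sketch itself; the class name FlattenDemo and the helper flatten are hypothetical names chosen for illustration, and the 3-by-2 grid stands in for the real 640-by-480 depth image.

```java
// Illustrative only: FlattenDemo and flatten are hypothetical names,
// not part of Processing or SimpleOpenNI.
final class FlattenDemo {

    // Copies a grid into one flat array: left to right along each row,
    // top row first, exactly as in the "stack of pixels" description.
    static int[] flatten(int[][] rows) {
        int height = rows.length;
        int width = rows[0].length;
        int[] flat = new int[width * height];
        int next = 0; // position in the flat array that gets filled next
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                flat[next] = rows[y][x];
                next++;
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {10, 11, 12},  // top row
            {20, 21, 22}   // second row
        };
        // prints [10, 11, 12, 20, 21, 22]
        System.out.println(java.util.Arrays.toString(flatten(grid)));
        // A 640-by-480 image flattens to this many values: prints 307200
        System.out.println(640 * 480);
    }
}
```

Note how 12, the last value of the top row, ends up directly in front of 20, the first value of the second row.
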
In filling the array, we started at the top-left corner of the image, went across each pixel in each row to the end adding values, and then jumped back to the beginning of the next row when we reached the edge of the image. Imagine that you were counting values as we did this, adding one to your count with each pixel that got converted into a value and added to the array. What would your count look like as we progressed through the image?

For the first row, it’s pretty obvious. You’d start your count at 0 (programmers always start counting at 0) and work your way up as you go across the first row. When you reach the last pixel in the first row, your count will be 639 (there are 640 pixels in the row and you started counting at 0 for the first pixel). Then, you’d jump back to the left side of the image to continue on the second row and keep counting. So pixel one on row two would be 640, pixel two would be 641, and so on until you reach the end of row two. At the last pixel of row two, you’d be up to 1279, which means that the first pixel in row three would be 1280. If you continue for another row, you’d finish row three at 1919, and the first pixel of row four would be 1920.

Notice how the count for the first pixel of every row is always a multiple of 640? If I asked what the number would be for the first pixel of the row that sits 20 rows down from the top of the image, instead of counting, you could just multiply: 640 times 20 is 12,800. In other words, the number for the first pixel in each row is the width of the image (i.e., 640) multiplied by how far down that row is from the top of the image.

Let’s come back to our mousePressed function for a second. In that function, we happen to have a variable that’s always set to exactly how far down the mouse is from the top of the image: mouseY. Our goal is to translate from mouseX and mouseY to the number in our count corresponding to the pixel the mouse is over. With mouseY and the observation we just made, we’re now halfway there.
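
The counting exercise can be reproduced in a few lines. This is a hedged sketch in plain Java, with hypothetical names (RowStarts, firstIndexOfRow, lastIndexOfRow) that do not come from Processing or SimpleOpenNI. Rows are numbered by how far down they are from the top, so the top row is row 0, the same way mouseY counts.

```java
// Illustrative only: all names here are hypothetical.
final class RowStarts {

    // Count reached at the first pixel of a row; rowsDown is 0 for the top row.
    static int firstIndexOfRow(int rowsDown, int width) {
        return rowsDown * width;
    }

    // Count reached at the last pixel of the same row.
    static int lastIndexOfRow(int rowsDown, int width) {
        return rowsDown * width + (width - 1);
    }

    public static void main(String[] args) {
        int width = 640;
        for (int rowsDown = 0; rowsDown < 4; rowsDown++) {
            System.out.println("rows down " + rowsDown + ": "
                + firstIndexOfRow(rowsDown, width) + " to "
                + lastIndexOfRow(rowsDown, width));
        }
        // prints:
        // rows down 0: 0 to 639
        // rows down 1: 640 to 1279
        // rows down 2: 1280 to 1919
        // rows down 3: 1920 to 2559
    }
}
```
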
We can translate our calculation of the first pixel of each row to use mouseY: mouseY times 640 (the width of the row) will always get us the position in the array corresponding to the first pixel in the row.

But what about all the other pixels? Now that we’ve figured out what row a pixel is in, how can we figure out how far to the left or right that pixel is in the row? We need to take mouseX into account.

Pick out a pixel in the middle of a row, say the row 12 rows down from the top. Imagine that you clicked the mouse on a pixel somewhere in this row. We know that the pixel’s position in the array must be greater than that of the first pixel in that row. Since we count up as we move across rows, this pixel’s position must be the position of the first pixel in its row plus the number of pixels between the start of the row and this pixel. Well, we happen to know the position of the first pixel in this row. It’s just 12 times 640, the number of the row times the number of pixels in each row. But what about the number of pixels to the left of the pixel we’re looking at? Well, in mousePressed, we have a variable that tells us exactly how far the mouse is from the left side of the sketch: mouseX. All we have to do is add mouseX to the value at the start of the row: mouseY times 640.

And, lo and behold, we now have our answer. The position in the array of a given pixel will be mouseX + (mouseY * 640). If at any point in this circuitous discussion you happened to peek at the next line in mousePressed, you would have ruined the surprise, because that line performs this exact calculation:

    int clickPosition = mouseX + (mouseY * 640);

And then the line after that uses its result to access the array of depthValues to pull out the value at the point where the user clicked. That line uses clickPosition, the result of our calculation, as an index to access the array.
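
The index calculation generalizes to any image width. The following plain-Java sketch is an illustration under stated assumptions, not part of the sketch in this section: PixelIndex, toIndex, toX, toY, and inBounds are hypothetical names. The reverse helpers (toX and toY) and the bounds check are additions that the text does not cover; they simply undo the same arithmetic with remainder and integer division.

```java
// Illustrative only: all names here are hypothetical.
final class PixelIndex {

    // Same arithmetic as mouseX + (mouseY * 640), with the width as a parameter.
    static int toIndex(int x, int y, int width) {
        return x + (y * width);
    }

    // Reverse direction (an addition for illustration): column of an index.
    static int toX(int index, int width) {
        return index % width;
    }

    // Reverse direction (an addition for illustration): row of an index.
    static int toY(int index, int width) {
        return index / width; // integer division drops the partial row
    }

    // Guard against coordinates that fall outside the image.
    static boolean inBounds(int x, int y, int width, int height) {
        return x >= 0 && x < width && y >= 0 && y < height;
    }

    public static void main(String[] args) {
        int width = 640;
        int height = 480;
        int index = toIndex(100, 12, width);
        System.out.println(index);                        // prints 7780
        System.out.println(toX(index, width));            // prints 100
        System.out.println(toY(index, width));            // prints 12
        System.out.println(toIndex(639, 479, width));     // prints 307199
        System.out.println(inBounds(640, 0, width, height)); // prints false
    }
}
```

The bottom-right pixel lands on index 307199, one less than the array length of 307,200, which is what counting from 0 predicts.
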
Just like int[] depthValues declared depthValues as an array (a box into which we could put a lot of integers), depthValues[clickPosition] reaches into that box and pulls out a particular integer. The value of clickPosition tells us how far to reach into the box and which integer to pull out.

Higher-Resolution Depth Data

That integer we found in the box is one of our new higher-resolution depth values. As we’ve been working toward all this time, it’s exactly the value that corresponds to the position in the image where the user clicked. Once we’ve accessed it, we store it in another variable, clickedDepth, and use that to print it to Processing’s output window.

If you haven’t already, run this sketch and click around on various parts of the depth image. You’ll see values printing out to the Processing output area much like they did in all of our previous examples, only this time they’ll cover a different range. When I run the sketch, I see values around 450 for the brightest parts of the image (i.e., the closest parts of the scene) and around 8000 for the darkest (i.e., farthest) parts. The parts of the image that are within the Kinect’s minimum range or hidden in the shadows of closer objects give back readings of 0. That’s the Kinect’s way of saying that there is no data available for those points.

This is obviously a higher range than the pixel values of 0 to 255 we’d previously seen. In fact, it’s actually spookily close to the 0 to 8000 range we were hoping to see to cover the Kinect’s full 25-foot physical range at millimeter precision. This is extremely promising for our overall project of trying to convert the Kinect’s depth readings to accurate real-world measurements. In fact, it sounds an awful lot like the values we’re pulling out of kinect.depthMap are the accurate distance measurements in millimeters.
In other words, each integer in our new depth readings might actually correspond to a single millimeter of physical distance.

With a few alterations to our mousePressed function (and the use of a handy tape measure) we can test out this hypothesis. Here’s the new version of the code:

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup() {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
    }

    void draw() {
      kinect.update();
      PImage depthImage = kinect.depthImage();
      image(depthImage, 0, 0);
    }

    void mousePressed(){
      int[] depthValues = kinect.depthMap();
      int clickPosition = mouseX + (mouseY * 640);
      int millimeters = depthValues[clickPosition];
      float inches = millimeters / 25.4;
      println("mm: " + millimeters + " in: " + inches);
    }

First of all, I renamed our clickedDepth variable to millimeters since our theory is that it actually represents the distance from the Kinect to the object clicked as measured in millimeters. Second, I went ahead and wrote another line of code to convert our millimeter reading to inches. Being American, I think in inches, so it helps me to have these units on hand as well. A few seconds of Googling taught me that to convert from millimeters to inches, all you have to do is divide your value by 25.4. Finally, I updated the println statement to output both the millimeter and inch versions of our measurement.

Once I had this new code in place, I grabbed my tape measure. I put one end of it under the Kinect and extended it toward myself, as you can see in Figure 2-13.

Figure 2-13. I held up a tape measure in front of my Kinect to check our depth measurements against the real world.

The tape shows up as a black line because most of it is inside of the Kinect’s minimum range and because all of it is reflective. Once I had the tape measure extended, I locked it down at 32 inches (or about 810 millimeters). Then I could use my free hand to click on the depth image to print out measurements to the Processing output area.
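
The unit conversion can be checked on its own, away from the Kinect. This is a small plain-Java sketch with hypothetical names (Units, mmToInches, inchesToMm, MM_PER_INCH); the only fact it relies on is the 25.4 millimeters per inch already used in the sketch. It converts the 32-inch tape mark to millimeters and a rounded 810 mm reading back to inches.

```java
// Illustrative only: all names here are hypothetical.
final class Units {

    static final float MM_PER_INCH = 25.4f;

    // Same division the sketch performs on each clicked depth value.
    static float mmToInches(int millimeters) {
        return millimeters / MM_PER_INCH;
    }

    // The reverse conversion, handy for checking a tape-measure mark.
    static float inchesToMm(float inches) {
        return inches * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // 32 inches is a little under 813 mm, so "about 810" is a fair rounding.
        System.out.println("32 in = " + inchesToMm(32) + " mm");
        // A reading of 810 mm comes out just short of 32 inches.
        System.out.println("810 mm = " + mmToInches(810) + " in");
    }
}
```
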
It was a little bit hard to distinguish between my hand and the tape measure itself, so I just clicked in that general vicinity. When I did that, Processing printed out: mm: 806 in: 31.732285. Dead on! Taking into account the sag in the measuring tape as well as my poorly aimed clicking, this is an extremely accurate result. And more clicking around at different distances confirmed it: our distance calculations lined up with the tape measure every time. We’ve now turned our Kinect into an accurate digital “tapeless” measuring tape!

Try it out yourself. Get out a tape measure, run this sketch, and double-check my results. Then, once you’ve convinced yourself that it’s accurate, use the Kinect to take some measurements of your room, your furniture, your pets, whatever you have handy.

In this section, you learned two fundamental skills: how to access the Kinect’s data as an array of values and how to translate between that array and the position of a particular value in the image.

We’re now going to extend those skills to let us work with all of the depth data coming from the Kinect. Instead of translating from a single x-y coordinate to a single array index, we’re going to loop through all of the values in the array in order to make general conclusions about the depth data by comparing all of the values. This technique will let us do things like finding and tracking the closest part of the image. At that point, we’ll be ready to use the Kinect to build our first real user interfaces. We’ll be able to start doing something more interesting than printing numbers in Processing’s output area.

This chapter will conclude with a couple of projects that explore some of the possibilities that are opened up by our ability to track the closest point. We’ll write a sketch that lets us draw a line by waving our hands and other body parts around. Then we’ll go even further and make a sketch that lets us lay out photos by dragging them around in midair, Minority Report-style.
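
As a preview of looping through every value, here is one way the closest point might be found. This is a hedged sketch in plain Java and not the implementation used later in the chapter; ClosestPoint and closestIndex are hypothetical names, and the tiny 4-by-3 array stands in for the 307,200 values from kinect.depthMap. It assumes, as noted earlier, that a reading of 0 means no data and should be skipped.

```java
// Illustrative only: ClosestPoint and closestIndex are hypothetical names.
final class ClosestPoint {

    // Returns the array index of the smallest non-zero depth value,
    // or -1 if every reading is 0 (no data anywhere).
    static int closestIndex(int[] depthValues) {
        int closest = -1;
        for (int i = 0; i < depthValues.length; i++) {
            int depth = depthValues[i];
            if (depth == 0) {
                continue; // 0 means "no data", not "touching the sensor"
            }
            if (closest == -1 || depth < depthValues[closest]) {
                closest = i;
            }
        }
        return closest;
    }

    public static void main(String[] args) {
        int width = 4; // stand-in for 640
        int[] depthValues = {
               0,  900,  850,    0,  // top row
            1200,  610,  700,    0,  // second row
            1500, 1400,    0,    0   // third row
        };
        int index = closestIndex(depthValues);
        // Turn the flat index back into an x-y position in the image.
        int x = index % width;
        int y = index / width;
        // prints: closest depth 610 at x=1, y=1
        System.out.println("closest depth " + depthValues[index]
            + " at x=" + x + ", y=" + y);
    }
}
```
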

  
