

Project 2: Your First Kinect Program

That’s it for setup. Now we’re ready to start writing our own code. Our first program is going to be pretty simple. It’s just going to access the Kinect, read the images from both its depth camera and its color camera, and then display them both on the screen side by side. Once that’s working, we’ll gradually add to this program to explore the pixels of both images.

You’ve got the Kinect library installed and your Kinect plugged into your computer, so launch Processing and run the program below. Read through it, run it, and take a look at what it displays. Spend some time waving your hands around in front of your Kinect (this, you’ll find, is one of the core activities that make up the process of Kinect development) and then, when you’re ready, meet me after the code listing. I’ll walk through each line of this first program and make sure you understand everything about how it works.

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup() {
      size(640*2, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
      kinect.enableRGB();
    }

    void draw() {
      kinect.update();
      image(kinect.depthImage(), 0, 0);
      image(kinect.rgbImage(), 640, 0);
    }

When you run this sketch, you’ll have a cool moment that’s worth noting: your first time looking at a live depth image. Not to get too cheesy, but this is a bit of a landmark like the first time your parents or grandparents saw color television. This is your first experience with a new way of seeing, and it’s a cool sign that you’re living in the future! Shortly, we’ll go through this code line by line. I’ll explain each part of how it works and start introducing you to the SimpleOpenNI library we’ll be using to access the Kinect throughout this book.

OBSERVATIONS ABOUT THE DEPTH IMAGE

What do you notice when you look at the output from the Kinect? I’d like to point out a few observations that are worth paying attention to because they illustrate some key properties and limitations of the Kinect that you’ll have to understand to build effective applications with it. For reference, Figure 2-9 shows a screen capture of what I see when I run this app.

Figure 2-9. A screen capture of our first Processing sketch showing the depth image side by side with a color image from the Kinect.

What do you notice about this image besides my goofy haircut and awkward grin?

First of all, look at the right side of the depth image, where my arm disappears off camera toward the Kinect. Things tend to get brighter as they come toward the camera: my shoulder and upper arm are brighter than my neck, which is brighter than the chair, which is much brighter than the distant kitchen wall. This makes sense. We know by now that the color of the pixels in the depth image represents how far away things are, with brighter things being closer and darker things farther away. If that’s the case, then why is my forearm, the thing in the image closest to the camera, black?

There are some other parts of the image that also look black when we might not expect them to. While it makes sense that the back wall of the kitchen would be black as it’s quite far away from the Kinect, what’s with all the black splotches on the edges of my shoulders and on my shirt? And while we’re at it, why is the mirror in the top-left corner of the image so dark? It’s certainly not any farther away than the wall that it’s mounted on. And finally, what’s with the heavy dark shadow behind my head?

I’ll answer these questions one at a time, as they each demonstrate an interesting aspect of depth images that we’ll see coming up constantly as we work with them throughout this book.

Minimum range

As I explained in Chapter 1, the Kinect’s depth camera has some limitations due to how it works. We’re seeing evidence of one of these here. The Kinect’s depth camera has a minimum range of about 20 inches. Closer than that, the Kinect can’t accurately calculate distances based on the displacement of the infrared dots. Since it can’t figure out an accurate depth, the Kinect just treats anything closer than this minimum range as if it had a depth value of 0, in other words, as if it was infinitely far away. That’s why my forearm shows up as black in the depth image: it’s closer than the Kinect’s minimum range.

Noise at edges

First, what’s with splotches around the edges of my shoulders? Whenever you look at a moving depth image from the Kinect you’ll tend to see splotches of black appearing and disappearing at the edges of objects that should really be some solid shade of gray. This happens because the Kinect can only calculate depth where the dots from its infrared projector are reflected back to it. The edges of objects like my shoulders or the side of my face tend to deflect some of the dots away at odd angles so that they don’t actually make it back to the Kinect’s infrared camera at all. Where no IR dots reach the infrared camera, the Kinect can’t calculate the depth of the object and so, just like in the case of objects closer than 20 inches, there’s a hole in the Kinect’s data and the depth image turns black. We’ll see later on in the book that if we want to work around this problem, we can use the data from many depth images over time to smooth out the gaps in these edges. However, this method only works if we’ve got an object that’s sitting still.

Reflection causes distortion

Next, why does the mirror look so weird? If you look at the color image, you can see that the mirror in the top-left corner of the frame is just a thin slab of glass sitting on the wall. Why then does it appear so much darker than the wall it’s on? Instead of the wall’s even middle gray, the mirror shows up in the depth image as a thick band of full black and then, inside of that, a gradient that shifts from dark gray down to black again.
What is happening here?

Well, being reflective, the mirror bounces away the infrared dots that are coming from the Kinect’s projector. These then travel across the room until they hit some wall or other nonreflective surface. At that point, they bounce off, travel back to the mirror, reflect off of it, and eventually make their way to the Kinect’s infrared camera. This is exactly how mirrors normally work with visible light to allow you to see reflections. If you look at the RGB image closely, you’ll realize that the mirror is reflecting a piece of the white wall on the opposite side of the room in front of me.

In the case of a depth image, however, there’s a twist. Since the IR dots were displaced farther, the Kinect calculates the depth of the mirror to be the distance between the Kinect and the mirror plus the distance between the mirror and the part of the room reflected in it. It’s like the portion of the wall reflected in the mirror had been picked up and moved so that it was actually behind the mirror instead of in front of it.

This effect can be inconvenient at times when reflective surfaces show up accidentally in spaces you’re trying to map with the Kinect, for example windows and glass doors. If you don’t plan around them, these can cause strange distortions that can screw up the data from the Kinect and frustrate your plans. However, if you account for this reflective effect by getting the angle just right between the Kinect and any partially reflective surface, you can usually work around it without too much difficulty.

Further, some people have actually taken advantage of this reflective effect to do clever things. For example, artist and researcher Kyle McDonald set up a series of mirrors similar to what you might see in a tailor’s shop around a single object, reflecting it so that all of its sides are visible simultaneously from the Kinect, letting him make a full 360-degree scan of the object all at once without having to rotate it or move it.
Figure 2-10 shows Kyle’s setup and the depth image that results.

Figure 2-10. Artist Kyle McDonald’s setup using mirrors to turn the Kinect into a 360 degree 3D scanner. Photos courtesy of Kyle McDonald.

Occlusion and depth shadows

Finally, what’s up with that shadow behind my head? If you look at the depth image I captured you can see a solid black area to the left of my head, neck, and shoulder that looks like a shadow. But if we look at the color image, we see no shadow at all there. What’s going on? The Kinect’s projector shoots out a pattern of IR dots. Each dot travels until it reaches an object and then it bounces back to the Kinect to be read by the infrared camera and used in the depth calculation. But what about other objects in the scene that were behind that first object? No IR dots will ever reach those objects. They’re stuck in the closer object’s IR shadow. And since no IR dots ever reach them, the Kinect won’t get any depth information about them, and they’ll be another black hole in the depth image.

This problem is called occlusion. Since the Kinect can’t see through or around objects, there will always be parts of the scene that are occluded or blocked from view and that we don’t have any depth data about. What parts of the scene will be occluded is determined by the position and angle of the Kinect relative to the objects in the scene.

One useful way to think about occlusion is that the Kinect’s way of seeing is like lowering a very thin and delicate blanket over a complicated pile of objects. The blanket only comes down from one direction and if it settles on a taller object in one area, then the objects underneath that won’t ever make contact with the blanket unless they extend out from underneath the section of the blanket that’s touching the taller object.
The blanket is like the grid of IR dots, only instead of being lowered onto an object, the dots are spreading out away from the Kinect to cover the scene.

Misalignment between the color and depth images

Finally, before we move on to looking more closely at the code, there’s one other subtle thing I wanted to point out about this example. Look closely at the depth image and the color image. Are they framed the same? In other words, do they capture the scene from exactly the same point of view? Look at my arm, for example. In the color image, it seems to come off camera to the right at the very bottom of the frame, not extending more than about a third of the way up. In the depth image, however, it’s quite a bit higher. My arm looks like it’s bent at a more dramatic angle and it leaves the frame clearly about halfway up. Now, look at the mirror in both images. A lot more of the mirror is visible in the RGB image than the depth image. It extends farther down into the frame and farther to the right. The visible portion of it is taller than it is wide. In the depth image on the other hand, the visible part of the mirror is nothing more than a small square in the upper-left corner.

What is going on here? As we know from the introduction, the Kinect captures the depth image and the color image from two different cameras. These two cameras are separated from each other on the front of the Kinect by a couple of inches. Because of this difference in position, the two cameras will necessarily see slightly different parts of the scene, and they will see them from slightly different angles. This difference is a little bit like the difference between your two eyes. If you close each of your eyes one at a time and make some careful observations, you’ll notice similar types of differences of angle and framing that we’re seeing between the depth image and the color image.

These differences between these two images are more than just a subtle technical footnote.
As we’ll see later in the book, aligning the color and depth images, in other words overcoming the differences we’re observing here with code that takes them into account, allows us to do all kinds of cool things like automatically removing the background from the color image or producing a full-color three-dimensional scan of the scene. But that alignment is an advanced topic we won’t get into until later.

Understanding the Code

Now that we’ve gotten a feel for the depth image, let’s take a closer look at the code that displayed it.

I’m going to walk through each line of this example rather thoroughly. Since it’s our first time working with the Kinect library, it’s important for you to understand this example in as much detail as possible. As the book goes on and you get more comfortable with using this library, I’ll progress through examples more quickly, only discussing whatever is newest or trickiest. But the concepts in this example are going to be the foundation of everything we do throughout this book and we’re right at the beginning so, for now, I’ll go slowly and thoroughly through everything.

On line 1 of this sketch, we start by importing the library:

    import SimpleOpenNI.*;

This works just like importing any other Processing library and should be familiar to anyone who’s worked with Processing (if you’re new to Processing, check out Getting Started with Processing from O’Reilly). The library is called SimpleOpenNI because it’s a Processing wrapper for the OpenNI toolkit provided by PrimeSense that I discussed earlier. As a wrapper, SimpleOpenNI just makes the capabilities of OpenNI available in Processing, letting us write code that takes advantage of all of the powerful stuff PrimeSense has built into their framework. That’s why we had to install OpenNI and NITE as part of the setup process for working with this library: when we call our Processing code, the real heavy lifting is going to be done by OpenNI itself.
We won’t have to worry about the details of that too frequently as we write our code, but it’s worth noting here at the beginning.

The next line declares our SimpleOpenNI object and names it kinect:

    SimpleOpenNI kinect;

This is the object we’ll use to access all of the Kinect’s data. We’ll call functions on it to get the depth and color images and, eventually, the user skeleton data as well. Here we’ve just declared it but not instantiated it, so that’s something we’ll have to look out for in the setup function.

Now we’re into the setup function. The first thing we do here is declare the size of our app:

    void setup() {
      size(640*2, 480);

I mentioned earlier that the images that come from the Kinect are 640 pixels wide by 480 tall. In this example, we’re going to display two images from the Kinect side by side: the depth image and the RGB image. Hence, we need an app that’s 480 pixels tall to match the Kinect’s images in height, but is twice as wide so it can contain two of them next to each other; that’s why we set the width to 640*2.

Once that’s done, as promised earlier, we need to actually instantiate the SimpleOpenNI instance that we declared at the top of the sketch, which we do here:

      kinect = new SimpleOpenNI(this);

Having that in hand, we then proceed to call two methods on our instance: enableDepth and enableRGB, and that’s the end of the setup function, so we close that out with a }:

      kinect.enableDepth();
      kinect.enableRGB();
    }

These two methods are our way of telling the library that we will want to access both the depth image and the RGB image from the Kinect. Depending on our application, we might only want one, or even neither of these. By telling the library in advance what kind of data we’ll want to access, we give it a chance to do just enough work to provide us what we need.
The library only has to ask the Kinect for the data we actually plan to use in our application and so it’s able to update faster, letting our app run faster and smoother in turn.

At this point, we’re done setting up. We’ve created an object for accessing the Kinect, and we’ve told it that we’re going to want both the RGB data and the depth data. Now, let’s look at the draw loop to see how we actually access that data and do something with it.

We kick off the draw loop by calling the update function on our Kinect object:

    void draw() {
      kinect.update();

This tells the library to get fresh data from the Kinect so that we can work with it. It’ll pull in different data depending on which enable functions we called in setup; in our case, that means we’ll now have fresh depth and RGB images to work with.

We’re down to the last two lines, which are the heart of this example. Let’s take the first one:

      image(kinect.depthImage(), 0, 0);

Starting from the inside out, we first call kinect.depthImage, which asks the library for the most recently available depth image. This image is then handed to Processing’s built-in image function along with two other arguments both set to 0. This tells Processing to draw the depth image at 0,0 in our sketch, or at the very top left of our app’s window.

The next line does nearly the same thing, with two important differences:

      image(kinect.rgbImage(), 640, 0);
    }

It calls kinect.rgbImage to get the color image from the Kinect and it passes 640,0 to image instead of 0,0, which means that it will place the color image at the top of the app’s window, but 640 pixels from the left side. In other words, the depth image will occupy the leftmost 640 pixels in our app and the color image the rightmost ones.

FRAME RATES

The Kinect camera captures data at a rate of 30 frames per second. In other words, every 1/30 of a second, the Kinect makes a new depth and RGB image available for us to read. If our app runs faster than 30 frames a second, the draw function will get called multiple times before a new set of depth and RGB images is available from the Kinect. If our app runs slower than 30 frames a second, we’ll miss some images.

But how fast does our app actually run? What is our frame rate? The answer is that we don’t know. By default, Processing tries to run our draw function 60 times per second. You can change this target by calling Processing’s frameRate function and passing it the frame rate at which you’d like your sketch to run. However, in practice, the actual frame rate of your sketch will depend on what your sketch is actually doing. How long each run of the draw function takes depends on a lot of factors including what we’re asking it to do and how much of our computer’s resources are available for Processing to use. For example, if we had an ancient really slow computer and we were asking Processing to print out every word of Dickens’ A Tale of Two Cities on every run of the draw function, we’d likely have a very low frame rate. On the other hand, when running Processing on a typical modern computer with a draw loop that only does some basic operations, we might have a frame rate significantly above 30 frames per second. And further, in either of these situations, our frame rate might vary over time both as our app’s level of exertion varied with user input and the resources available to it varied with what else was running on our computer.

For now in these beginning examples, you won’t have to worry too much about the frame rate, but as we start to build more sophisticated applications, this will be a constant concern. If we try to do too much work on each run of our draw function, our interactions may get slow and jerky, but if we’re clever, we’ll be able to keep all our apps just as smooth as this initial example.

One more note about how these lines work.
By calling kinect.depthImage and kinect.rgbImage inline within the arguments to image we’re hiding one important part of how these functions work together: we’re never seeing the return value from kinect.depthImage or kinect.rgbImage. This is an elegant and concise way to write a simple example like this, but right now we’re trying for understanding rather than elegance, so we might learn something by rewriting our example like this:

    import SimpleOpenNI.*;
    SimpleOpenNI kinect;

    void setup() {
      // double the width to display two images side by side
      size(640*2, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
      kinect.enableRGB();
    }

    void draw() {
      kinect.update();
      PImage depthImage = kinect.depthImage();
      PImage rgbImage = kinect.rgbImage();
      image(depthImage, 0, 0);
      image(rgbImage, 640, 0);
    }

In this altered example, we’ve introduced two new lines to our sketch’s draw function. Instead of implicitly passing the return values from kinect.depthImage and kinect.rgbImage to Processing’s image function, we’re now storing them in local variables and then passing those variables to image. This has not changed the functionality of our sketch at all, and if you run it, you’ll see no difference in the behavior. What it does is make the return type of our two image-accessing functions explicit: both kinect.depthImage and kinect.rgbImage return a PImage, Processing’s class for storing image data. This class provides all kinds of useful functions for working with images such as the ability to access the image’s individual pixels and to alter them, something we’re going to be doing later on in this chapter. Having the Kinect data in the form of a PImage is also a big advantage because it means that we can automatically use the Kinect data with other libraries that don’t know anything at all about the Kinect but do know how to process PImages.
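
The pixel access mentioned above can be previewed without a Kinect attached. What follows is only a minimal sketch in plain Java rather than a Processing sketch. It assumes pixels are stored as packed ARGB ints in a row-by-row array, which is how PImage’s pixels array is laid out. The class name, the helper names, and the tiny 4-by-2 image are all hypothetical, chosen purely for illustration, and the values are made up rather than read from a real depth image.

```java
// Plain-Java stand-in for looking up one pixel in a PImage-style array.
// Assumption: pixels are packed 0xAARRGGBB ints stored row by row.
class PixelLookupSketch {

    // Index of pixel (x, y) in a row-major array for an image `width` pixels wide.
    static int pixelIndex(int x, int y, int width) {
        return x + y * width;
    }

    // Pack an opaque gray level (0-255) into an ARGB int.
    static int gray(int level) {
        return 0xFF000000 | (level << 16) | (level << 8) | level;
    }

    // Extract the red channel; for a gray pixel this equals its gray level.
    static int red(int argb) {
        return (argb >> 16) & 0xFF;
    }

    public static void main(String[] args) {
        int width = 4;
        int height = 2;                          // hypothetical tiny image
        int[] pixels = new int[width * height];  // every entry starts at 0

        // Pretend the pixel at column 2, row 1 is a fairly bright gray.
        pixels[pixelIndex(2, 1, width)] = gray(200);

        int value = red(pixels[pixelIndex(2, 1, width)]);
        System.out.println("gray level at (2, 1): " + value);
    }
}
```

In a real sketch the same index arithmetic would apply to a 640-pixel-wide image, so the pixel at column 10 of row 2 would sit at index 10 + 2 * 640.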

  
