Project 2: Your First Kinect Program

That’s it for setup. Now we’re ready to start writing our own
code. Our first program is going to be pretty simple. It’s just going to
access the Kinect, read the images from both its depth camera and its color camera, and then display them
both on the screen side by side. Once that’s working, we’ll gradually
add to this program to explore the pixels of both images.

You’ve got the Kinect library installed and your Kinect plugged into your
computer, so launch Processing and run the program below. Read through
it, run it, and take a look at what it displays. Spend some time waving
your hands around in front of your Kinect (this, you’ll find, is one of
the core activities that make up the process of Kinect development) and
then, when you’re ready, meet me after the code listing. I’ll walk
through each line of this first program and make sure you understand
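everything about how it works.

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup()
{
  // double the width to display two images side by side
  size(640*2, 480);
  kinect = new SimpleOpenNI(this);
  // tell the library which kinds of data we plan to use
  kinect.enableDepth();
  kinect.enableRGB();
}

void draw()
{
  // get fresh data from the Kinect
  kinect.update();
  // depth image on the left, color image on the right
  image(kinect.depthImage(), 0, 0);
  image(kinect.rgbImage(), 640, 0);
}

When you run this sketch, you’ll have a cool moment that’s worth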
noting: your first time looking at a live depth image. Not to get too cheesy, but this is a bit of a
landmark like the first time your parents or grandparents saw color
television. This is your first experience with a new way of seeing, and
it’s a cool sign that you’re living in the future! Shortly, we’ll go
through this code line by line. I’ll explain each part of how it works
and start introducing you to the SimpleOpenNI library we’ll be using to access
the Kinect throughout this book.

OBSERVATIONS ABOUT THE DEPTH IMAGE

What do you notice when you look at the output from the
Kinect? I’d like to point out a few observations that are worth
paying attention to because they illustrate some key properties
and limitations of the Kinect that you’ll have to understand to
build effective applications with it. For reference, Figure 2-9 shows a
screen capture of what I see when I run this app.

Figure 2-9. A screen capture of our first Processing sketch showing
the depth image side by side with a color image from the
Kinect.

What do you notice about this image besides my goofy
haircut and awkward grin?

First of all, look at the right side of the depth image,
where my arm disappears off camera toward the Kinect. Things
tend to get brighter as they come toward the camera: my shoulder
and upper arm are brighter than my neck, which is brighter than
the chair, which is much brighter than the distant kitchen wall.
This makes sense. We know by now that the color of the pixels in
the depth image represents how far away things are, with brighter
things being closer and darker things farther away. If that’s
the case, then why is my forearm, the thing in the image closest
to the camera, black?

There are some other parts of the image that also look
black when we might not expect them to. While it makes sense
that the back wall of the kitchen would be black as it’s quite
far away from the Kinect, what’s with all the black splotches on
the edges of my shoulders and on my shirt? And while we’re at
it, why is the mirror in the top-left corner of the image so
dark? It’s certainly not any farther away than the wall that
it’s mounted on. And finally, what’s with the heavy dark shadow
behind my head?

I’ll answer these questions one at a time, as they each
demonstrate an interesting aspect of depth images that we’ll see
coming up constantly as we work with them throughout this
book.

Minimum range

As I explained in Chapter 1, the
Kinect’s depth camera has some limitations due to how it works.
We’re seeing evidence of one of these here. The Kinect’s depth
camera has a minimum range of about 20 inches. Closer than that,
the Kinect can’t accurately calculate distances based on the
displacement of the infrared dots. Since it can’t figure out an
accurate depth, the Kinect just treats anything closer than this
minimum range as if it had a depth value of 0, in other words, as
if it were infinitely far away. That’s why my forearm shows up as
black in the depth image: it’s closer than the Kinect’s minimum
range.

Noise at edges

First, what’s with the splotches around the edges of my
shoulders? Whenever you look at a moving depth image from the
Kinect you’ll tend to see splotches of black appearing and
disappearing at the edges of objects that should really be some
solid shade of gray. This happens because the Kinect can only
calculate depth where the dots from its infrared projector are
reflected back to it. The edges of objects like my shoulders or
the side of my face tend to deflect some of the dots away at odd
angles so that they don’t actually make it back to the Kinect’s
infrared camera at all. Where no IR dots reach the infrared
camera, the Kinect can’t calculate the depth of the object and so,
just like in the case of objects closer than 20 inches, there’s a
hole in the Kinect’s data and the depth image turns black. We’ll
see later on in the book that if we want to work around this
problem, we can use the data from many depth images over time to smooth out the gaps in
these edges. However, this method only works if we’ve got an
object that’s sitting still.

Reflection causes distortion

Next, why does the mirror look so weird? If you look at the
color image, you can see that the mirror in the top left corner of
the frame is just a thin slab of glass sitting on the wall. Why
then does it appear so much darker than the wall it’s on? Instead
of the wall’s even middle gray, the mirror shows up in the depth
image as a thick band of full black and then, inside of that, a
gradient that shifts from dark gray down to black again. What is
happening here?

Well, being reflective, the mirror bounces away the infrared
dots that are coming from the Kinect’s projector. These then
travel across the room until they hit some wall or other
nonreflective surface. At that point, they bounce off, travel back
to the mirror, reflect off of it, and eventually make their way to
the Kinect’s infrared camera. This is exactly how mirrors normally work with visible light to allow
you to see reflections. If you look at the RGB image closely,
you’ll realize that the mirror is reflecting a piece of the white
wall on the opposite side of the room in front of me.

In the case of a depth image, however, there’s a twist. Since the IR
dots were displaced farther, the Kinect calculates the depth of
the mirror to be the distance between the Kinect and the mirror
plus the distance between the mirror and the part of the room
reflected in it. It’s like the portion of the wall reflected in
the mirror had been picked up and moved so that it was actually
behind the mirror instead of in front of it.

This effect can be inconvenient at times when reflective
surfaces show up accidentally in spaces you’re trying to map with
the Kinect, for example windows and glass doors. If you don’t plan
around them, these can cause strange distortions that can screw up
the data from the Kinect and frustrate your plans. However, if you
account for this reflective effect by getting the angle just right between the Kinect and
any partially reflective surface, you can usually work around them
without too much difficulty.

Further, some people have actually taken advantage of this
reflective effect to do clever things. For example, artist and
researcher Kyle McDonald set up a series of mirrors similar to
what you might see in a tailor’s shop around a single object,
reflecting it so that all of its sides are visible simultaneously
from the Kinect, letting him make a full 360 degree scan of the
object all at once without having to rotate it or move it. Figure 2-10 shows Kyle’s
setup and the depth image that results.

Figure 2-10. Artist Kyle McDonald’s setup using mirrors to turn the
Kinect into a 360 degree 3D scanner. Photos courtesy of Kyle
McDonald.

Occlusion and depth shadows

Finally, what’s up with that shadow behind my head? If you
look at the depth image I captured you can see a solid black area
to the left of my head, neck, and shoulder that looks like a
shadow. But if we look at the color image, we see no shadow at all
there. What’s going on? The Kinect’s projector shoots out a
pattern of IR dots. Each dot travels until it reaches an object
and then it bounces back to the Kinect to be read by the infrared
camera and used in the depth calculation. But what about other
objects in the scene that were behind that first object? No IR
dots will ever reach those objects. They’re stuck in the closer
object’s IR shadow. And since no IR dots ever reach them, the
Kinect won’t get any depth information about them, and they’ll be
another black hole in the depth image.

This problem is called occlusion. Since
the Kinect can’t see through or around objects, there will always
be parts of the scene that are occluded or blocked from view and
that we don’t have any depth data about. What parts of the scene
will be occluded is determined by the position and angle of the
Kinect relative to the objects in the scene.

One useful way to think about occlusion is that the Kinect’s
way of seeing is like lowering a very thin and delicate blanket
over a complicated pile of objects. The blanket only comes down
from one direction and if it settles on a taller object in one
area, then the objects underneath that won’t ever make contact
with the blanket unless they extend out from underneath the
section of the blanket that’s touching the taller object. The
blanket is like the grid of IR dots, only instead of being lowered
onto an object, the dots are spreading out away from the Kinect to
cover the scene.

Misalignment between the color and depth images

Finally, before we move on to looking more closely at the
code, there’s one other subtle thing I wanted to point out about
this example. Look closely at the depth image and the color image.
Are they framed the same? In other words, do they capture the
scene from exactly the same point of view? Look at my arm, for
example. In the color image, it seems to come off camera to the
right at the very bottom of the frame, not extending more than
about a third of the way up. In the depth image, however, it’s
quite a bit higher. My arm looks like it’s bent at a more dramatic
angle and it leaves the frame clearly about halfway up. Now, look
at the mirror in both images. A lot more of the mirror is visible
in the RGB image than the depth image. It extends farther down
into the frame and farther to the right. The visible portion of it
is taller than it is wide. In the depth image on the other hand,
the visible part of the mirror is nothing more than a small square
in the upper-left corner.

What is going on here? As we know from the introduction, the
Kinect captures the depth image and the color image from two
different cameras. These two cameras are separated from each other
on the front of the Kinect by a couple of inches. Because of this
difference in position, the two cameras will necessarily see
slightly different parts of the scene, and they will see them from
slightly different angles. This difference is a little bit like
the difference between your two eyes. If you close each of your
eyes one at a time and make some careful observations, you’ll
notice the same kinds of differences in angle and framing that
we’re seeing between the depth image and the color image.

The differences between these two images are more than
just a subtle technical footnote. As we’ll see later in the book,
aligning the color and depth images, in other words overcoming the
differences we’re observing here with code that takes them into
account, allows us to do all kinds of cool things like
automatically removing the background from the color image or
producing a full-color three-dimensional scan of the scene. But
that alignment is an advanced topic we won’t get into until
later.

Understanding the Code

Now that we’ve gotten a feel for the depth image, let’s take a closer look at the code that
displayed it.

I’m going to walk through each line of this example rather
thoroughly. Since it’s our first time working with the Kinect
library, it’s important for you to understand this
example in as much detail as possible. As the book goes on and you get
more comfortable with using this library, I’ll progress through examples more
quickly, only discussing whatever is newest or trickiest. But the
concepts in this example are going to be the foundation of everything
we do throughout this book and we’re right at the beginning so, for
now, I’ll go slowly and thoroughly through everything.

On line 1 of this sketch, we start by importing the
library:

import SimpleOpenNI.*;

This works just like importing any other Processing library and
should be familiar to anyone who’s worked with Processing (if you’re
new to Processing, check out Getting Started with Processing from
O’Reilly). The library is called
SimpleOpenNI because it’s a Processing
wrapper for the OpenNI toolkit provided by PrimeSense that I discussed
earlier. As a wrapper, SimpleOpenNI just makes the capabilities of
OpenNI available in Processing, letting us write code that takes
advantage of all of the powerful stuff PrimeSense has built into their
framework. That’s why we had to install OpenNI and NITE as part of the
setup process for working with this library: when we run our
Processing code, the real heavy lifting is going to be done by OpenNI
itself. We won’t have to worry about the details of that too
frequently as we write our code, but it’s worth noting here at the
beginning.

The next line declares our SimpleOpenNI object and names it
kinect:

SimpleOpenNI kinect;

This is the object we’ll use to access all of the Kinect’s data.
We’ll call functions on it to get the depth and color images and,
eventually, the user skeleton data as well. Here we’ve just declared
it but not instantiated it, so that’s something we’ll have to look out
for in the setup function.

Now we’re into the setup
function. The first thing we do here is declare the size of our
app:

void setup()
{
  size(640*2, 480);

I mentioned earlier that the images that come from the Kinect
are 640 pixels wide by 480 tall. In this example, we’re going to
display two images from the Kinect side by side: the depth image and
the RGB image. Hence, we need an app that’s 480 pixels tall to match
the Kinect’s images in height, but is twice as wide so it can contain
two of them next to each other; that’s why we set the width to
640*2.

Once that’s done, as promised earlier, we need to actually
instantiate the SimpleOpenNI
object that we declared at the top of the sketch, which we do
here:

  kinect = new SimpleOpenNI(this);

Having that in hand, we then proceed to call two methods on our
instance: enableDepth and enableRGB, and that’s the end of the
setup function, so we close that
out with a }:

  kinect.enableDepth();
  kinect.enableRGB();
}

These two methods are our way of telling the library that we will want to access both the depth image and the RGB image from the Kinect. Depending
on our application, we might only want one, or even neither of these.
By telling the library in advance what kind of data we’ll want to
access, we give it a chance to do just enough work to provide us what
we need. The library only has to ask the Kinect for the data we
actually plan to use in our application and so it’s able to update faster, letting our app run faster and smoother
in turn.

At this point, we’re done setting up. We’ve created an object
for accessing the Kinect, and we’ve told it that we’re going to want
both the RGB data and the depth data. Now, let’s look at the draw loop to see how we actually access that
data and do something with it.

We kick off the draw loop by
calling the update function on our
Kinect object:

void draw()
{
  kinect.update();

This tells the library to get fresh data from the Kinect so that
we can work with it. It’ll pull in different data depending on which
enable functions we called in
setup; in our case, that means
we’ll now have fresh depth and RGB images to work with.

We’re down to the last two lines, which are the heart of this
example. Let’s take the first one:

  image(kinect.depthImage(), 0, 0);

Starting from the inside out, we first call kinect.depthImage, which asks the library
for the most recently available depth image. This image is then handed
to Processing’s built-in image
function along with two other arguments both set to 0. This tells
Processing to draw the depth image at 0,0 in our sketch, or at the very top left
of our app’s window.

The next line does nearly the same thing, with two
important differences:

  image(kinect.rgbImage(), 640, 0);
}

It calls kinect.rgbImage to
get the color image from the Kinect and it passes 640,0 to image instead of 0,0, which means that it will place the
color image at the top of the app’s window, but 640 pixels from the
left side. In other words, the depth image will occupy the leftmost
640 pixels in our app and the color image the rightmost ones.

FRAME RATES

The Kinect camera captures data at a rate of 30 frames per
second. In other words, every 1/30 of a second, the Kinect makes a
new depth and RGB image available for us to read. If our
app runs faster than 30 frames a second, the draw function will get called multiple
times before a new set of depth and RGB images is available from the
Kinect. If our app runs slower than 30 frames a second, we’ll miss
some images. But how fast does our app actually run? What is our
frame rate? The answer is that we don’t know. By default, Processing
tries to run our draw function 60
times per second. You can change this target by calling Processing’s
frameRate function and passing it
the frame rate at which you’d like your sketch to run. However, in
practice, the actual frame rate of your sketch will depend on what
your sketch is actually doing. How long each run of the draw function takes depends on a lot of
factors including what we’re asking it to do and how much of our
computer’s resources are available for Processing to use. For
example, if we had an ancient, really slow computer and we were
asking Processing to print out every word of Dickens’ A
Tale of Two Cities on every run of the draw function, we’d likely have a very low
frame rate. On the other hand, when running Processing on a typical
modern computer with a draw loop
that only does some basic operations, we might have a frame rate
significantly above 30 frames per second. And further, in either of
these situations, our frame rate might vary over time, both as our
app’s workload varies with user input and as the resources
available to it vary with what else is running on our
computer.

For now in these beginning examples, you won’t have to worry
too much about the frame rate, but as we start to build more
sophisticated applications, this will be a constant concern. If we
try to do too much work on each run of our draw function, our interactions may get
slow and jerky, but if we’re clever, we’ll be able to keep all our
apps just as smooth as this initial example.

One more note about how these lines work. By calling kinect.depthImage and kinect.rgbImage inline within the arguments
to image we’re hiding one important
part of how these functions work together: we’re never seeing the
return value from kinect.depthImage
or kinect.rgbImage. This is an
elegant and concise way to write a simple example like this, but right
now we’re trying for understanding rather than elegance, so we might
learn something by rewriting our examples like this:

import SimpleOpenNI.*;

SimpleOpenNI kinect;

void setup()
{
  // double the width to display two images side by side
  size(640*2, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableRGB();
}

void draw()
{
  kinect.update();
  PImage depthImage = kinect.depthImage();
  PImage rgbImage = kinect.rgbImage();
  image(depthImage, 0, 0);
  image(rgbImage, 640, 0);
}

In this altered example, we’ve introduced two new lines to our
sketch’s draw function. Instead of
implicitly passing the return values from kinect.depthImage and kinect.rgbImage to Processing’s image function, we’re now storing them in
local variables and then passing those variables to image. This has not changed the
functionality of our sketch at all, and if you run it, you’ll see no
difference in the behavior. What it does is make the return type of
our two image-accessing functions explicit: both kinect.depthImage and kinect.rgbImage return a PImage, Processing’s class for storing image
data. This class provides all kinds of useful functions for working
with images such as the ability to access the image’s individual
pixels and to alter them, something we’re going to be
doing later on in this chapter. Having the Kinect data in the form of
a PImage is also a big advantage
because it means that we can automatically use the Kinect data with
other libraries that don’t know anything at all about the Kinect but
do know how to process PImages.
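
To make the idea of accessing individual pixels a little more concrete, here is a minimal sketch written in plain Java rather than Processing, so it can run without a Kinect or the Processing environment. It leans on two things Processing's reference documents for PImage: the pixels are stored in a one-dimensional array in row-major order, so the pixel at (x, y) sits at index x + y * width, and each pixel is a single int that packs alpha, red, green, and blue into 8 bits apiece. The class name PixelLookup, its helper methods, and the tiny 4x2 fake image are hypothetical stand-ins invented for illustration; they are not part of Processing or SimpleOpenNI.

```java
class PixelLookup {
    // Row-major index of pixel (x, y) in a one-dimensional pixel array,
    // the same layout Processing documents for PImage.pixels.
    static int pixelIndex(int x, int y, int width) {
        return x + y * width;
    }

    // Pack a gray level (0-255) into a fully opaque ARGB int.
    static int gray(int level) {
        return (0xFF << 24) | (level << 16) | (level << 8) | level;
    }

    // Extract the red channel. For a gray pixel, red, green, and blue
    // are all equal, so this is the pixel's gray level.
    static int redChannel(int argb) {
        return (argb >> 16) & 0xFF;
    }

    public static void main(String[] args) {
        int width = 4;
        int height = 2;

        // Hypothetical stand-in for a depth image: a small gray gradient.
        int[] pixels = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                pixels[pixelIndex(x, y, width)] = gray(x * 60 + y * 10);
            }
        }

        // Look up one pixel and recover its gray level.
        int level = redChannel(pixels[pixelIndex(3, 1, width)]);
        System.out.println("gray level at (3, 1): " + level); // prints 190
    }
}
```

In an actual sketch, the array and width would come from the PImage itself (its pixels and width fields) instead of being built by hand, but the index arithmetic would be the same.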