Sensing position for Augmented Reality applications

This year I attended SIGGRAPH, the largest academic conference for computer graphics research, in Los Angeles. Of particular interest to me was the Augmented and Mixed Reality session in the “Birds of a Feather” side event [1]. In response to that session, I’m going to summarize my experiences in accurate position tracking for augmented reality applications.

Augmented reality was defined by Ronald Azuma in 1997. We can describe it as the overlay of a digital world onto the physical world. Humans access this augmented reality visually, through head-mounted AR or handheld (lens) AR. Head-mounted AR includes devices that show the wearer a video feed captured from the real world, as well as semi-transparent displays that draw graphics on top of what a person normally sees. Handheld AR involves looking through a mobile device, such as a smartphone, to see into the augmented reality.

One of the problems in augmented reality is obtaining an accurate position of a person and moveable objects. I am aware of several approaches for obtaining position relative to a ground plane in a confined environment: simple computer vision, distance-sensing cameras, fiducial markers, and motion tracking systems.

In simple computer vision, we have a few tracked objects in the environment. We attach a single LED to each object and look at them from above to compute coordinates on the ground plane. We don’t get elevation, orientation, or identity from the single-point LED. Bill Nye’s AR games take this approach.
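
To make the single-LED idea concrete, here is a minimal sketch in plain Python. It assumes the overhead camera frame is already available as a grayscale grid of 0-255 values, that the LED is the only bright thing in view, and that one pixel covers a fixed distance on the ground. The function names, the threshold, and the metres-per-pixel scale are all my own illustrative choices, not taken from any particular system.

```python
def led_centroid(image, threshold=200):
    """Return the (row, col) centroid of all pixels at or above threshold.

    Returns None when no pixel is bright enough (no LED in view).
    """
    count = 0
    sum_row = 0.0
    sum_col = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value >= threshold:
                count += 1
                sum_row += r
                sum_col += c
    if count == 0:
        return None
    return (sum_row / count, sum_col / count)


def pixel_to_ground(pixel, metres_per_pixel, origin_pixel=(0.0, 0.0)):
    """Convert a (row, col) pixel to (x, y) metres on the ground plane.

    Assumes a camera looking straight down with a uniform scale,
    which ignores lens distortion and perspective.
    """
    row, col = pixel
    x = (col - origin_pixel[1]) * metres_per_pixel
    y = (row - origin_pixel[0]) * metres_per_pixel
    return (x, y)


# A tiny hypothetical frame with one bright 2x2 LED blob.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 250, 255, 0],
    [0, 0, 255, 250, 0],
    [0, 0, 0, 0, 0],
]
spot = led_centroid(frame)
print(spot)  # (1.5, 2.5)
```

Note that the result is only a 2D point: as described above, nothing here recovers elevation, orientation, or which object the LED belongs to.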

With distance-sensing cameras, we assume that an object stands above the ground plane. Using a special type of camera such as Microsoft Kinect, a computer is able to see the distance between the lens and an object’s nearest surface. The camera can be placed on a ceiling, facing downwards, to compute a heightmap relative to a ground plane. Then, places of change in the heightmap, or differences from a “nothing there” heightmap, identify an object’s position. Other information, such as orientation and identity, cannot be determined.
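
The "nothing there" comparison can be sketched as follows. This is a rough illustration in plain Python, assuming the depth camera gives a grid of distances in metres from the ceiling, and that an empty-room baseline was captured beforehand. The function name and the 10 cm threshold are hypothetical, and a real depth stream would need noise filtering first.

```python
def detect_object(baseline, current, min_height=0.10):
    """Locate an object by comparing a depth frame to an empty-room baseline.

    baseline, current: grids of distances (metres) from a ceiling camera.
    Returns ((row, col) centre, peak height in metres), or None if no cell
    rises at least min_height above the ground.
    """
    raised = []
    for r, (base_row, cur_row) in enumerate(zip(baseline, current)):
        for c, (base_d, cur_d) in enumerate(zip(base_row, cur_row)):
            # Something closer to the camera than the floor has height.
            height = base_d - cur_d
            if height >= min_height:
                raised.append((r, c, height))
    if not raised:
        return None
    n = len(raised)
    centre = (sum(r for r, _, _ in raised) / n,
              sum(c for _, c, _ in raised) / n)
    peak = max(h for _, _, h in raised)
    return centre, peak


# Hypothetical 4x4 frames: floor is 3 m away, one object is 1 m tall.
empty_room = [[3.0] * 4 for _ in range(4)]
with_object = [row[:] for row in empty_room]
with_object[1][1] = 2.0
with_object[1][2] = 2.0
print(detect_object(empty_room, with_object))  # ((1.0, 1.5), 1.0)
```

This sketch treats every raised cell as one object, so two objects in view would be merged into a single centre; separating them would need a connected-components step.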

In fiducial marker tracking, many objects can be tracked. Fiducial markers are black-and-white tags with a visible printed bit pattern inside a black box. The markers are attached to objects, oriented to face the camera, and captured by overhead cameras facing down. Software such as ARToolkit or Igarashi’s custom marker library is used to determine a tag’s identity, position, orientation, and distance (i.e., elevation). I employed this technique in my Augmented Reality publication to teach sequential tasks to a robot.
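
To show where position, orientation, and distance come from once a marker has been found, here is a simplified sketch. It is not ARToolkit's API or Igarashi's library. It assumes some detector has already returned the marker's four corner pixels in a known order, that the marker faces the camera squarely, and that the camera follows a simple pinhole model with a known focal length in pixels. All names and numbers are illustrative.

```python
import math


def marker_pose(corners, marker_side_m, focal_length_px):
    """Estimate centre, heading, and distance of a camera-facing marker.

    corners: four (x, y) pixel points ordered top-left, top-right,
             bottom-right, bottom-left in the marker's own frame.
    Returns a dict with the centre pixel, heading in degrees (rotation of
    the marker's top edge in the image), and distance in metres.
    """
    cx = sum(x for x, _ in corners) / 4
    cy = sum(y for _, y in corners) / 4

    # Heading: direction of the top edge, from top-left to top-right.
    (x0, y0), (x1, y1) = corners[0], corners[1]
    heading = math.degrees(math.atan2(y1 - y0, x1 - x0))

    # Apparent side length: average over the four edges.
    side_px = 0.0
    for i in range(4):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % 4]
        side_px += math.hypot(bx - ax, by - ay)
    side_px /= 4

    # Pinhole model: apparent size shrinks in proportion to distance.
    distance = focal_length_px * marker_side_m / side_px
    return {"centre": (cx, cy), "heading_deg": heading, "distance_m": distance}


# Hypothetical detection: a 10 cm marker appearing 100 px wide.
pose = marker_pose([(100, 100), (200, 100), (200, 200), (100, 200)],
                   marker_side_m=0.10, focal_length_px=800.0)
```

Identity is the one piece left out: it comes from decoding the printed bit pattern inside the black box, which the marker library handles.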

In motion tracking systems, we obtain information about an object’s identity, 3D position, and orientation in a confined space. Motion tracking systems involve attaching many reflective/IR dots to an object, and these special dots are seen by several cameras looking into the confined space. Because the computer knows the arrangement of the cameras and the arrangement of dots on each object, it can compute the desired information. Companies such as OptiTrack and Vicon supply these motion tracking systems, though their primary application is for Hollywood: human motion is recorded to animate digital characters. We can apply the same idea to compute object positions for AR applications.
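
One way to picture how a known dot arrangement yields identity is the sketch below. It is not how OptiTrack or Vicon software works internally; it is a toy illustration that assumes the cameras have already reconstructed each dot's 3D position, and it recognizes an object by comparing the distances between its dots against stored templates. The template names, coordinates, and tolerance are hypothetical.

```python
import itertools
import math


def signature(points):
    """Sorted list of all pairwise distances between 3D points.

    Distances do not change when a rigid object moves or rotates,
    so this acts as a simple fingerprint of the dot arrangement.
    """
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def identify(points, templates, tolerance=0.005):
    """Return the name of the template whose signature matches, else None."""
    observed = signature(points)
    for name, template in templates.items():
        expected = signature(template)
        if len(expected) == len(observed) and all(
            abs(a - b) <= tolerance for a, b in zip(observed, expected)
        ):
            return name
    return None


def centroid(points):
    """Mean of the dots, used here as the object's 3D position."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical dot layouts (metres) for two tracked objects.
templates = {
    "wand": [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0)],
    "helmet": [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.0, 0.15, 0.0)],
}
# The wand, rotated 90 degrees and moved elsewhere in the room.
seen = [(1.0, 2.0, 0.5), (1.0, 2.1, 0.5), (0.8, 2.0, 0.5)]
print(identify(seen, templates))  # wand
```

Orientation would take one more step, fitting the rotation that maps the template dots onto the observed ones, which I have left out to keep the sketch short.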

Of the four approaches, simple computer vision and distance-sensing cameras provide very similar information for identifying object position. They don’t provide orientation, but position alone may be sufficient for many applications, and orientation can be obtained using other sensors. Fiducial marker tracking provides much more information, but its accuracy varies with lighting, so the lighting conditions have to be adjusted by hand to get reliable tracking. Motion tracking provides the best sensing overall, but the cost of the equipment is significantly higher than for the other three approaches, which can be built using off-the-shelf components.

These four approaches enable a developer to track position, a step in building AR applications. If the AR application is used outdoors, a developer can also use off-the-shelf smartphones with GPS and compasses to build AR applications without the hassle of setting up a special environment. Identity, position, orientation, and altitude are all available, albeit at a coarse level, which might suffice for an AR project.
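
For the outdoor case, GPS readings usually need to be turned into local coordinates before they are useful for placing AR content. Here is a minimal sketch using the equirectangular approximation, which assumes a spherical Earth and distances of at most a few kilometres from a chosen origin. The function name and the origin are illustrative, and this says nothing about how any particular phone API reports its fix.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def gps_to_local(lat, lon, origin_lat, origin_lon):
    """Convert latitude/longitude (degrees) to (east, north) metres.

    Uses the equirectangular approximation around the origin, which is
    reasonable over short distances and away from the poles.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat))
    north = EARTH_RADIUS_M * d_lat
    return (east, north)


# Hypothetical reading: one thousandth of a degree north of the origin.
east, north = gps_to_local(0.001, 0.0, 0.0, 0.0)
```

One thousandth of a degree of latitude works out to roughly 111 metres, which gives a sense of how coarse a consumer GPS fix is compared with the indoor approaches above.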

If you have other approaches for identifying object position, let me know and I’ll write about them.

[1] SIGGRAPH’s main events are research talks, exhibition, job fair, industry talks, computer animation festival, student poster session, etc.