Object Pose Estimation and Tracking

Object pose estimation

Object pose estimation is a computer vision technique that estimates the 3D pose (position and orientation) of a real-world object relative to the camera. The pose is estimated from the geometry and the visual appearance of the object.
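To make the notion of a 3D pose concrete, the sketch below represents a pose as a rotation matrix plus a translation vector and uses it to map a point on the object into the camera image with a pinhole camera model. This is a generic illustration, not the Track framework's API: the function names, the focal length, and the principal point are all assumptions chosen for the example.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the camera's optical (z) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def object_to_camera(point, rotation, translation):
    """Apply the pose: p_cam = R * p_obj + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def project(point_cam, focal_px, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Hypothetical pose: object rotated 90 degrees about the optical axis
# and placed 2 m in front of the camera.
pose_rotation = rotation_z(math.pi / 2)
pose_translation = (0.0, 0.0, 2.0)

# A point 10 cm along the object's own x axis.
corner_cam = object_to_camera((0.1, 0.0, 0.0), pose_rotation, pose_translation)

# Assumed intrinsics: 800 px focal length, principal point at (320, 240).
pixel = project(corner_cam, focal_px=800.0, cx=320.0, cy=240.0)
print(pixel)  # approximately (320.0, 280.0)
```

Estimating the pose is the inverse problem: given where object points appear in the image, recover the rotation and translation that explain those observations.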

To track the 3D pose of an object smoothly, the Track framework uses a two-stage approach. The first stage, Detection, finds the object and its 3D pose in the camera image. The second stage, Tracking, refines the 3D pose of the object in real time, starting from an initial pose estimate.
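The two-stage approach can be pictured as a small control loop. The sketch below is a minimal illustration under stated assumptions, not the framework's implementation: `run_pipeline`, `toy_detect`, and `toy_track` are hypothetical names, a "frame" is reduced to a single number, and falling back to detection when tracking fails is an assumption made for the example.

```python
def run_pipeline(frames, detect, track):
    """Alternate between detection and tracking, one frame at a time."""
    pose = None  # no pose is known before the first successful detection
    log = []
    for frame in frames:
        if pose is None:
            # No usable pose: search the whole frame.
            stage = "detection"
            pose = detect(frame)
        else:
            # A pose is available: refine it using the previous estimate.
            stage = "tracking"
            pose = track(frame, pose)
        log.append((stage, pose))
    return log

# Toy stand-ins: a frame is the object's 1D position, or None if not visible.
def toy_detect(frame):
    return frame

def toy_track(frame, previous_pose, max_jump=5):
    # Tracking only succeeds if the object stayed close to its previous pose.
    if frame is None or abs(frame - previous_pose) > max_jump:
        return None
    return frame

log = run_pipeline([None, 10, 11, 30, 31], toy_detect, toy_track)
stages = [stage for stage, _ in log]
print(stages)
# ['detection', 'detection', 'tracking', 'tracking', 'detection']
```

In this toy run the object jumps too far between the third and fourth frames, so tracking returns no pose and the loop re-enters detection on the next frame.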


To determine whether an object is in the camera image, the framework searches the image for the object's characteristic features. This information must therefore be known in advance. It can be prepared with the VIRNECT Track Target Trainer, which extracts important information about the object, such as shapes and colors, for accurate and stable pose estimation and tracking.

To detect the object, the algorithm searches the camera image for geometric information and then attempts to match it against the object's stored information. Once a match succeeds, the algorithm estimates the 3D pose of the object.
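The matching step can be illustrated with a simple nearest-neighbour comparison of feature vectors. This is a generic stand-in technique, not the matching method used by the Track framework, which the text does not specify. The function names, the two-dimensional descriptors, and the thresholds `max_distance` and `min_matches` are assumptions for the example.

```python
import math

def match_features(image_features, object_features, max_distance=0.5):
    """Pair each image feature with its nearest object feature, if close enough."""
    matches = []
    for i, img_desc in enumerate(image_features):
        best_j, best_d = None, float("inf")
        for j, obj_desc in enumerate(object_features):
            d = math.dist(img_desc, obj_desc)  # Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j))
    return matches

def is_detected(matches, min_matches=3):
    """Treat the object as found once enough features agree."""
    return len(matches) >= min_matches

# Hypothetical descriptors prepared in advance for the object.
object_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Hypothetical descriptors extracted from the current camera image;
# (5.0, 5.0) belongs to the background and should not match.
image_features = [(0.1, 0.0), (0.9, 0.1), (5.0, 5.0), (0.0, 1.1)]

matches = match_features(image_features, object_features)
print(matches)               # [(0, 0), (1, 1), (3, 2)]
print(is_detected(matches))  # True
```

In a full pipeline, the matched pairs would then be passed to a pose solver that recovers the rotation and translation; that step is omitted here.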


After a successful detection, the Track framework uses tracking to refine the 3D pose. The detection stage searches the entire image for relevant information. In contrast, the tracking stage searches only the region near the object's location in the previous frame, so the 3D pose can be estimated much faster.
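The speed-up from searching only near the previous location can be sketched as a search window computed from the object's last known bounding box. This is an illustration under assumptions, not the framework's actual search strategy: the box format `(x_min, y_min, x_max, y_max)`, the fixed pixel `margin`, and the function names are hypothetical.

```python
def search_window(prev_box, margin, image_size):
    """Expand the previous bounding box by a margin, clamped to the image."""
    x_min, y_min, x_max, y_max = prev_box
    width, height = image_size
    return (
        max(0, x_min - margin),
        max(0, y_min - margin),
        min(width, x_max + margin),
        min(height, y_max + margin),
    )

def box_area(box):
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)

image_size = (640, 480)
prev_box = (300, 200, 400, 280)  # object location in the previous frame

window = search_window(prev_box, margin=20, image_size=image_size)
print(window)  # (280, 180, 420, 300)

# Fraction of the image the tracking stage has to examine.
fraction = box_area(window) / (image_size[0] * image_size[1])
print(round(fraction, 4))  # 0.0547
```

With these example numbers the tracking stage examines roughly 5% of the pixels that a full-image search would cover, which is where the time saving comes from.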

Tracking also helps keep the object tracked under difficult conditions. The Track framework includes robust tracking techniques that tolerate occlusion, illumination changes, and fast movement.