Handheld Augmented Reality

Augmented Reality Anywhere and Anytime   




The Handheld Augmented Reality
Project is supported by the
following institutions:



Christian Doppler Forschungsgesellschaft


Graz University of Technology







Full 6-DOF Localization Framework


Our localization framework is based on sparse 3D point clouds and our latest advances in natural feature tracking.

The environment is captured using a high-quality DSLR camera, and sparse 3D point clouds are created through a 3D reconstruction pipeline. The resulting reconstructions are analyzed with respect to visibility conditions from realistic viewpoints, and the datasets are organized into compact blocks of data that can be handled efficiently. Additionally, a search structure is built over the individual feature blocks to allow for fast indexing. The entire dataset is stored on a remote server connected to a spatial database.
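The partitioning and indexing steps might look like the following sketch (the data layout and all names are hypothetical): points that share observing views are grouped into blocks, and each block gets a coarse descriptor index so a query only probes one cell.

```python
from collections import defaultdict

# Hypothetical minimal data layout: each reconstructed 3D point carries a
# position, a short descriptor, and the set of camera ids that observed it.
points = [
    {"id": 0, "pos": (1.0, 2.0, 0.5), "desc": (0.1, 0.9), "views": {0, 1}},
    {"id": 1, "pos": (1.2, 2.1, 0.4), "desc": (0.8, 0.2), "views": {1, 2}},
    {"id": 2, "pos": (9.0, 0.0, 1.0), "desc": (0.5, 0.5), "views": {5}},
]

def partition_by_visibility(points):
    """Group points into blocks keyed by an observing view (crude stand-in
    for the visibility-based partitioning described in the text)."""
    blocks = defaultdict(list)           # representative view id -> points
    for p in points:
        blocks[min(p["views"])].append(p)
    return blocks

def build_index(block, cells=4):
    """Quantize descriptors onto a coarse grid; matching later probes only
    the cell of the query descriptor instead of the whole block."""
    index = defaultdict(list)
    for p in block:
        key = tuple(min(cells - 1, int(c * cells)) for c in p["desc"])
        index[key].append(p["id"])
    return index
```

A real system would use covisibility clustering and an approximate nearest-neighbor structure over high-dimensional descriptors; the sketch only shows the data-organization idea.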


3D sparse reconstruction of a single room.


3D sparse reconstruction of the city center of Graz, Austria.


Given an approximate position of the mobile phone (from GPS coordinates or Wi-Fi triangulation), the server sends the necessary data to the client on request. The mobile phone application is thus provided with everything needed to instantly compute a full 6DOF pose from the current image of the phone camera.
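The server-side lookup can be sketched as a nearest-block query against the GPS prior (block names and anchor coordinates below are hypothetical):

```python
import math

# Hypothetical feature blocks, each anchored at a (lat, lon) coordinate.
blocks = {
    "main_square": (47.0719, 15.4383),
    "old_campus":  (47.0692, 15.4505),
}

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def select_block(gps_prior):
    """Return the block whose anchor lies closest to the client's GPS prior."""
    return min(blocks, key=lambda name: haversine_m(blocks[name], gps_prior))
```

In practice the spatial database would return all blocks within an uncertainty radius rather than a single nearest one.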


The features from the 2D input image are matched against the dataset of globally registered 3D points, and a robust 3-point pose algorithm is used to find the correct pose.
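The robust estimation step follows the classic RANSAC pattern. The sketch below shows that loop in generic form; a toy one-correspondence translation solver stands in for the actual 3-point absolute pose solver, which is considerably more involved.

```python
import random

def ransac(correspondences, solve, residual, threshold, iters=200, seed=0):
    """Generic RANSAC: hypothesize a model from a minimal sample, keep the
    hypothesis with the most inliers. In the real system `solve` would be a
    3-point pose solver over 2D-3D matches."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(correspondences, solve.minimal)
        model = solve(sample)
        inliers = [c for c in correspondences if residual(model, c) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy minimal solver: estimate a 2D translation from a single match.
def solve_translation(sample):
    (x1, y1), (x2, y2) = sample[0]
    return (x2 - x1, y2 - y1)
solve_translation.minimal = 1

def translation_residual(model, corr):
    (x1, y1), (x2, y2) = corr
    return abs(x1 + model[0] - x2) + abs(y1 + model[1] - y2)

# Eight consistent matches plus two gross outliers.
corrs = [((i, 0), (i + 5, 3)) for i in range(8)] \
        + [((0, 0), (50, 50)), ((1, 1), (-9, 4))]
model, inliers = ransac(corrs, solve_translation, translation_residual, 0.5)
```

The loop structure (minimal sample, hypothesis, inlier count) is exactly what a 3-point pose RANSAC does; only the solver and residual differ.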


Workflow during the offline and the online stage of our framework. In the offline stage, sparse feature models are calculated and then partitioned into individual blocks according to visibility constraints. The mobile data structure is created and transferred to a mobile client on request. The mobile client can then robustly calculate a full 6DOF pose.


Due to the limited field of view of the mobile phone camera, attaching an additional wide-angle lens to the camera is advisable. Especially in indoor environments, a large field of view is essential for good performance of the approach.


Color-coded path through three reconstructed rooms in our department, starting in the right room (red), walking through the middle room (pink) into the left room (cyan), and back into the right room (green).



To overcome the narrow field of view of standard cameras, in earlier work we used a wide-angle lens attached to the mobile phone. In this work we instead use a special algorithm for generating panoramic images in real time (see Figure below). Details about the mapping and tracking algorithm can be found here.


A typical panoramic image shot from a mobile phone without any exposure change compensation.


While the panoramic image is created, the features are matched against the database in the background, followed by a robust pose estimation step. The flowchart of the system is depicted in the Figure below. A part of the reconstruction is retrieved from the server given an approximate GPS position. The feature extraction, feature matching and pose estimation steps are integrated as background tasks while new pixels are mapped into the panoramic image from the live video feed. As soon as the localization succeeds, a full 6-DOF pose is obtained.


Flowchart of the localization system.
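The split between foreground mapping and background localization can be sketched with a worker thread and a queue (a structural sketch only; the tile names and the placeholder matching step are hypothetical):

```python
import queue
import threading

tile_queue = queue.Queue()   # finished panorama tiles handed to the matcher
results = []

def matcher():
    """Background task: consume tiles, run feature extraction/matching/pose
    estimation (represented here by a placeholder)."""
    while True:
        tile = tile_queue.get()
        if tile is None:         # sentinel: mapping is finished
            break
        results.append(("matched", tile))

worker = threading.Thread(target=matcher)
worker.start()

# Main loop stand-in: map new pixels into the panorama, emit completed tiles.
for tile in ["tile_0", "tile_1", "tile_2"]:
    tile_queue.put(tile)
tile_queue.put(None)
worker.join()
```

On the phone the "tiles" would be newly completed regions of the cylindrical map, and the background task would terminate as soon as a pose with enough inliers is found.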


The result of the localization procedure is shown in the Figure below. The inliers are spread around the entire 360-degree panorama. The rays from the camera center directly hit the corresponding 3D points in the environment, passing through their projection in the image (i.e. the textured cylinder).


Localization result for a full 360-degree panoramic image. The yellow lines indicate the inliers used for establishing the pose estimate.
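The ray geometry on the textured cylinder can be made concrete with a small helper. Assuming a simple cylindrical panorama model (the vertical field of view below is a hypothetical parameter), each pixel column maps to an azimuth and each pixel defines a viewing ray from the camera center:

```python
import math

def ray_from_panorama_pixel(u, v, width, height, vfov_deg=90.0):
    """Map a pixel (u, v) on a cylindrical panorama of size width x height
    to a unit viewing ray from the camera center (y axis pointing up)."""
    azimuth = 2.0 * math.pi * u / width              # column -> cylinder angle
    half = math.tan(math.radians(vfov_deg) / 2.0)
    y = (0.5 - v / height) * 2.0 * half              # row -> height on cylinder
    d = (math.sin(azimuth), y, math.cos(azimuth))
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)
```

An inlier correspondence means such a ray, rotated and translated by the estimated pose, passes through the matched 3D point.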


A path through the city center of Graz, Austria is shown in the Figure below. Around 230 full panoramic images were acquired using an omnidirectional PointGrey Ladybug camera. A full, noise-free 360-degree panorama is obviously the ideal case for our localization procedure; however, localization also works with distorted and noisy images. We conclude that an increased field of view directly leads to a higher success rate and improved accuracy of the localization results.


Color-coded path through the city center of Graz, Austria. The path starts in the lower left area of the image and ends in the upper left area. Since parts of the reconstruction of the square in the right area are missing, localization fails there. As soon as enough of the reconstructed environment is available again, the localization approach succeeds.


A short video describing our approach can be found here.



We used our system presented at ISMAR 2011 and added sensor support for feature matching during localization. In the offline stage we additionally consider the upright direction (i.e. gravity) and the feature normal direction with respect to north to partition the features into separate bins.


Flowchart of the new system with the new blocks marked in green.


During online operation the sensors are used to slice the panoramic image into tiles which correspond to different geographic orientations. The features in the image slice corresponding to a specific geographic orientation are matched against the previously binned features from the reconstruction.


From left to right: considering gravity for orientation assignment, orientation-aware feature binning and orientation-aware matching during online operation.
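The orientation-aware binning and matching described above can be sketched as follows (the bin count and the neighbor spread used to tolerate compass noise are hypothetical parameters):

```python
# Features are binned offline by the compass direction of their normal;
# online, a panorama slice is matched only against the bins that cover
# its geographic orientation.
N_BINS = 8
BIN_WIDTH = 360.0 / N_BINS

def bin_of(heading_deg):
    """Bin index for a compass heading in degrees."""
    return int((heading_deg % 360.0) // BIN_WIDTH)

def candidate_bins(slice_heading_deg, spread=1):
    """Bins to search for one image slice: its own bin plus `spread`
    neighbors on each side, to tolerate sensor noise."""
    b = bin_of(slice_heading_deg)
    return {(b + k) % N_BINS for k in range(-spread, spread + 1)}
```

Restricting matching to a few bins shrinks the candidate set roughly by the ratio of searched bins to total bins, which is where the speedup over orientation-agnostic matching comes from.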


Below are some snapshots from the video we recorded live on an iPhone 4S. The full video can be found here.


Snapshots from our submission video.




Inspired by the MS Read-Write idea, we investigated performing 6DOF self-localization without the need for a full 3D reconstruction. This idea was explored earlier by Zhang and Kosecka in 2006. We revisit it with a closer evaluation, using a Differential-GPS setup together with homographies and epipolar geometry.


Localization approach avoiding a full 3D reconstruction.


The results encourage further investigation, since this approach opens up a completely new way of performing localization and is advantageous in terms of database management, among other aspects.
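The underlying geometry can be illustrated in 2D: each epipolar-geometry estimate yields a bearing from a geo-located reference camera towards the query camera, and the query position follows from intersecting the bearings in a least-squares sense. The sketch below solves that 2D simplification; the actual method works in 3D and combines the estimates robustly.

```python
import math

def intersect_bearings(rays):
    """Least-squares intersection of 2D rays given as ((px, py), theta):
    solve  sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) p_i  for x,
    which minimizes the squared perpendicular distances to all rays."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for (px, py), theta in rays:
        dx, dy = math.cos(theta), math.sin(theta)
        m = [[1 - dx * dx, -dx * dy], [-dx * dy, 1 - dy * dy]]
        for r in range(2):
            A[r][0] += m[r][0]
            A[r][1] += m[r][1]
            b[r] += m[r][0] * px + m[r][1] * py
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    y = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return (x, y)
```

Two reference cameras at (0, 0) and (10, 0) whose bearings point 45 and 135 degrees counterclockwise from the x axis, for example, locate the query camera at (5, 5).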


Current Project Status

The current system already contains all parts for reconstruction, registration, and alignment. The algorithms can be used to create reconstructions for indoor or outdoor scenarios. At the moment we are considering the generation of multiple reconstructions using an omnidirectional camera to allow for different investigations, mainly concerning feature stability under different weather and lighting conditions.

Project Team

Clemens Arth
Gerhard Reitmayr
Dieter Schmalstieg


Manfred Klopschitz
Arnold Irschara
Daniel Wagner



Real-Time Self-Localization from Panoramic Images on Mobile Devices

Authors: Clemens Arth, Manfred Klopschitz, Gerhard Reitmayr, Dieter Schmalstieg

Details: International Symposium on Mixed and Augmented Reality (ISMAR), 26-29 Oct. 2011

Self-localization in large environments is a vital task for accurately registered information visualization in outdoor Augmented Reality (AR) applications. In this work, we present a system for self-localization on mobile phones using a GPS prior and an online-generated panoramic view of the user's environment. The approach is suitable for executing entirely on current generation mobile devices, such as smartphones. Parallel execution of online incremental panorama generation and accurate 6DOF pose estimation using 3D point reconstructions allows for real-time self-localization and registration in large-scale environments. The power of our approach is demonstrated in several experimental evaluations.


Towards Wide Area Localization on Mobile Phones

Authors: Clemens Arth, Daniel Wagner, Manfred Klopschitz, Arnold Irschara, Dieter Schmalstieg

Details: International Symposium on Mixed and Augmented Reality (ISMAR), 19-23 Oct. 2009

We present a fast and memory efficient method for localizing a mobile user's 6DOF pose from a single camera image. Our approach registers a view with respect to a sparse 3D point reconstruction. The 3D point dataset is partitioned into pieces based on visibility constraints and occlusion culling, making it scalable and efficient to handle. Starting with a coarse guess, our system only considers features that can be seen from the user's position. Our method is resource efficient, usually requiring only a few megabytes of memory, thereby making it feasible to run on low-end devices such as mobile phones. At the same time it is fast enough to give instant results on this device class.


Exploiting Sensors on Mobile Phones to Improve Wide-Area Localization

Authors: Clemens Arth, Alessandro Mulloni, Dieter Schmalstieg

Details: International Conference on Pattern Recognition (ICPR), 11-15 Nov. 2012

In this paper, we discuss how the sensors available in modern smartphones can improve 6-degree-of-freedom (6DOF) localization in wide-area environments. In our research, we focus on phones as a platform for large-scale Augmented Reality (AR) applications. Thus, our aim is to estimate the position and orientation of the device accurately and fast: it is unrealistic to assume that users are willing to wait tens of seconds before they can interact with the application. We propose supplementing vision methods with sensor readings from the compass and accelerometer available in most modern smartphones. We evaluate this approach on a large-scale reconstruction of the city center of Graz, Austria. Our results show that our approach improves both accuracy and localization time, in comparison to an existing localization approach based solely on vision. We finally conclude our paper with a real-world validation of the approach on an iPhone 4S.


Full 6DOF Pose Estimation from Geo-Located Images

Authors: Clemens Arth, Gerhard Reitmayr, Dieter Schmalstieg

Details: Asian Conference on Computer Vision (ACCV), 5-9 Nov. 2012

Estimating the external calibration - the pose - of a camera with respect to its environment is a fundamental task in Computer Vision (CV). In this paper, we propose a novel method for estimating the unknown 6DOF pose of a camera with known intrinsic parameters from epipolar geometry only. For a set of geo-located reference images, we assume the camera position - but not the orientation - to be known. We estimate epipolar geometry between the image of the query camera and the individual reference images using image features. Epipolar geometry inherently contains information about the relative positioning of the query camera with respect to each of the reference cameras, giving rise to a set of relative pose estimates. Combining the set of pose estimates and the positions of the reference cameras in a robust manner allows us to estimate a full 6DOF pose for the query camera. We evaluate our algorithm on different datasets of real imagery in indoor and outdoor environments. Since our pose estimation method does not rely on an explicit reconstruction of the scene, our approach offers several significant advantages over existing algorithms from the area of pose estimation.


website maintained by Tobias Langlotz
last updated on 2011-08-02

copyright (c) 2011 Graz University of Technology