The main objective of the MOBVIS project is to explore and exploit the concept of attentive interfaces for the design and implementation of mobile vision technology in urban scenarios. The central claim is that mobile image understanding becomes possible through intelligent interaction between three major functional components, i.e., advanced computer vision for object awareness, exploitation of multi-modal context, and intelligent access to map knowledge, each of which can boost the performance of the others and of the system as a whole. This involves making attentive interfaces a first-class concept for mobile situated intelligence and entails fundamental research on AI-enabled computer vision methodology, a decisive step forward in mobile perceptual presence.
By interfacing these components under the decision making of the attention control interface, we will provide a new way for mobile multi-modal context awareness to connect with advanced computer vision. We will show the potential of this new technology by going beyond location-based services with their simple signal- and coordinate-based relations, towards visually interpreting the world so that object awareness emerges, to the benefit of future enrichment of mobile services on multi-modal interfaces.
The key concept of the attentive interface comprises the interactions between its key components: object and event hypotheses are generated from vision and multi-modal sensing and feed into a compact description of the current context state. The context is used to index into the geo-information of the intelligent map component, retrieving related relevant features, objects, and procedural information. In the reverse direction, this information is then applied by the attentive interface to refine the context descriptions and, finally and most importantly, to cut down hypotheses in visual interpretation.
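The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration of the data flow only; all class, function, and data names here are assumptions for exposition, not actual MOBVIS components:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    label: str    # e.g. "building", "face"
    score: float  # confidence from vision / multi-modal sensing

@dataclass
class ContextState:
    location: str                         # coarse position estimate
    hypotheses: list = field(default_factory=list)

def index_map(context):
    """Forward direction: use the context state to index into the
    geo-information of a (stub) intelligent map."""
    geo_db = {"main_square": {"building", "statue"}}  # toy map content
    return geo_db.get(context.location, set())

def refine(context, geo_features):
    """Reverse direction: retrieved geo-information refines the context
    by cutting down visual hypotheses implausible at this location."""
    context.hypotheses = [h for h in context.hypotheses
                          if h.label in geo_features or h.score > 0.9]
    return context

ctx = ContextState("main_square",
                   [Hypothesis("building", 0.6), Hypothesis("ship", 0.2)])
ctx = refine(ctx, index_map(ctx))
print([h.label for h in ctx.hypotheses])  # ['building']
```

The point of the sketch is the closed loop: sensing produces hypotheses, the context indexes the map, and the retrieved geo-information flows back to prune the hypothesis set before expensive visual interpretation.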
In detail, we aim at developing the following knowledge and technology related to attentive interfaces:
Object and Context Awareness
Awareness of semantic information, i.e., objects, activity, and events, is both the starting and the end point of the information flow in attentive interfaces. First, vision- and multi-modal-based sensor readings underpin initial hypotheses about items and events in the real world. In MOBVIS, we investigate object awareness through challenging fundamental research on advanced computer vision capabilities, such as learning informative and robust visual features for outdoor recognition, and 3D information recovery from mobile sensors. In addition, we will detect and recognise objects of high interest in urban scenarios, such as buildings, infrastructure, people, and faces. MOBVIS will demonstrate research results on map indexing from vision and multi-modal context priming, but also on map-aiding functionality, showing how retrieved geo-information features improve the performance of vision tasks such as mobile object recognition.
Intelligent Maps
The innovative concept of ‘intelligent maps’ proposes to augment standard digital city maps with visual, context, and strategic procedural information to aid mobile vision and advanced context awareness tasks. We will investigate, demonstrate, and visualise the interaction of context with geo-services that deliver appropriate augmented map information on demand. Incremental updating of the geo-information will (i) contribute to refined, publicly available intelligent maps and (ii) enable personalised maps with high potential for individually tailored assistance from the mobile device. The MOBVIS demonstrator will show the functioning of the intelligent map in the targeted urban scenarios.
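As a toy illustration of what such an augmented map record might look like, the sketch below attaches visual, context, and procedural fields to a standard map entry and supports the incremental updating mentioned above. All field names, values, and the update function are hypothetical, chosen only to make the idea concrete:

```python
# An 'intelligent map' entry: a standard digital-map record (coordinates)
# augmented with visual features for recognition, context tags, and
# procedural information. All names and values are illustrative.
city_map = {
    "clock_tower": {
        "coords": (46.62, 14.31),               # standard map layer
        "visual_features": ["facade_sift_01"],  # aids mobile recognition
        "context_tags": {"landmark", "tourist"},
        "procedures": ["enter via south gate"],
    }
}

def update_entry(city_map, name, new_feature):
    """Incremental updating: a mobile device contributes a newly
    observed visual feature back to the shared map."""
    entry = city_map.setdefault(name, {"visual_features": []})
    if new_feature not in entry["visual_features"]:
        entry["visual_features"].append(new_feature)
    return entry

update_entry(city_map, "clock_tower", "facade_sift_02")
```

The same structure supports both directions of use: a recognition task reads `visual_features` to verify what should be visible at a location, while successful recognitions feed refined features back in, which is also the mechanism behind personalised map copies.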
Attentive Interface Technology
The application of context awareness to computer vision is the only viable strategy to cope with the otherwise intractable complexity of visual interpretation. To achieve this challenging research goal, MOBVIS proposes the innovative integration of the functional components for object and context awareness and the intelligent map into a core system capable of executing attentive control, i.e., a functional module for reasoning and decision making that applies the current context state to cut down and refine object hypotheses, to index into geo-information, and, in turn, to aid vision-based hypothesis verification. Note that the attentive interface is composed of (i) its user-related part, the Attentive User Interface, and (ii) its machine-related counterpart, the Attentive Machine Interface, an innovative concept whose goal is the autonomous selection of incoming information and the appropriate exploitation of context states for improved problem solving. Attentive interface technology is mandatory for the realisation of mobile vision services and will have a fundamental impact on the design of computer vision systems in general.
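The autonomous selection performed by the Attentive Machine Interface can be sketched as a simple ranking-and-budget step: given the current context state, rank incoming observations by contextual relevance and pass only the top few on for expensive visual verification. The function and data names below are hypothetical, and the scoring rule is a deliberately naive stand-in for the project's reasoning and decision-making module:

```python
def attentive_select(observations, context_tags, budget=2):
    """Autonomous selection of incoming information: rank observations
    by (a) relevance to the current context tags and (b) saliency,
    then keep only as many as the processing budget allows."""
    ranked = sorted(
        observations,
        key=lambda o: (o["tag"] in context_tags, o["saliency"]),
        reverse=True,
    )
    return ranked[:budget]

# Toy observations from vision and multi-modal sensing.
obs = [
    {"tag": "vehicle",  "saliency": 0.9},
    {"tag": "building", "saliency": 0.5},
    {"tag": "landmark", "saliency": 0.4},
]
# Current context says we are near known buildings/landmarks, so the
# highly salient but contextually irrelevant vehicle is dropped.
selected = attentive_select(obs, context_tags={"building", "landmark"})
print([o["tag"] for o in selected])  # ['building', 'landmark']
```

The design point is that selection happens before, not after, full visual interpretation: the context state acts as a filter that keeps the hypothesis space tractable, which is the sense in which attentive control makes mobile vision feasible.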
The resulting attentive interface will be demonstrated in integrated form on a multi-sensor handheld device capable of recognising objects of interest and places in an urban environment, and of performing visual positioning and semantic annotation of images and videos, with interpretation and performance depending on the current context. The image understanding task will be primed by multi-modal context descriptions and map-aiding capabilities, such as the verification of hypothesised visual features in the field of view. Several use cases will be evaluated with respect to two archetypical mobile vision scenarios: (i) the ‘visitor’ scenario, involving benchmarking of positioning and navigation tasks, and (ii) the ‘personal diary’ scenario, including performance evaluation of vision-based and multi-modal situation and context analysis.