

Symbol Grounding for Semantic Image Interpretation: From Image Data to
Semantics
Celine Hudelot, Nicolas Maillot and Monique Thonnat
INRIA Sophia Antipolis - Orion Team
2004, Route des Lucioles. BP 93.
06902 Sophia Antipolis- France

1 Introduction
The semantic image interpretation problem can be informally
defined as the automatic extraction of the meaning
(semantics) of an image. This problem can be simply illustrated
with the example shown in figure 1.
When we look at the image on the left of figure 1, we
have to answer the following question: what is the semantic
content of this image? Depending on the level of knowledge
of the interpreter, various interpretations are possible:
(1) a white object on a green background; (2) an insect;
or (3) an infestation of white flies on a rose leaf. All
these interpretations are correct, which leads us to conclude
that the semantics is not inside the image. Image interpretation
depends on a priori semantic and contextual knowledge.

The semantic image interpretation problem can be decomposed
into three sub-problems:
(1) the image processing problem, i.e. the extraction of
numerical image data;
(2) the symbol grounding problem, i.e. the mapping between
the numerical image data and the high level representations
of semantic concepts;
(3) the semantic interpretation problem, i.e. the understanding
of the perceived scene using the application domain
terminology (semantic concepts).
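The decomposition above can be sketched as a three-stage pipeline. The following is a minimal toy illustration, not the authors' implementation; all function names, the pixel-list image encoding, and the rule-based grounding are assumptions made for the sketch:

```python
# Toy pipeline for the three sub-problems of semantic image interpretation.
# All names and data structures here are illustrative assumptions.

def extract_image_data(image):
    """Image processing: compute numerical descriptors (here, mean color)."""
    # image is a list of (r, g, b) pixels in this toy example
    n = len(image)
    return {"mean_color": tuple(sum(p[i] for p in image) / n for i in range(3))}

def ground_symbols(image_data):
    """Symbol grounding: map numerical data to visual concepts."""
    r, g, b = image_data["mean_color"]
    concepts = []
    if g > r and g > b:
        concepts.append("green_region")
    if min(r, g, b) > 200:
        concepts.append("white_region")
    return concepts

def interpret(visual_concepts, domain_knowledge):
    """Semantic interpretation: apply domain terminology to visual concepts."""
    for required_concepts, label in domain_knowledge:
        if required_concepts.issubset(visual_concepts):
            return label
    return "unknown"

domain_knowledge = [({"green_region"}, "leaf")]
image = [(30, 180, 40)] * 100          # a toy all-green "image"
data = extract_image_data(image)
print(interpret(set(ground_symbols(data)), domain_knowledge))  # leaf
```

The point of the sketch is only the separation of concerns: the middle stage, which turns numbers into symbols, is the symbol grounding problem that the rest of the paper isolates.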

2 Related Work
As already mentioned in the introduction, the semantic
interpretation of a visual scene is highly dependent on prior
knowledge and experience of the viewer. Vision is a
knowledge-intensive process. Many knowledge based
vision systems have been suggested in the past (VISIONS
[11], SIGMA [19], PROGAL [22], MESSIE [23],...).
The analysis of these different knowledge based vision
systems enables us to draw some conclusions. A first characteristic
is the existence, for all these systems, of at least
three different semantic levels: the low level, the intermediate
level and the semantic level. These levels refer to the
abstraction level of the handled data and knowledge. They
reflect the different data transformations useful for image
semantic interpretation as illustrated in figure 1. Nevertheless,
the existence of these different levels does not automatically
imply dealing with the symbol grounding problem
as a problem in its own right. Indeed, this problem is often encapsulated
in the semantic interpretation problem in different
forms (for example, through domain-dependent data
abstraction rules in [22]). Interesting works concerning an
independent intermediate level are the ISR approach [2] of
the VISIONS system [11] and the use of conceptual spaces
in [3]. ISR [2] (Intermediate Symbolic Representation) is
a representation system and a management system for the
use of the intermediate (symbolic) representation. ISR is
based on database management methodology. It is an active
interface between high level inference processes and image
data. ISR provides tools for classification based on features,
perceptual grouping, spatial access (e.g. the detection and
the verification of neighborhood relations between objects)
and constraint based graph matching between graphs of data
and graphs of models. In [3], a symbol grounding approach
based on conceptual spaces [9] is proposed. A conceptual
space is a metric space in which entities are characterized by
a number of quality dimensions (color, spatial coordinates,
size, etc.). The dimensions of a conceptual space represent qualities
of the environment independently of any linguistic formalism
or description. This representation enables the modeling
of natural concepts (real physical objects) as convex
regions in the conceptual space, and it supports reasoning such as
concept formation, induction and categorization [9].
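A simple way to see how convex regions support categorization is the prototype view of conceptual spaces: summarize each concept by a prototype point in the quality dimensions and classify an observed entity by its nearest prototype, which induces convex (Voronoi) regions. The dimensions and prototype values below are invented for illustration and are not taken from [3] or [9]:

```python
import math

# Categorization in a toy conceptual space with two quality dimensions:
# (normalized hue, normalized size). Prototype values are hypothetical.

prototypes = {
    "rose_leaf": (120 / 360, 0.6),   # greenish hue, medium size
    "white_fly": (0.0, 0.05),        # near-white hue, very small
}

def categorize(point):
    """Assign an observed entity to the concept with the nearest prototype.

    Nearest-prototype assignment partitions the space into convex
    Voronoi cells, one per concept.
    """
    return min(prototypes, key=lambda c: math.dist(point, prototypes[c]))

print(categorize((0.31, 0.5)))   # near the rose_leaf prototype
print(categorize((0.02, 0.04)))  # near the white_fly prototype
```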

3 Overview of our Cognitive Vision Approach
A look at the state of the art in various domains shows
the importance and the complexity of the symbol grounding
problem. We consider the symbol grounding problem as
an independent problem. As in [2] and [3], we propose to
work at an intermediate level called visual level. As shown
in figure 2:
- A visual concept ontology, as a common vocabulary,
enables the communication between the intermediate visual
level and the semantic level. This visual concept ontology
is used to visually describe semantic concepts.
- An image processing ontology enables the communication
between the visual level and the image processing
level.
At this level, the symbol grounding problem consists in
establishing the link between the symbolic description of the expected
scene content (described using the visual concept
ontology) and the actually perceived scene (described using
the image processing ontology). We propose two methods
to build this link:
- A learning approach which leads to a set of visual concept
detectors.
- An a priori knowledge based approach which consists
in making this link explicit in a symbol grounding knowledge
base.
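The contrast between the two methods can be sketched on a single toy visual concept. Below, both detectors ground the concept "white" from an (r, g, b) feature vector: one is learned from labeled samples, the other encodes the link explicitly. The samples, thresholds, and names are illustrative assumptions, not the authors' detectors or knowledge base:

```python
# Two ways of grounding the visual concept "white" on a toy
# (r, g, b) feature vector. All values here are hypothetical.

# (1) Learning approach: a 1-nearest-neighbour detector built from samples.
train = [((250, 250, 245), "white"), ((30, 160, 40), "not_white"),
         ((240, 238, 242), "white"), ((90, 60, 50), "not_white")]

def learned_detector(feat):
    """Predict the label of the closest training sample."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda sample: dist(sample[0], feat))[1]

# (2) A priori knowledge approach: the link is written down explicitly
# as an entry of a symbol grounding knowledge base.
def explicit_detector(feat):
    """All channels bright => white (hand-written grounding rule)."""
    return "white" if min(feat) > 200 else "not_white"

print(learned_detector((245, 244, 240)))   # white
print(explicit_detector((245, 244, 240)))  # white
```

The trade-off the two methods illustrate: the learned link adapts to image samples but is opaque, while the explicit link is inspectable but must be authored by hand.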

4 Ontology Based Communication
In a knowledge sharing context, the notion of ontologies
was defined by Gruber in [10] as a “formal, explicit specification
of a shared conceptualization”. An ontology entails
some sort of world view, i.e. a set of concepts, their definitions
and their relational structure, which can be used to
describe and reason about a domain. An ontology is composed
of (1) a set of concepts (C), (2) a set of relations (R)
and (3) a set of axioms. In [26], the purposes and benefits of
using ontologies are divided into three categories: they
assist communication, they enable interoperability
among computer system modules, and they bring
improvements in software engineering: specification, reliability
and re-usability. In our case, we use ontological engineering
for the communication and the information sharing
between the different data abstraction levels involved in semantic
image interpretation.
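The (C, R, axioms) definition can be made concrete with a tiny in-memory ontology. The concept names, the is-a relation, and the transitivity axiom below are illustrative choices for the sketch, not the paper's actual visual concept ontology:

```python
# Toy rendering of an ontology as (concepts, relations, axioms).
# The concept hierarchy here is invented for illustration.

concepts = {"Entity", "VisualConcept", "Color", "Hue", "Texture"}

# Relations as (subject, relation, object) triples; only is-a links here.
relations = {("VisualConcept", "is_a", "Entity"),
             ("Color", "is_a", "VisualConcept"),
             ("Hue", "is_a", "Color"),
             ("Texture", "is_a", "VisualConcept")}

def subsumed_by(child, ancestor):
    """Axiom encoded here: is_a is reflexive and transitive."""
    if child == ancestor:
        return True
    return any(subsumed_by(parent, ancestor)
               for (c, rel, parent) in relations
               if c == child and rel == "is_a")

print(subsumed_by("Hue", "VisualConcept"))  # True: Hue -> Color -> VisualConcept
```

Subsumption queries of this kind are what let the visual level and the semantic level agree on a shared vocabulary.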

5 Symbol Grounding
The main goal of symbol grounding is to perform the
matching between the symbols used to describe semantic
concepts and sensor data. In our case, the symbols are visual
concepts (from the visual concept ontology) and the
sensor data are image data (described using the image processing
ontology). The main difficulty of this matching lies
in the different nature of the two sets of data. The representation
spaces are different for the visual concepts and
for image data and the problem consists in defining correspondence
links between both types of representations. We
present in this section two approaches to establish these
correspondence links:
- A learning approach: links between low level image
data features and visual concepts are learned from image
samples.
- An a priori knowledge based approach: links between
low level image data features and visual concepts are built
explicitly.
Definition 4 Let F_Ci ⊆ F be the set of image feature
concepts that can be associated with the visual concept Ci.
For example, the visual concept Hue can be associated with
the set of image color features.
Definition 5 We define Val : F × E → ℝⁿ so that
Val(F_Ci, e) represents the numerical values of the feature
set F_Ci computed for e.
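A minimal sketch of Definition 5, under the assumption that an entity e is a pixel region and that the feature concepts in F are mean color channels (the feature names and region encoding are invented for the example):

```python
# Sketch of Val(F_Ci, e): numerical values of a set of image feature
# concepts computed on an entity e (here, a list of (r, g, b) pixels).
# The feature names and the region encoding are illustrative assumptions.

def mean_channel(region, i):
    return sum(p[i] for p in region) / len(region)

# F: the available image feature concepts, each mapped to its computation.
F = {"mean_red":   lambda e: mean_channel(e, 0),
     "mean_green": lambda e: mean_channel(e, 1),
     "mean_blue":  lambda e: mean_channel(e, 2)}

def val(feature_set, e):
    """Val : F x E -> R^n, returned as a feature-name -> value mapping."""
    return {f: F[f](e) for f in feature_set}

# F_Ci for the visual concept Hue: the color features (cf. Definition 4).
F_hue = {"mean_red", "mean_green", "mean_blue"}
region = [(200, 40, 40), (210, 50, 30)]  # a toy reddish region
print(val(F_hue, region))
```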
