- From: Suzanne Little <Suzanne.Little@isti.cnr.it>
- Date: Wed, 14 Mar 2007 20:08:30 +0100
- To: public-xg-mmsem@w3.org
Hi all,
Sorry for the delay. Below are my comments on the Semantic Media
Analysis for Intelligent Media Retrieval use case as per action 16 from
the last telecon. Talk to you tomorrow.
Regards,
Suzanne
---------------------------------------------------------
Semantic Media Analysis for Intelligent Media Retrieval
(using version last edited 2007-03-01 14:06:26 by SofiaTsekeridou)
This use case was a little difficult to review since the problem it aims
to address is very large and critically important in the field of
multimedia semantics. I've tried to summarise what I think are the main
points, looked at possible connections with other use cases and then
made some general comments.
Disclaimer (if that's the right term!): Significant parts of my thesis
and of my current post-doc project are concerned with relating low-level
features to high-level semantics defined in ontologies (e.g. see [1]) so
that probably colours how I interpret some of this use case.
*Summary*
Intelligent media retrieval means that users are shielded from needing
an in-depth understanding of low-level descriptors and can instead query
for media content based on high-level, meaningful concepts. To support
this type of query, semantic indices and annotations are needed. This
use case aims to identify the problems involved with automatically
creating these "semantic indices".
The use case looks at two main issues or perhaps two approaches for
finding high-level semantics:
1. turning low-level descriptors into exchangeable high-level semantics
(the "multimedia semantic gap") and
2. exploiting cross-modal data to infer higher-level semantics.
*The Multimedia Semantic Gap*
The 3rd paragraph summarises the standard multimedia issue of the gap
between automatic, objective, low-level descriptors and manual,
subjective, high-level descriptors, and the benefits and disadvantages of
each. There's quite a lot of work and many different approaches in this
field. You mention Naphade (in another context), but see also Arnold
Smeulders et al.; William Grosky and Rong Zhao; Oge Marques; Jane Hunter
and myself; and Guus Schreiber and Laura Hollink, to name just a small
set of the people I'm aware of. I'm not sure what the procedure is for
referencing or reviewing related work like this in W3C documents.
*MPEG-7*
The 4th paragraph, which discusses MPEG-7, talks about how to integrate,
align or link the MPEG-7 descriptors with domain semantics. There is a
good overlap here with the MPEG-7 deliverable, and perhaps (eventually)
this section should reference the discussion in that document. I think the salient
point here is how "MPEG-7 metadata descriptions [can be] *properly*
linked to domain-specific ontologies" (emphasis mine). I'm interested to
see how people define 'properly' and what solutions are suggested for
identifying and describing the links.
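To make that question a little more concrete, below is a minimal sketch
(Python with rdflib) of what one such link might look like. Every URI in
it is made up (there is no standard MPEG-7 ontology namespace), and the
'depicts' property is precisely the part whose definition the use case
would need to pin down.

    from rdflib import Graph, Literal, Namespace, RDF

    # All three namespaces are placeholders: they stand in for an MPEG-7
    # OWL translation, a soccer domain ontology and some media instances.
    MPEG7 = Namespace("http://example.org/mpeg7#")
    SOCCER = Namespace("http://example.org/soccer#")
    EX = Namespace("http://example.org/media#")

    g = Graph()
    g.bind("mpeg7", MPEG7)
    g.bind("soccer", SOCCER)
    g.bind("ex", EX)

    # A still region carrying a low-level MPEG-7 visual descriptor...
    g.add((EX.region42, RDF.type, MPEG7.StillRegion))
    g.add((EX.region42, MPEG7.hasDescriptor, EX.dc1))
    g.add((EX.dc1, RDF.type, MPEG7.DominantColorDescriptor))
    g.add((EX.dc1, MPEG7.colorValue, Literal("0 128 0")))

    # ...and the link whose exact nature is the open question: a made-up
    # "depicts" property ties the region to a domain concept.
    g.add((EX.region42, EX.depicts, SOCCER.PlayingField))

    print(g.serialize(format="turtle"))

Whether that link should live in the multimedia core ontology, the
domain ontology or a separate mapping layer seems to me to be one of the
interesting design questions here.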
*Connections with the Algorithm Use Case*
There are obvious connections with the Algorithm use case, particularly
the list of required attributes for the classifier ontology in the
proposed solution for example 1 and the support for creating the
"feature vectors" mentioned in example 2. The ontologies in each case
would need to be related.
At this stage I don't think it would be useful for the two use cases to
be formally combined. It's more important to note that there is a
connection and to consider how they affect each other. I view the
algorithm ontology as being a supporting layer below the constructs
being discussed in this use case. Since both of these use cases deal with
fundamental underlying semantics and are essentially domain independent,
you could argue for them to be combined with almost any of the others.
*Overall*
The problem is a significant one and highly relevant to this group. It
underpins many of the other problems that we're trying to address. The
motivating examples are good: sufficiently realistic and challenging.
The possible solutions list a number of ontologies (classifier,
multimedia core, visual/audio/textual descriptor, domain, upper
multimedia, cross-modality) which should be related or linked together
to address the two issues of this use case. This is a fairly complex mix
of semantics. The specifics about where concepts (e.g. creator, capture
device, environment, feature vector etc.) are placed and how the
ontologies are aligned are important to consider.
The solution discussed for example 1 (generating feature vectors and
associating them with an object class) is a challenging task. What do you
think the limitations are in the current technologies (e.g. semantic web
framework) for supporting this task?
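As a (deliberately toy) illustration of what example 1 asks for, here is
a small Python sketch in which hand-rolled grey-level histograms stand
in for the low-level descriptors, synthetic patches stand in for
annotated training regions, and an off-the-shelf classifier does the
association with an object class. Nothing here is taken from the
proposed solution; the point is only to show which pieces (features,
training data, classifier, confidence) the classifier ontology and the
surrounding semantics would need to capture.

    import numpy as np
    from sklearn.svm import SVC

    def colour_histogram(region, bins=8):
        # Normalised grey-level histogram as a stand-in for a real
        # MPEG-7 visual descriptor (e.g. dominant colour).
        hist, _ = np.histogram(region, bins=bins, range=(0, 256))
        return hist / hist.sum()

    # Entirely synthetic "annotated regions": dark patches labelled
    # "grass", bright patches labelled "crowd".
    rng = np.random.default_rng(0)
    grass = [rng.integers(0, 100, size=(16, 16)) for _ in range(20)]
    crowd = [rng.integers(150, 256, size=(16, 16)) for _ in range(20)]

    X = np.array([colour_histogram(r) for r in grass + crowd])
    y = np.array(["grass"] * 20 + ["crowd"] * 20)

    # The trained classifier is what the classifier ontology would need
    # to describe (features used, training data, output classes etc.).
    clf = SVC(probability=True).fit(X, y)

    # Associating a new region's feature vector with an object class,
    # plus a confidence value that a semantic annotation could record.
    query = colour_histogram(rng.integers(0, 100, size=(16, 16)))
    print(clf.predict([query])[0], clf.predict_proba([query]).max())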
Some questions:
* what do the links or mappings (low-level to domain) look like and how
are they created?
* how are spatial (or temporal) relationships defined?
* how are media segments or regions identified?
* what is meant by low-level, medium-level and high-level semantics and
how do these terms apply to the ontologies?
* some more details about which concepts belong in each ontology, and
where the connections between them lie, may also be useful for discussion.
[1] S. Little and J. Hunter, "Rules-By-Example - a Novel Approach to
Semantic Indexing and Querying of Images", ISWC 2004.
http://www.springerlink.com/index/2EQ603G8CCXF3E8Y.pdf