[MMSEM-UC] review of Semantic Media Analysis for Intelligent Media Retrieval from Suzanne Little on 2007-03-14 (public-xg-mmsem@w3.org from March 2007)

From: Suzanne Little <Suzanne.Little@isti.cnr.it>
Date: Wed, 14 Mar 2007 20:08:30 +0100
To: public-xg-mmsem@w3.org
Message-id: <45F8482E.8050203@isti.cnr.it>

Hi all,

Sorry for the delay. Below are my comments on the Semantic Media 
Analysis for Intelligent Media Retrieval use case as per action 16 from 
the last telecon. Talk to you tomorrow.

Regards,
Suzanne
---------------------------------------------------------
Semantic Media Analysis for Intelligent Media Retrieval
    (using version last edited 2007-03-01 14:06:26 by SofiaTsekeridou)

This use case was a little difficult to review since the problem it aims 
to address is very large and critically important in the field of 
multimedia semantics. I've tried to summarise what I think are the main 
points, looked at possible connections with other use cases and then 
made some general comments.

Disclaimer (if that's the right term!): Significant parts of my thesis 
and of my current post-doc project are concerned with relating low-level 
features to high-level semantics defined in ontologies (e.g. see [1]) so 
that probably colours how I interpret some of this use case.


*Summary*
Intelligent media retrieval means that users are shielded from needing 
in-depth understanding of low-level descriptors and can instead query 
for media content based on high-level meaningful concepts. To support 
this type of query, semantic indices and annotations are needed. This 
use case aims to identify the problems involved with automatically 
creating these "semantic indices".

The use case looks at two main issues or perhaps two approaches for 
finding high-level semantics:
 1. turning low-level descriptors into exchangeable high-level semantics 
(the "multimedia semantic gap") and
 2. exploiting cross-modal data to infer higher-level semantics.


*The Multimedia Semantic Gap*
The 3rd paragraph summarises the standard multimedia issue of the gap 
between automatic, objective, low-level descriptors and manual, 
subjective, high-level descriptors and the benefits and disadvantages of 
each. There's quite a lot of work and many different approaches in this 
field. You mention Naphade (in another context), but there are also Arnold 
Smeulders et al.; William Grosky and Rong Zhao; Oge Marques; Jane Hunter 
and myself; and Guus Schreiber and Laura Hollink, to name just a small 
set of the people I'm aware of. I'm not sure what the procedure is for 
referencing or reviewing related work like this in W3C documents.


*MPEG-7*
The 4th paragraph, which discusses MPEG-7, talks about how to integrate, 
align or link the MPEG-7 descriptors with domain semantics. This is a 
good overlap with the MPEG-7 deliverable. Perhaps (eventually) it should 
reference the discussion in that document. I think perhaps the salient 
point here is how "MPEG-7 metadata descriptions [can be] *properly* 
linked to domain-specific ontologies" (emphasis mine). I'm interested to 
see how people define 'properly' and what solutions are suggested for 
identifying and describing the links.


*Connections with the Algorithm Use Case*
There are obvious connections with the Algorithm use case, particularly 
with the list of required attributes for the classifier ontology given in 
the proposed solution for example 1 and with the support for creating 
"feature vectors" mentioned in example 2. The ontologies in each case 
would need to be related.

At this stage I don't think it would be useful for the two use cases to 
be formally combined. It's more important to note that there is a 
connection and to consider how they affect each other. I view the 
algorithm ontology as being a supporting layer below the constructs 
being discussed in this use case. Since both these use cases deal with 
fundamental underlying semantics and are essentially domain independent 
you could argue for them to be combined with almost any of the others.


*Overall*
The problem is a significant one and highly relevant to this group. It 
underpins many of the other problems that we're trying to address. The 
motivating examples are good: sufficiently realistic and challenging.

The possible solutions list a number of ontologies (classifier, 
multimedia core, visual/audio/textual descriptor, domain, upper 
multimedia, cross-modality) which should be related or linked together 
to address the two issues of this use case. This is a fairly complex mix 
of semantics. The specifics about where concepts (e.g. creator, capture 
device, environment, feature vector etc.) are placed and how the 
ontologies are aligned are important to consider.

The solution discussed for example 1 (generating feature vectors and 
associating them with an object class) is a challenging task. What do you 
think the limitations are in the current technologies (e.g. semantic web 
framework) for supporting this task?

Some questions:
 * what do the links or mappings (low-level to domain) look like and how 
are they created?
 * how are spatial (or temporal) relationships defined?
 * how are media segments or regions identified?
 * what is meant by low-level, medium-level and high-level semantics and 
how do these terms apply to the ontologies?
 * which concepts belong in each ontology and where are the connections 
between them? More detail on this may also be useful for discussion.


[1] S. Little and J. Hunter, "Rules-By-Example - a Novel Approach to 
Semantic Indexing and Querying of Images", ISWC 2004 
http://www.springerlink.com/index/2EQ603G8CCXF3E8Y.pdf
Received on Wednesday, 14 March 2007 19:09:53 UTC