[MMSEM-UC] Semantic Media Retrieval UC: comments from Raphaël Troncy on 2006-10-26 (public-xg-mmsem@w3.org from October 2006)

From: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Date: Thu, 26 Oct 2006 10:19:25 +0200
To: Ioannis Pratikakis <ipratika@iit.demokritos.gr>
CC: public-xg-mmsem@w3.org
Message-ID: <45406F8D.FFEF8AC5@cwi.nl>

Dear Ioannis,

Thanks for clarifying your ideas and renaming this use case to
"Semantic Media Retrieval" (I also like it better than the previous
name). Please find my comments on your draft below.

> My effort focuses on highlighting the required knowledge
> representation and subsequent semantic interoperability.
> I would be very grateful for your fruitful feedback and more
> particularly for your active participation that may enhance not only
> the "motivating examples" but also aid towards underlying interesting
> corresponding "possible solutions".
>

I generally like the problem you want to highlight, which I would
rephrase as:
    - Retrieving multimedia material often requires large sets of
annotations, obtained automatically from the material.
    - These annotations can carry "more" semantics if cross-modality
analysis and knowledge inference are performed.
    - Cross-modality analysis requires some interoperability between
the results of each single-modality analysis and between their
representations.
Therefore, I think that this use case fits perfectly with the objectives
we pursue in this group.
In more detail:
    1) I find your examples not concrete enough! You still keep a very
general and sometimes "vague" level of discourse. For instance, in
Example 3, give us a web page with a picture and its caption. Tell us
what kind of information text analysis techniques could extract from
it. Then tell us how your face analysis would use this information as
input to better detect the person on this web page. In other words, be
concrete :-)
    2) Before your motivating examples, you might first discuss the
problems you would like to tackle. It seems to me that your concerns
are:
        . How to do cross-modality analysis better, and how to better
exchange the results of each single-modality analysis? See Example 3
        . How to include some fuzziness in the representation of the
analysis results (some degree of confidence), and how to merge this
fuzzy information with the true/false knowledge of an ontology? See
Example 2
        . How to add semantics to the representation of low-level
descriptors so that they become more exchangeable? See Example 1
    3) I don't understand exactly what you mean in your Example 1. When
you say that "To enable a semantic interoperability it is not adequate
to permit the exchange of low-level features between different users",
do you mean that such an exchange is not desirable? Or are you just
remarking that, in the current situation, MPEG-7 does not allow such an
exchange because of its lack of formal semantics? And what do you
suggest to solve this issue: provide formal semantics for these
low-level MPEG-7 descriptors? Or simply not exchange this information?
I also don't follow the problems with parts of images that you describe
afterwards. Could you clarify this point?

Best regards.

    Raphaël

--
Raphaël Troncy
CWI (Centre for Mathematics and Computer Science),
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
e-mail: raphael.troncy@cwi.nl & raphael.troncy@gmail.com
Tel: +31 (0)20 - 592 4093
Fax: +31 (0)20 - 592 4312
Web: http://www.cwi.nl/~troncy/

Received on Thursday, 26 October 2006 08:21:55 UTC