
Re: [MMSEM-UC] review of Semantic Media Analysis for Intelligent Media Retrieval

From: <tzouvaras@image.ece.ntua.gr>
Date: Fri, 16 Mar 2007 21:35:25 -0000
Message-Id: <200703162135.l2GLZPoD011987@manolito.image.ece.ntua.gr>
To: Suzanne Little <Suzanne.Little@isti.cnr.it>, public-xg-mmsem@w3.org

Hi all,

I have carefully studied the use case and Suzanne's comments; my own comments
are below:


-----------------------------

I enjoyed reading this deliverable because it has good flow and provides all
the necessary sections. It has a good introduction and the motivating examples
(mostly the first one) are quite good.

I more or less agree with Suzanne's comments that this use case addresses
a very large problem in the area of knowledge-assisted multimedia analysis. It
is really difficult to cover this area (or even part of it) within a few pages.
I would suggest limiting this document to one use case rather than presenting
two (visual descriptors and multi-modality). The reason is that many things in
this document are unclear and vague, such as "medium-level semantics",
"optimizations of the underlying artificial intelligence algorithms",
"decision fusion", "combined semantics" and a few others. These are terms that
even people working in this area can easily interpret differently.

A second comment is that the so-called "semantics extraction" process using
classification algorithms is not a knowledge-assisted analysis process, because
the classification algorithms take into account neither media semantics nor
domain semantics. Also, such algorithms do not extract any semantics. They
produce predicate-object pairs that are not machine-understandable. The
semantics are assigned to these objects later through an interpretation
function (in the case of set-theoretic semantics).
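
As a rough illustration of this point, here is a minimal sketch. The labels,
concept names, hierarchy and function names are all hypothetical and do not
come from the use case. The classifier emits opaque labels; they only become
queryable at the concept level once a separate interpretation step maps them
to concepts in a (toy) domain ontology.

```python
# Opaque classifier labels per image region (hypothetical data).
predictions = {"region1": "c17", "region2": "c03"}

# Interpretation step: label -> concept in a hypothetical domain ontology.
label_to_concept = {"c17": "Sky", "c03": "Sea"}

# Toy subclass hierarchy standing in for domain knowledge.
subclass_of = {"Sky": "NaturalObject", "Sea": "NaturalObject",
               "NaturalObject": "Thing"}


def ancestors(concept):
    """Return the concept followed by all of its superclasses."""
    chain = [concept]
    while chain[-1] in subclass_of:
        chain.append(subclass_of[chain[-1]])
    return chain


def regions_matching(query_concept):
    """Regions whose interpreted concept is the query concept or a subclass."""
    return sorted(
        region
        for region, label in predictions.items()
        if label in label_to_concept
        and query_concept in ancestors(label_to_concept[label])
    )
```

Without `label_to_concept`, a query such as `regions_matching("NaturalObject")`
has nothing to match against, since "c17" and "c03" carry no meaning by
themselves.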

My most important comment, though, is that this use case does not adequately
address the issue of semantic interoperability. In particular, the second use
case mostly addresses how we can handle and fuse knowledge from multiple
modalities. This is not an interoperability problem but a fusion problem. The
first use case addresses the interoperability issue better, by saying that the
semantics of the visual descriptors must be defined in such a way that they can
be exchanged with other applications.

The same holds for the possible solutions. The possible solution for the first
use case better presents how we can ensure semantic interoperability using core
multimedia, visual and domain ontologies. I have a small comment on the use
of CIDOC-CRM as a core multimedia ontology: CIDOC-CRM doesn't define any
abstract multimedia terms; it defines museum-related terms. The possible
solution for the second use case presents a way to fuse knowledge from
different modalities using a modality ontology. Again, this is a solution for
fusion and not for semantic interoperability.

To summarise, I suggest limiting the document to the first use case and
explaining the issue of semantic interoperability in much more detail. Include
all the necessary references, as Suzanne stated, and provide a more concrete
possible solution by presenting how you can make these links from the low-level
to the medium-level semantics and what these terms (low, medium, high) mean.

Regards,
Vassilis  


Suzanne Little <Suzanne.Little@isti.cnr.it> said:

> 
> Hi all,
> 
> Sorry for the delay. Below are my comments on the Semantic Media 
> Analysis for Intelligent Media Retrieval use case as per action 16 from 
> the last telecon. Talk to you tomorrow.
> 
> Regards,
> Suzanne
> ---------------------------------------------------------
> Semantic Media Analysis for Intelligent Media Retrieval
>     (using version last edited 2007-03-01 14:06:26 by SofiaTsekeridou)
> 
> This use case was a little difficult to review since the problem it aims 
> to address is very large and critically important in the field of 
> multimedia semantics. I've tried to summarise what I think are the main 
> points, looked at possible connections with other use cases and then 
> made some general comments.
> 
> Disclaimer (if that's the right term!): Significant parts of my thesis 
> and of my current post-doc project are concerned with relating low-level 
> features to high-level semantics defined in ontologies (e.g. see [1]) so 
> that probably colours how I interpret some of this use case.
> 
> 
> *Summary*
> Intelligent media retrieval means that users are shielded from needing 
> in-depth understanding of low-level descriptors and can instead query 
> for media content based on high-level meaningful concepts. To support 
> this type of query, semantic indices and annotations are needed. This 
> use case aims to identify the problems involved with automatically 
> creating these "semantic indices".
> 
> The use case looks at two main issues or perhaps two approaches for 
> finding high-level semantics:
>  1. turning low-level descriptors into exchangeable high-level semantics 
> (the "multimedia semantic gap") and
>  2. exploiting cross-modal data to infer higher-level semantics.
> 
> 
> *The Multimedia Semantic Gap*
> The 3rd paragraph summarises the standard multimedia issue of the gap 
> between automatic, objective, low-level descriptors and manual, 
> subjective, high-level descriptors and the benefits and disadvantages of 
> each. There's quite a lot of work and many different approaches in this 
> field. You mention Naphade (in another context) but also Arnold 
> Smeulders et al.; William Grosky and Rong Zhao; Oge Marques; Jane Hunter 
> and myself; Guus Schreiber and Laura Hollink etc. to name just a small 
> set of the people I'm aware of. I'm not sure what the procedure is for 
> referencing or reviewing related work like this in W3C documents.
> 
> 
> *MPEG-7*
> The 4th paragraph, which discusses MPEG-7, talks about how to integrate, 
> align or link the MPEG-7 descriptors with domain semantics. This is a 
> good overlap with the MPEG-7 deliverable. Perhaps (eventually) it should 
> reference the discussion in that document. I think perhaps the salient 
> point here is how "MPEG-7 metadata descriptions [can be] *properly* 
> linked to domain-specific ontologies" (emphasis mine). I'm interested to 
> see how people define 'properly' and what solutions are suggested for 
> identifying and describing the links.
> 
> 
> *Connections with the Algorithm Use Case*
> There are obvious connections with the Algorithm Use case. Particularly 
> the list of required attributes for the classifier ontology listed in 
> the proposed solution for example 1 and as support for the creation of 
> "feature vectors" mentioned in example 2. The ontologies in each case 
> would need to be related.
> 
> At this stage I don't think it would be useful for the two use cases to 
> be formally combined. It's more important to note that there is a 
> connection and to consider how they affect each other. I view the 
> algorithm ontology as being a supporting layer below the constructs 
> being discussed in this use case. Since both these use cases deal with 
> fundamental underlying semantics and are essentially domain independent 
> you could argue for them to be combined with almost any of the others.
> 
> 
> *Overall*
> The problem is a significant one and highly relevant to this group. It 
> underpins many of the other problems that we're trying to address. The 
> motivating examples are good; sufficiently realistic and challenging.
> 
> The possible solutions list a number of ontologies (classifier, 
> multimedia core, visual/audio/textual descriptor, domain, upper 
> multimedia, cross-modality) which should be related or linked together 
> to address the two issues of this use case. This is a fairly complex mix 
> of semantics. The specifics about where concepts (e.g. creator, capture 
> device, environment, feature vector etc.) are placed and how the 
> ontologies are aligned are important to consider.
> 
> The solution discussed for example 1 (generating feature vectors and 
> associating them to an object class) is a challenging task. What do you 
> think the limitations are in the current technologies (e.g. semantic web 
> framework) for supporting this task?
> 
> Some questions:
>  * what do the links or mappings (low-level to domain) look like and how 
> are they created?
>  * how are spatial (or temporal) relationships defined?
>  * how are media segments or regions identified?
>  * what is meant by low-level, medium-level and high-level semantics and 
> how do these terms apply to the ontologies?
>  * some more details about what concepts belong in each ontology and 
> where the connections between them are may also be useful for discussion.
> 
> 
> [1] S. Little and J. Hunter "Rules-By-Example - a Novel Approach to 
> Semantic Indexing and Querying of Images" ISWC2004 
> http://www.springerlink.com/index/2EQ603G8CCXF3E8Y.pdf
> 
> 



-- 
Received on Friday, 16 March 2007 23:28:36 GMT
