Re: [MMSEM-UC] review of Semantic Media Analysis for Intelligent Media Retrieval from Ioannis Pratikakis on 2007-03-28 (public-xg-mmsem@w3.org from March 2007)

From: Ioannis Pratikakis <ipratika@iit.demokritos.gr>
Date: Wed, 28 Mar 2007 21:51:15 +0300
To: <Suzanne.Little@isti.cnr.it>, <tzouvaras@image.ece.ntua.gr>, <public-xg-mmsem@w3.org>
Cc: "Ioannis Pratikakis" <ipratika@iit.demokritos.gr>, "Sofia Tsekeridou" <sots@ait.edu.gr>
Message-ID: <032101c7716a$1016cbf0$84e3e98f@ipratika>
Dear Suzanne and Vassilis,

First of all, we would like to thank you for your fruitful review.
Below you will find our preliminary comments on the major issues raised.


--------------------------------------------------------------------------------


Reply to comments from Suzanne Little <Suzanne.Little@isti.cnr.it>:


*MPEG-7*
>> The 4th paragraph, which discusses MPEG-7, talks about how to integrate, 
>> align or link the MPEG-7 descriptors with domain semantics. This is a 
>> good overlap with the MPEG-7 deliverable. Perhaps (eventually) it should 
>> reference the discussion in that document. 

We fully agree that there should be such a link to the MPEG-7 deliverable.


>>I think perhaps the salient 
>> point here is how "MPEG-7 metadata descriptions [can be] *properly* 
>> linked to domain-specific ontologies" (emphasis mine). I'm interested to 
>> see how people define 'properly' and what solutions are suggested for 
>> identifying and describing the links.


*properly* does not have a unique interpretation. It depends on the constraints imposed by the application domain at hand.


>> *Connections with the Algorithm Use Case*
>> There are obvious connections with the Algorithm Use case. Particularly 
>> the list of required attributes for the classifier ontology listed in 
>> the proposed solution for example 1 and as support for the creation of 
>> "feature vectors" mentioned in example 2. The ontologies in each case 
>> would need to be related.
>> 
>> At this stage I don't think it would be useful for the two use cases to 
>> be formally combined. It's more important to note that there is a 
>> connection and to consider how they effect each other. I view the 
>> algorithm ontology as being a supporting layer below the constructs 
>> being discussed in this use case. Since both these use cases deal with 
>> fundamental underlying semantics and are essentially domain independent 
>> you could argue for them to be combined with almost any of the others.


Agreed.


>> The solution discussed for example 1 (generating feature vectors and 
>> associating them to an object class) is a challenging task. What do you 
>> think the limitations are in the current technologies (e.g. semantic web 
>> framework) for supporting this task?

In this UC, we actually identify the limitations of the semantic web technologies on which the proposed solution is based.
The proposed solution is an extension of the existing MPEG-7 schema to support classification tasks. More specifically, we focus on the automatic process of generating semantics of a certain granularity (medium and high level) from low-level features.

>>  * what is meant by low-level, medium-level and high-level semantics and 
>> how do these terms apply to the ontologies?

Low-level descriptions are not semantics; they are features.
Medium-level semantics are extracted by classifiers trained on low-level features (e.g. indoor, outdoor, etc.).
High-level semantics are combinations of medium-level semantics based upon relations. They can capture the most abstract forms.


>>  * how are spatial (or temporal) relationships defined?
>>  * how are media segments or regions identified?

We do not propose anything beyond what MPEG-7 already suggests (we assume that you are referring to the single-modality case).


>>  * what do the links or mappings (low-level to domain) look like and how 
>> are they created?


For medium-level semantics, automatic linking is done via classification.
For high-level semantics, automatic linking is done via classification and reasoning.
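
As a rough illustration of the two linking steps, the sketch below first maps a low-level feature vector to a medium-level label by classification, and then derives a high-level concept by a simple rule. Everything in it is an assumption made for illustration only: the feature names, the nearest-centroid classifier, the centroid values and the rule table are hypothetical and are not part of the UC or of MPEG-7. The "reasoning" step is reduced to a set-inclusion check and does not model spatial or temporal relations.

```python
from math import dist

# Hypothetical low-level features: (mean brightness, edge density), both in [0, 1].
# Hypothetical class prototypes standing in for a trained classifier.
CENTROIDS = {
    "indoor": (0.35, 0.60),
    "outdoor": (0.75, 0.30),
}

def classify(features, centroids):
    """Nearest-centroid classification: low-level features -> medium-level label."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))

# Hypothetical rules: a high-level concept holds when all of its
# required medium-level labels are present.
RULES = {
    "beach holiday": {"outdoor", "sand", "sea"},
    "office meeting": {"indoor", "people", "table"},
}

def infer_high_level(medium_labels, rules):
    """Very simplified reasoning: medium-level labels -> high-level concepts."""
    labels = set(medium_labels)
    return sorted(c for c, required in rules.items() if required <= labels)

# Medium-level label obtained by classification of one feature vector.
scene = classify((0.80, 0.25), CENTROIDS)
# High-level concepts obtained by combining it with other (assumed) labels.
concepts = infer_high_level([scene, "sand", "sea"], RULES)
print(scene, concepts)
```

In a real system the classifier would be trained on MPEG-7 descriptors and the rules would be expressed in a domain ontology; the sketch only shows where classification ends and reasoning begins.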


--------------------------------------------------------------------------------


Reply to comments from Vassilis Tzouvaras <tzouvaras@image.ece.ntua.gr>:

> 
> I more or less agree with Suzanne's comments that this use case is addressing
> a very large problem in the area of Knowledge-assisted multimedia analysis. It
> is really difficult to cover this area (even part of it) within a few pages. I
> would suggest limiting this document in one use case and not presenting two
> (visual descriptors and multi-modality). The reason is that in this document
> there are many things unclear and vague like the "medium-level semantics",
> "optimizations of the underlying artificial intelligence algorithms",
> "decision fusion", "combined semantics" and a few others. These are terms that
> even the people that are dealing with this area can easily interpret them
> differently. 
> 

We strongly believe that the two scenarios are interrelated and that the first one is a sub-case of the second.
By presenting both, we try to facilitate understanding and to provide examples at different levels of granularity.

We do not think that there is any vagueness in terms like:
- "optimizations of the underlying artificial intelligence algorithms": Please consider that we are dealing with automated semantics extraction, which is a difficult task and depends greatly upon case-based optimisations.
- "decision fusion": It concerns the combination of different classifier outputs to reach a final, optimised decision.
- "medium-level semantics": They are extracted by classifiers trained on low-level features (e.g. indoor, outdoor, etc.).
- "combined semantics": They are high-level semantics, i.e. combinations of medium-level semantics based upon relations. They can capture the most abstract forms.


> A second comment is that the so called "semantics extraction" process using
> classification algorithms is not a knowledge assisted analysis process because
> the classification algorithms do not take into account neither media semantics
> nor domain semantics. Also, such algorithms do not extract any semantcs. They
> produce non-machine-understandable predicates-objects. The semantics are
> assigned later to these objects through an interpretation function (in the
> case of set theoretic semantics.)
> 

Classification algorithms yield medium-level semantics.
At this level, therefore, no reasoning is required.


> My most important comment though is that this use case do not address
> adequately the issue of semantic interoperability. Especially, the second use
> case moslty addresses how we can handle and fuse knowledge from multiple
> modalities. This is not an interoperability problem but a fusion problem. The
> first use case better addresses the interop issue by saying that the semantics
> of the visual descriptors must be defined in such a way that can be exchanged
> to other applications.
> 

 
As already mentioned, the second use case is a superset of the first one. Thus, any semantic interoperability issue of the first use case also applies to the second use case, for each separate modality. Additionally, the second use case addresses the "combined semantics" across modalities and their inter-dependencies.



> The same holds for the possible solutions. The possible solution for the first
> usecase better presents how we can ensure semantic interoperability using core
> multimedia, visual and domain ontologies. I have a small comment for the use
> of CIDOC-CRM as core multimedia ontology. CIDOC-CRM doesn't define any
> abstract multimedia terms but it defines museum-related terms. The possible
> solution for the second use case presents a way to fuse knowledge from
> different modalities using a modality ontology. Again, this is a solution for
> fusion and not for semantic interoperability.
> 

Regarding CIDOC-CRM, we fully agree.

As we also mentioned in the UC description, we have not yet settled on the exact solution: whether it will be a cross-modality ontology or single-modality media ontologies associated with a domain ontology. A final solution will be given in the final update of the UC.



--------------------------------------------------------------------------------



Thanks again for the fruitful criticism.

With our best regards,

Sofia, 
Ioannis



+======================================================+
Ioannis PRATIKAKIS, Dipl. Eng., Ph.D ( http://iit.demokritos.gr/~ipratika/ )
Research Scientist

Computational Intelligence Laboratory
Institute of Informatics and Telecommunications
National Center for Scientific Research "Demokritos"
P.O. BOX   60228
GR-153 10 Agia Paraskevi, Athens, Greece.
Tel:     +30-210-650 3183
Fax:    +30-210-653 2175
E-mail: ipratika@iit.demokritos.gr
+======================================================+
Received on Wednesday, 28 March 2007 18:51:23 UTC