Re: action 78 - Discussion about interoperability from Felix Sasaki on 2009-01-28 (public-media-annotation@w3.org from January 2009)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 28 Jan 2009 10:48:18 +0900
To: Tobias Bürger <tobias.buerger@sti2.at>
CC: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>, public-media-annotation@w3.org
Message-ID: <497FB962.6030405@w3.org>
Tobias Bürger wrote:
> Dear all,
>
> as promised in the telecon, here is my reply to Felix's mail. See my
> comments below.
>
> Felix Sasaki wrote:
>>
>> Pierre-Antoine Champin wrote:
>>> Hi,
>>>
>>> for action 78, I had to write a wiki page about some concerns I raised
>>> during the last telecon about interoperability between mapped
>>> properties. Since this is supposed to be matter for discussion rather
>>> than a formal document, I think it is best to send it as a mail.
>>>
>>>
>>> What triggered my concern was the mapping for Media RSS, between
>>> ''dc:creator'' and ''dcterms:creator''. Just as a reminder, the Dublin
>>> Core vocabulary has two versions: the legacy "elements" (usually
>>> prefixed with ''dc'') and the "terms" (usually prefixed with
>>> ''dcterms''). Each term is more specific than its corresponding element,
>>> as its values are more constrained. For example, ''dc:creator'' can have
>>> any type of value (including a plain string), while ''dcterms:creator''
>>> must have a URI, which must denote an instance of ''dcterms:Agent''.
>>> If we decide to specify the ontology only as prose
>>> Let us consider the example of ''dc:creator'' with a sample of mappings:
>>>
>>> * for XMP, its value is a sequence of strings, each string being the
>>> name of an author.
>>>
>>> * for Media RSS, its value is either
>>>   - a plain string,
>>>   - an instance of ''foaf:Agent'' with at least a ''foaf:name'', or
>>>   - an instance of ''vcard'' with at least a ''fn''.
>>> Since they are using ''dcterms'', it must also be inferred to be a
>>> ''dcterms:Agent'' (which contradicts the use of a plain string...). It
>>> may represent only one ("the primary") creator.
>>>
>>> * for ID3, the value of TOPE is a string, where names are separated by "/".
>>>
>>>
>>> My point here is that, beyond the "high level" semantic links identified
>>> by the mapping table, there are some "low level" discrepancies that are
>>> both semantic (e.g. representing one or several creators) and syntactic
>>> (slash-separated string or structured sequence).
>>>
>>> Leaving these issues to the implementation will inevitably lead to major
>>> differences and a lack of interoperability. We could specify down to the
>>> syntactical level the mapping for each property in each format, but what
>>> about other formats?
>>>
>>> I think a better way is to limit the variability in implementations by
>>> specifying precisely, for each property of our ontology, the expected
>>> "low level" features of its value (and not only its "high level"
>>> meaning), so that implementors know what they can keep from the original
>>> metadata and what they need to adapt (e.g. split ID3's TOPE field into
>>> multiple values).
>>>
>>> This has to be done at least at the API level. But I guess this could
>>> also be done to some extent at the ontology level (I do believe that
>>> those "low level" features are *not only* syntactic), but that raises
>>> again the problem of formally specifying the ontology or not.
>>>
>>> But the less specific we are in describing the ontology, the more
>>> precise we will have to be in describing the API, in order to avoid
>>> "low level" semantic discrepancies.
>>>   
>>
>> I agree very much with your analysis, Pierre-Antoine. +1 to having a 
>> very lightweight ontology and to being more precise in the API 
>> description. Also I am hoping very much that people will volunteer to 
>> actually test the mappings in toy implementations, no matter whether 
>> they rely on a complex ontology or a detailed API. No matter which way 
>> we go, let's test them now.
> I guess the intention of the toy implementations should be to get a 
> deeper understanding of which types of mismatches between the 
> properties defined in the different formats might occur. 

No, not at all! The toy implementation or real implementation (whatever 
you can create) is necessary for us to move forward in the W3C process. 
My opinion is that we should work in an implementation-driven way (toy 
or real): use only the properties which are actually implemented in the 
API. The others are dropped. That is, everything which is not implemented 
is dropped from the mapping table *before* we go to last call.
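
To make the "drop what is not implemented" rule concrete, here is a rough 
sketch. The table layout, the property names and the function name 
prune_mapping_table are made up for illustration; they are not taken from 
the group's actual mapping table.

```python
from typing import Dict, Set

# Hypothetical excerpt of a mapping table: property -> {format: native field}.
# Only the creator/TOPE pairing comes from this thread; the rest is illustrative.
MAPPING_TABLE: Dict[str, Dict[str, str]] = {
    "creator": {"XMP": "dc:creator", "ID3": "TOPE"},
    "title": {"XMP": "dc:title", "ID3": "TIT2"},
    "rating": {"XMP": "xmp:Rating"},
}

# Hypothetical set of properties that some toy or real implementation covers.
IMPLEMENTED: Set[str] = {"creator", "title"}


def prune_mapping_table(
    table: Dict[str, Dict[str, str]], implemented: Set[str]
) -> Dict[str, Dict[str, str]]:
    """Keep only the properties that have an implementation."""
    return {prop: fields for prop, fields in table.items() if prop in implemented}


pruned = prune_mapping_table(MAPPING_TABLE, IMPLEMENTED)
```

With the sample data above, "rating" would be dropped because nothing 
implements it.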


> This might include data type mismatches, but also structural 
> mismatches as outlined by Pierre-Antoine above. Whether you derive them 
> by hard thinking or by prototype implementations does not matter, as 
> long as we are aware of them, because we will have to implement 
> them at some point in time.

The crucial part is "some point in time". I have seen several working 
groups which developed very elaborate specs, but when it came to 
implementing them they ran into difficulties and had to revise the 
specs. In terms of W3C that means: going back to a normal working draft 
and probably losing half a year. That is what I want to avoid by asking 
you to work on implementations now.

>
> Regarding the mapping, and more specifically from where we should map: 
> The mapping should be to our core ontology to whose semantics we 
> committed ourselves or will commit. So we will define what we allow as 
> the domain and range of a property.

The mapping does not need to be defined in terms of range and 
properties. Please don't use RDF-specific terminology - we have no 
agreement to restrict ourselves to this.


>
> And I disagree with the last statement from Pierre-Antoine above: if we 
> describe the ontology less specifically, then we also do not need to be 
> more precise in the API. It has been my understanding that this group 
> defines an ontology consisting of a set of core properties for the 
> description of media objects on the Web, to which all the formats in 
> our scope will be mapped. That said, if you describe the ontology 
> in a more lightweight way, meaning perhaps with less detail or a lower 
> level of specificity, then you also map to something not very specific.

I think we could avoid these kinds of discussions by just starting 
to implement the mappings.
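
As a starting point, a toy implementation of the creator mapping could be 
as small as the sketch below. It assumes the value shapes Pierre-Antoine 
listed above (a sequence of strings for XMP, a single "primary" creator 
for Media RSS, a "/"-separated string for ID3 TOPE). All function names 
are hypothetical; nothing here is an agreed API.

```python
from typing import Iterable, List


def creators_from_xmp(values: Iterable[str]) -> List[str]:
    """XMP: the value is already a sequence of names; drop empty entries."""
    return [v.strip() for v in values if v.strip()]


def creators_from_media_rss(value: str) -> List[str]:
    """Media RSS: at most one ("primary") creator, wrapped in a list."""
    name = value.strip()
    return [name] if name else []


def creators_from_id3_tope(value: str) -> List[str]:
    """ID3 TOPE: names are separated by "/", so split them.

    Caveat: a name that itself contains "/" would be split incorrectly.
    """
    return [part.strip() for part in value.split("/") if part.strip()]
```

All three functions return a list of names, so a caller would not need to 
know which format the metadata came from. The caveat in the ID3 function 
is exactly the kind of issue that tends to show up only when implementing.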


> For me the API is a means to transparently access a description of a 
> media object in a format which I do not want to care about when 
> accessing the API. So should we define return types? Or should the 
> burden of identifying the return type be shifted to the user? (I guess 
> we had this discussion before but did not come to a conclusion....)

We have a requirement for this:
http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r13
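
To illustrate the first option in the question above (the API defines the 
return type, rather than the user having to identify it), here is a small 
sketch. It is not derived from the wording of the requirement; the Creator 
class, its fields and the to_creator function are assumptions made only 
for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass(frozen=True)
class Creator:
    """Hypothetical uniform return type for a creator value."""
    name: str
    uri: Optional[str] = None  # set only when the source provides an identifier


def to_creator(raw: Union[str, Mapping[str, str]]) -> Creator:
    """Normalise a plain string or a structured value into a Creator.

    A structured value is assumed to be a mapping with a "name" key and an
    optional "uri" key; this shape is an assumption for this sketch.
    """
    if isinstance(raw, str):
        return Creator(name=raw.strip())
    return Creator(name=raw["name"].strip(), uri=raw.get("uri"))
```

With something like this, a caller always receives a Creator, whether the 
underlying format stored a plain string or a structured description.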

I think the current problem of the group is that there is an imbalance 
between the goal of defining a read-only API and the participants, who 
are mostly interested in an ontology, and mostly in an RDF-based 
ontology at that. One solution to this imbalance is to get other people 
on board who are more interested in the API. I hope that this will 
happen soon.

Felix
Received on Wednesday, 28 January 2009 01:48:57 UTC