Re: action 78 - Discussion about interoperability from Felix Sasaki on 2009-01-28 (public-media-annotation@w3.org from January 2009)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 28 Jan 2009 21:14:24 +0900
To: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
CC: public-media-annotation@w3.org
Message-ID: <49804C20.3070700@w3.org>

Pierre-Antoine Champin wrote:
> Felix,
>
> I understand perfectly your concerns about the need to implement soon,
> in order to validate the spec, rather than implement too late and
> invalidate the parts of the spec that turn out to be too hard to implement.
>   

There is another reason: to implement soon, in order to find out which 
existing formats need to be taken into account. Just as an example: if 
nobody implements the mapping from e.g. MPEG-7, we should not even put 
the mapping into a working draft. Also, I think that a good approach is 
"bottom up", from the to-be-implemented mappings to answers about 
"low-level" and "high-level" semantics, rather than a top-down discussion 
of what is feasible, towards actual individual mappings.

> However, I find it quite difficult to start to implement before we have
> agreed a little more on some points. The mapping table captures an
> agreement on "high-level" semantics. We have started to discuss the
> syntax problems, especially in relation with req-r13, but have not
> reached a consensus yet. What I think is still very unclear are the
> "low-level" semantics features.
>   


Agreed. I would just prefer to discuss them while looking at - and 
working on - running code ...

> And by the way, "domain" and "range" are not specific to RDF! It is
> true, though, that they imply a formalization. 

Both true. What I should have said below is "please do not introduce 
terminology which requires a formalization", since I am not convinced we 
need that.

> However, I guess Tobias
> advocates the point that having a formal ontology would make it easier
> to implement, not more difficult. Of course, that would require from
> implementers that they understand the formal ontology, which could
> hinder acceptance. But on the other hand, I believe that it would reduce
> the risks of having heterogeneous (hence not interoperable) implementations.
>   

I believe it could reduce that risk for those implementors and users who 
are familiar with this kind of formalization. And I do not object to 
having that as one conformance slice, see req.
http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r11
as long as we have at the center a prose description which is clear 
enough for the non-formal people: remember the metadata working group 
deliverable ... but we discussed this at the f2f in Ghent, I believe - 
that is the reason why we created req. 11 ...

Felix

>   Pierre-Antoine
>
> Felix Sasaki wrote:
>   
>> Tobias Bürger wrote:
>>     
>>> Dear all,
>>>
>>> as promised in the telecon, here is my reply to Felix's mail. See my
>>> comments below.
>>>
>>> Felix Sasaki wrote:
>>>       
>>>> Pierre-Antoine Champin wrote:
>>>>         
>>>>> Hi,
>>>>>
>>>>> for action 78, I had to write a wiki page about some concerns I raised
>>>>> during the last telecon about interoperability between mapped
>>>>> properties. Since this is supposed to be matter for discussion rather
>>>>> than a formal document, I think it is best to send it as a mail.
>>>>>
>>>>>
>>>>> What triggered my concern was the mapping for Media RSS, between
>>>>> ''dc:creator'' and ''dcterms:creator''. Just as a reminder, the Dublin
>>>>> Core vocabulary has two versions: the legacy "elements" (usually
>>>>> prefixed with ''dc'') and the "terms" (usually prefixed with
>>>>> ''dcterms''). Each term is more specific than its corresponding element,
>>>>> as its values are more constrained. For example, ''dc:creator'' can have
>>>>> any type of value (including a plain string), while ''dcterms:creator''
>>>>> must have a URI, which must denote an instance of ''dcterms:Agent''.
>>>>> If we decide to specify the ontology only as prose
>>>>> Let us consider the example of ''dc:creator'' with a sample of
>>>>> mappings:
>>>>>
>>>>> * for XMP, its value is a sequence of strings, each string being the
>>>>> name of an author.
>>>>>
>>>>> * for Media RSS, its value is either
>>>>>   - a plain string,
>>>>>   - an instance of ''foaf:Agent'' with at least a ''foaf:name'', or
>>>>>   - an instance of ''vcard'' with at least a ''fn''.
>>>>> Since they are using ''dcterms'', it must also be inferred to be a
>>>>> ''dcterms:Agent'' (which contradicts the use of a plain string...). It
>>>>> may represent only one ("the primary") creator.
>>>>>
>>>>> * for ID3, the value of TOPE is a string, where names are separated
>>>>> by "/".
>>>>>
>>>>>
>>>>> My point here is that, beyond the "high level" semantic links identified
>>>>> by the mapping table, there are some "low level" discrepancies that are
>>>>> both semantic (e.g. representing one or several creators) and syntactic
>>>>> (slash-separated string or structured sequence).
>>>>>
>>>>> Leaving these issues to the implementation will inevitably lead to major
>>>>> differences and a lack of interoperability. We could specify down to the
>>>>> syntactical level the mapping for each property in each format, but what
>>>>> about other formats?
>>>>>
>>>>> I think a better way to limit the variability in implementations is to
>>>>> specify precisely, for each property of our ontology, the expected
>>>>> "low level" features of its value (and not only its "high level"
>>>>> meaning) so that implementors know what they can keep from the original
>>>>> metadata, and what they need to adapt (e.g. split ID3's TOPE field into
>>>>> multiple values).
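
The adaptation described above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not a mapping agreed by the group: it assumes the creator property is always exposed as a list of plain-string names, and the function names (split_id3_tope, creators_from_xmp, get_creators) and the sample values are hypothetical.

```python
from typing import List, Sequence


def split_id3_tope(value: str) -> List[str]:
    """Split a slash-separated ID3 TOPE string into individual names."""
    return [name.strip() for name in value.split("/") if name.strip()]


def creators_from_xmp(values: Sequence[str]) -> List[str]:
    """XMP already holds a sequence of names; only trim and drop empties."""
    return [name.strip() for name in values if name.strip()]


def get_creators(source_format: str, raw_value) -> List[str]:
    """Hypothetical accessor: always return a list of plain-string names."""
    if source_format == "id3":
        return split_id3_tope(raw_value)
    if source_format == "xmp":
        return creators_from_xmp(raw_value)
    # Formats without a sketched mapping are rejected rather than guessed at.
    raise ValueError(f"no creator mapping sketched for {source_format!r}")


print(get_creators("id3", "Alice Example/Bob Example"))
print(get_creators("xmp", ["Alice Example", "Bob Example"]))
```

Both calls yield the same two-element list, which is the kind of "low level" uniformity the paragraph asks for. A plain split on "/" would wrongly break up a name that itself contains a slash, which is exactly the sort of discrepancy that would need to be specified rather than left to each implementation.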
>>>>>
>>>>> This has to be done at least at the API level. But I guess this could
>>>>> also be done to some extent at the ontology level (I do believe that
>>>>> those "low level" features are *not only* syntactic), but that raises
>>>>> again the problem of formally specifying the ontology or not.
>>>>>
>>>>> But the less specific we are in describing the ontology, the more
>>>>> precise we will have to be in describing the API, in order to avoid
>>>>> "low level" semantic discrepancies.
>>>>>   
>>>>>           
>>>> I agree very much with your analysis, Pierre-Antoine. +1 to having a
>>>> very lightweight ontology and to being more precise in the API
>>>> description. Also I am hoping very much that people will volunteer to
>>>> actually test the mappings in toy implementations, no matter whether
>>>> they rely on a complex ontology or a detailed API. No matter which way
>>>> we go, let's test them now.
>>>>         
>>> I guess the intention of the toy implementations should be to get a
>>> deeper understanding of which types of mismatches between the
>>> properties defined in the different formats might occur. 
>>>       
>> No, not at all! The toy implementation or real implementation (whatever
>> you can create) is necessary for us to move forward in the W3C process.
>> My opinion is that we should work (toy or real) implementation driven:
>> use only the properties which are actually implemented in the API. The
>> others are dropped. E.g. everything is dropped from the mapping table
>> which is not implemented, *before* we go to last call.
>>
>>
>>     
>>> This might include data type mismatches, but also structural
>>> mismatches as outlined by Pierre-Antoine above. Whether you derive them
>>> by hard thinking or by prototypical implementations does not matter, as
>>> long as we are aware of them, because we finally have to implement
>>> them at some point in time.
>>>       
>> The crucial part is "some point in time". I have seen several working
>> groups which developed very elaborate specs - but when it came to
>> implementing them they ran into difficulties and had to revise the
>> specs. In terms of W3C that means: going back to a normal working draft
>> and probably losing half a year. That is what I want to avoid by asking
>> you to work on implementations now.
>>
>>     
>>> Regarding the mapping, and more specifically from where we should map:
>>> The mapping should be to our core ontology to whose semantics we
>>> committed ourselves or will commit. So we will define what we allow as
>>> the domain and range of a property.
>>>       
>> The mapping does not need to be defined in terms of range and
>> properties. Please don't use RDF-specific terminology - we have no
>> agreement to restrict ourselves to this.
>>
>>
>>     
>>> And I disagree with the last statement from Pierre-Antoine above: if we
>>> describe the ontology less specifically, then we also do not need to be
>>> more precise in the API. It has been my understanding that this group
>>> defines an ontology consisting of a set of core properties for the
>>> description of media objects on the Web, to which all the formats in
>>> our scope will be mapped. That said, if you describe the ontology
>>> in a more lightweight way, meaning perhaps with less detail or a lower
>>> level of specificity, then you also map to something not very specific.
>>>       
>> I think we could avoid these kinds of discussions by just starting
>> to implement the mappings.
>>
>>
>>     
>>> For me the API is a means to transparently access a description of a
>>> media object in a format that I do not want to care about when
>>> accessing the API. So should we define return types? Or should the
>>> burden of identifying the return type be shifted to the user? (I guess
>>> we had this discussion before but did not come to a conclusion....)
>>>       
>> We have a requirement for this:
>> http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r13
>>
>> I think the current problem of the group is that there is an imbalance
>> between the goal of defining a read-only API, and the participants who
>> are mostly interested in an ontology, and also mostly in an RDF-based
>> ontology. One solution to this imbalance is to get other people on board
>> who are more interested in the API. I hope that this will happen soon.
>>
>> Felix
>>     
>
>
Received on Wednesday, 28 January 2009 12:16:32 UTC