Re: action 78 - Discussion about interoperability from Pierre-Antoine Champin on 2009-02-03 (public-media-annotation@w3.org from February 2009)

From: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
Date: Tue, 03 Feb 2009 11:38:11 +0000
To: Felix Sasaki <fsasaki@w3.org>
CC: public-media-annotation@w3.org
Message-ID: <49882CA3.9040904@liris.cnrs.fr>

Felix,

I agree with you that low-level semantics and syntax could be subsumed
by data types, although I think the distinction may be useful.

Your example with dates is merely syntactic: indeed, we *must* decide
whether dates should be represented as "2002-01-01" or "Jan. 1st 2002",
*independently* of how they are internally represented in the underlying
API.
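
To make the date case concrete, here is a minimal sketch in Python of what
"cleaning up" the two spellings into one agreed form could look like. It
assumes the agreed form is ISO 8601 (YYYY-MM-DD), which has not been decided;
the function name normalize_create_date is hypothetical and not part of any
draft API. It only handles the two spellings discussed in this thread.

```python
import re
from datetime import date, datetime

# Matches English ordinal suffixes such as "1st", "2nd", "3rd", "4th".
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")


def normalize_create_date(raw: str) -> str:
    """Return `raw` as an ISO 8601 date string (assumed target form)."""
    text = raw.strip()
    try:
        # Already in the assumed target form, e.g. "2002-01-01".
        return date.fromisoformat(text).isoformat()
    except ValueError:
        pass
    # Turn "Jan. 1st 2002" into "Jan 1 2002" before parsing.
    cleaned = _ORDINAL.sub(r"\1", text).replace(".", "")
    # Note: %b depends on the current locale; English month names assumed.
    return datetime.strptime(cleaned, "%b %d %Y").date().isoformat()
```

Whatever form is chosen, the point is that two implementations fed the same
file should hand back the same string.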

I think the problem with other metadata (like creator or related
resource) is more complicated because there is no "standard/common" data
type for representing a person or a media object (leaving aside the
problems of multi- vs. mono-valued properties)... and existing
formats have quite different views on it (for creator: single name, name
list, URI...).
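
As a rough illustration of the creator case, the sketch below (Python) maps
three of the shapes mentioned in this thread onto one common return value, a
list of name strings. The choice of "list of names" as the common shape is an
assumption, not an agreed decision, and all function names are hypothetical.
The URI / agent case is deliberately left out, since it is exactly the part
for which there is no common data type.

```python
from typing import List, Sequence


def creators_from_id3_tope(value: str) -> List[str]:
    """ID3 TOPE: one string, names separated by "/"."""
    return [name.strip() for name in value.split("/") if name.strip()]


def creators_from_xmp(names: Sequence[str]) -> List[str]:
    """XMP dc:creator: already a sequence of name strings."""
    return [name.strip() for name in names if name.strip()]


def creators_from_plain_string(value: str) -> List[str]:
    """A single plain-string creator becomes a one-element list."""
    name = value.strip()
    return [name] if name else []
```

With this, creators_from_id3_tope("Alice Example/Bob Example") and
creators_from_xmp(["Alice Example", "Bob Example"]) return the same list,
which is the kind of agreement on "content" discussed below.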

So I am all for a bottom-up approach, but I agree with Joakim when
he writes:
>> If we at least agree on how to specify the "content" of the return
>> value, then perhaps we can start to work bottom-up wise!

  pa


Felix Sasaki wrote:
> 
> Joakim Söderberg wrote:
>> Hello,
>> From my horizon I can see that we agree on the following:
>>
>> 1) We have different types of mismatches; between the properties, data
>> types and structure that can be solved by different parts of our
>> standard.
>> We use the following terms to refer to them:
>>
>> * High-level semantics     = (ontology) Semantic links identified by
>> the mapping table.            
> 
> If we realize the requirement
> http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r05
> there would be no semantic links. The ontology would be just a list of
> terms, with nothing more than a prose description of mappings like in
> http://dev.w3.org/2008/video/mediaann/mediaont-api-1.0/mediaont-api-1.0.html#property-createDate
> 
> 
>>      
>> * Low-level semantics    = (content of the return types) Structure,
>> e.g. representing one or several creators.
>> * Syntax                 = (API) return types, e.g. slash-separated string
>> or structured sequence
>>   
> 
> For me, there is not necessarily a need to separate "low-level
> semantics" from "syntax". You can subsume both under data types, see e.g.
> http://www.w3.org/TR/REC-xml/#sec-attribute-types
> where you have e.g. IDREFS vs. IDREF to differentiate a single ID from
> several.
> Of course it is possible to differentiate between the two, but somebody
> who uses our API needs to know "what do I get back from a method" - and
> "what" here subsumes both.
>>
>> 2) The goal of this group is to define an ontology consisting of a set
>> of core properties for the description of media objects on the Web to
>> which all the formats in our scope will be mapped.
>>   
> 
> Sorry, again I disagree, at least slightly. My version would be:
> 
> 2) The goal of this group is to define an ontology consisting of a set
> of core properties for the description of media objects on the Web to
> which all the formats that are implemented in the API will be mapped.
> 
> 
> As I said before:
> "who is planning to implement a mapping for an existing format? If
> nobody volunteers for some properties, we drop that part of the format -
> or even the complete format."
> Or in other words: what becomes part of the ontology depends on
> volunteers for implementing the relevant parts of the API.
> 
>> The API is a means to transparently access a description of a media
>> object in a format that the user does not need to know about
>> when accessing the API.
>>
>> 3) Some proposals and important questions are:
>> - Should we define return types?   
> 
> I think this is mandatory. If our API returns, for things like
> getCreateDate, values like
> 2002-01-01
> Jan. 1st 2002
> we put a high burden on implementers to clean this up. IMO the cleaning
> up is our job.
> 
>> - Can we use "domain" and "range" to structure the mismatches?
>>   
> 
> IMO we can use whatever terminology we want in the ontology, as long as
> people see the API not just as a "spin off" of the more important
> ontology work. Please remember also that our test suite is likely
> to consist of media files, to be used as input for our read-only
> metadata access API. How do you want to test the interoperability
> between API implementations if they do not need to agree on 2002-01-01
> vs. "Jan. 1st 2002"?
> 
> 
>>
>> If we at least agree on how to specify the "content" of the return
>> value, then perhaps we can start to work bottom-up wise!
>>   
> 
> After all it depends on whether the Working Group wants to work
> bottom-up wise or not. After the f2f in December I had the impression
> that there is agreement on this, but reading "top-down" discussions like
> "do we expect people using our API to be able to deal with SKOS ?", I am
> afraid of strategic decisions like "we use SKOS for our ontology", which
> will lead to implementation problems in the API later and a long delay
> in the schedule of the Working Group. Eventually the working group needs
> to decide whether it mainly wants to compete with or contribute to
> approaches like
> http://www.w3.org/2008/10/24-mediaann-minutes.html#item01
> or not. Though my reading of the charter is that we have to compete with
> these, or otherwise we fail ...
> 
> Felix
> 
>>
>> /Joakim
>>
>>
>> -----Original Message-----
>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>> Sent: 2 February 2009 00:00
>> To: Joakim Söderberg
>> Cc: Pierre-Antoine Champin; public-media-annotation@w3.org
>> Subject: Re: action 78 - Discussion about interoperability
>>
>> Joakim Söderberg wrote:
>>  
>>> Felix,
>>> I also reacted to the statement that "domain" and "range" would imply
>>> RDF, which is not true.
>>
>> Fair enough.
>>
>>  
>>> Further I believe that people are keen on thinking about the problems
>>> in terms of tools they master.
>>>     
>> I think my main point is: currently nobody masters mapping of media
>> annotations in a working implementation. So we have no existing approaches
>> and tools, but two approaches to move forward:
>> 1) Think of a mapping architecture top-down
>> 2) Think of properties we want to map bottom-up, and see what
>> architecture they require, and how they can be implemented in the
>> tools we are used to
>> I am very much in favor of 2) and would propose to make the mapping
>> table much smaller by asking: who is planning to implement a mapping
>> for an existing format? If nobody volunteers for some properties, we
>> drop that part of the format - or even the complete format.
>> If we continue to discuss in the style of 1), I am very worried that
>> we end up with an elaborate, but very hard to implement architecture.
>>
>> This is my personal opinion, but I think the co-chairs need to get
>> consensus ASAP on the general approach, so that people go in the same
>> direction.
>>
>> Felix
>>
>>
>>  
>>> I myself have trouble seeing a solution without them. But maybe
>>> you can give an example?
>>>
>>> Best regards
>>> Joakim
>>>
>>> -----Original Message-----
>>> From: public-media-annotation-request@w3.org
>>> [mailto:public-media-annotation-request@w3.org] On Behalf Of
>>> Pierre-Antoine Champin
>>> Sent: 28 January 2009 12:51
>>> To: Felix Sasaki
>>> Cc: public-media-annotation@w3.org
>>> Subject: Re: action 78 - Discussion about interoperability
>>>
>>>
>>> Felix,
>>>
>>> I understand perfectly your concerns about the need to implement soon,
>>> in order to validate the spec, rather than implement too late and
>>> invalidate the parts of the spec that turn out to be too hard to
>>> implement.
>>>
>>> However, I find it quite difficult to start to implement before we have
>>> agreed a little more on some points. The mapping table captures an
>>> agreement on "high-level" semantics. We have started to discuss the
>>> syntax problems, especially in relation with req-r13, but have not
>>> reached a consensus yet. What I think is still very unclear are the
>>> "low-level" semantics features.
>>>
>>> And by the way, "domain" and "range" are not specific to RDF! It is
>>> true, though, that they imply a formalization. However, I guess Tobias
>>> advocates the point that having a formal ontology would make it easier
>>> to implement, not more difficult. Of course, that would require from
>>> implementers that they understand the formal ontology, which could
>>> hinder acceptance. But on the other hand, I believe that it would reduce
>>> the risks of having heterogeneous (hence not interoperable)
>>> implementations.
>>>
>>>   Pierre-Antoine
>>>
>>> Felix Sasaki wrote:
>>>      
>>>> Tobias Bürger wrote:
>>>>          
>>>>> Dear all,
>>>>>
>>>>> as promised in the telecon, here is my reply to Felix's mail. See
>>>>> comments below.
>>>>>
>>>>> Felix Sasaki wrote:
>>>>>              
>>>>>> Pierre-Antoine Champin wrote:
>>>>>>                  
>>>>>>> Hi,
>>>>>>>
>>>>>>> for action 78, I had to write a wiki page about some concerns I raised
>>>>>>> during the last telecon about interoperability between mapped
>>>>>>> properties. Since this is supposed to be a matter for discussion
>>>>>>> rather than a formal document, I think it is best to send it as a mail.
>>>>>>>
>>>>>>>
>>>>>>> What triggered my concern was the mapping for Media RSS, between
>>>>>>> ''dc:creator'' and ''dcterms:creator''. Just as a reminder, the Dublin
>>>>>>> Core vocabulary has two versions: the legacy "elements" (usually
>>>>>>> prefixed with ''dc'') and the "terms" (usually prefixed with
>>>>>>> ''dcterms''). Each term is more specific than its corresponding
>>>>>>> element, as its values are more constrained. For example,
>>>>>>> ''dc:creator'' can have any type of value (including a plain string),
>>>>>>> while ''dcterms:creator'' must have a URI, which must denote an
>>>>>>> instance of ''dcterms:Agent''.
>>>>>>> If we decide to specify the ontology only as prose
>>>>>>> Let us consider the example of ''dc:creator'' with a sample of
>>>>>>> mappings:
>>>>>>>
>>>>>>> * for XMP, its value is a sequence of strings, each string being the
>>>>>>> name of an author.
>>>>>>>
>>>>>>> * for Media RDF, its value is either
>>>>>>>   - a plain string,
>>>>>>>   - an instance of ''foaf:Agent'' with at least a ''foaf:name'', or
>>>>>>>   - an instance of ''vcard'' with at least a ''fn''.
>>>>>>> Since they are using ''dcterms'', it must also be inferred to be a
>>>>>>> ''dcterms:Agent'' (which contradicts the use of a plain string...).
>>>>>>> It may represent only one ("the primary") creator.
>>>>>>>
>>>>>>> * for ID3, the value of TOPE is a string, where names are separated
>>>>>>> by "/".
>>>>>>>
>>>>>>>
>>>>>>> My point here is that, beyond the "high level" semantic links
>>>>>>> identified by the mapping table, there are some "low level"
>>>>>>> discrepancies that are both semantic (e.g. representing one or
>>>>>>> several creators) and syntactic (slash-separated string or
>>>>>>> structured sequence).
>>>>>>>
>>>>>>> Leaving these issues to the implementation will inevitably lead to
>>>>>>> major differences and a lack of interoperability. We could specify
>>>>>>> the mapping for each property in each format down to the syntactic
>>>>>>> level, but what about other formats?
>>>>>>>
>>>>>>> I think a better way to limit the variability in implementations is
>>>>>>> to specify precisely, for each property of our ontology, the expected
>>>>>>> "low level" features of its value (and not only its "high level"
>>>>>>> meaning), so that implementors know what they can keep from the
>>>>>>> original metadata, and what they need to adapt (e.g. split ID3's TOPE
>>>>>>> field into multiple values).
>>>>>>>
>>>>>>> This has to be done at least at the API level. But I guess this could
>>>>>>> also be done to some extent at the ontology level (I do believe that
>>>>>>> those "low level" features are *not only* syntactic), but that raises
>>>>>>> again the problem of formally specifying the ontology or not.
>>>>>>>
>>>>>>> But the less specific we are in describing the ontology, the more
>>>>>>> precise we will have to be in describing the API, in order to avoid
>>>>>>> "low level" semantic discrepancies.
>>>>>>>                         
>>>>>> I agree very much with your analysis, Pierre-Antoine. +1 to have a
>>>>>> very lightweight ontology and to be more precise in the API
>>>>>> description. Also I am hoping very much that people will volunteer to
>>>>>> actually test the mappings in toy implementations, no matter if
>>>>>> relying on a complex ontology or a detailed API. No matter which way
>>>>>> we go, let's test them now.
>>>>>>                   
>>>>> I guess the intention of the toy implementations should be to get a
>>>>> deeper understanding of which types of mismatches between the
>>>>> properties defined in the different formats might occur.
>>>> No, not at all! The toy implementation or real implementation (whatever
>>>> you can create) is necessary for us to move forward in the W3C process.
>>>> My opinion is that we should work (toy or real) implementation driven:
>>>> use only the properties which are actually implemented in the API. The
>>>> others are dropped. E.g. everything is dropped from the mapping table
>>>> which is not implemented, *before* we go to last call.
>>>>
>>>>
>>>>          
>>>>> This might include data type mismatches, but also structural
>>>>> mismatches as outlined by Pierre-Antoine above. Whether you derive them
>>>>> by hard thinking or by prototype implementations does not matter, as
>>>>> long as we are aware of them, because we finally have to implement
>>>>> them at some point in time.
>>>>>               
>>>> The crucial part is "some point in time". I have seen several working
>>>> groups which developed very elaborate specs - but when it came to
>>>> implementing them they ran into difficulties and had to revise the
>>>> specs. In terms of W3C that means: going back to a normal working draft
>>>> and probably losing half a year. That is what I want to avoid by asking
>>>> you to work on implementations now.
>>>>
>>>>          
>>>>> Regarding the mapping, and more specifically from where we should map:
>>>>> The mapping should be to our core ontology to whose semantics we
>>>>> committed ourselves or will commit. So we will define what we allow as
>>>>> the domain and range of a property.
>>>>>               
>>>> The mapping does not need to be defined in terms of range and
>>>> properties. Please don't use RDF-specific terminology - we have no
>>>> agreement to restrict ourselves to this.
>>>>
>>>>
>>>>          
>>>>> And I disagree with the last statement from Pierre-Antoine above: if we
>>>>> describe the ontology less specifically, then we also do not need to be
>>>>> more precise in the API. It has been my understanding that this group
>>>>> defines an ontology consisting of a set of core properties for the
>>>>> description of media objects on the Web to which all the formats in
>>>>> our scope will be mapped. That said, if you describe the ontology
>>>>> in a more lightweight way, meaning perhaps with less detail or a lower
>>>>> level of specificity, then you also map to something not very specific.
>>>>>               
>>>> I think we could avoid these kinds of discussions by just starting
>>>> to implement the mappings.
>>>>
>>>>
>>>>          
>>>>> For me the API is a means to transparently access a description of a
>>>>> media object in a format which I do not want to care about when
>>>>> accessing the API. So should we define return types? Or should the
>>>>> burden of identifying the return type be shifted to the user? (I guess
>>>>> we had this discussion before but did not come to a conclusion....)
>>>>>               
>>>> We have a requirement for this:
>>>> http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r13
>>>>
>>>> I think the current problem of the group is that there is an imbalance
>>>> between the goal of defining a read-only API, and the participants who
>>>> are mostly interested in an ontology, and also mostly in an RDF-based
>>>> ontology. One solution to this imbalance is to get other people on
>>>> board who are more interested in the API. I hope that this will happen
>>>> soon.
>>>>
>>>> Felix
>>>>           
>>>       
>>
>>
>>   
> 
> 
>
Received on Tuesday, 3 February 2009 11:39:03 UTC