Re: action 78 - Discussion about interoperability from Felix Sasaki on 2009-02-02 (public-media-annotation@w3.org from February 2009)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 02 Feb 2009 21:11:54 +0900
To: Joakim Söderberg <joakim.soderberg@ericsson.com>
CC: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>, public-media-annotation@w3.org
Message-ID: <4986E30A.5080608@w3.org>
Joakim Söderberg wrote:
> Hello,
> From my horizon I can see that we agree on the following:
>
> 1) We have different types of mismatches; between the properties, data types and structure that can be solved by different parts of our standard.
> We use the following terms to refer to them:
>
> * High-level semantics = (ontology) Semantic links identified by the mapping table.

If we implement the requirement
http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r05
there would be no semantic links. The ontology would be just a list of
terms, with nothing more than a prose description of mappings, as in
http://dev.w3.org/2008/video/mediaann/mediaont-api-1.0/mediaont-api-1.0.html#property-createDate

>    
>
> * Low-level semantics = (content of the return types) Structure, e.g. representing one or several creators. 
>
> * Syntax = (API) return types, e.g. slash-separated string or structured sequence
>   

For me, there is not necessarily a need to separate "low-level
semantics" from "syntax". You can subsume both under data types, see e.g.
http://www.w3.org/TR/REC-xml/#sec-attribute-types
where you have e.g. IDREFS vs. IDREF to differentiate one ID from
several.
Of course it is possible to differentiate between the two, but somebody
who uses our API needs to know "what do I get back from a method" - and
"what" here subsumes both.
>
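
A minimal sketch of how one return type could subsume both structure and
syntax for creators. The function names and the choice of a list of
strings as the return type are assumptions made for illustration only;
nothing here has been agreed by the group. It relies on the formats as
described further down in this thread: ID3's TOPE separates names with
"/", and XMP already holds a sequence of strings.

```python
from typing import List, Sequence


def creators_from_id3_tope(tope: str) -> List[str]:
    """Split an ID3 TOPE value, where names are separated by "/".

    Hypothetical helper: a name that itself contains "/" would be
    split incorrectly, so a real mapping would need a rule for that.
    """
    return [name.strip() for name in tope.split("/") if name.strip()]


def creators_from_xmp(dc_creator: Sequence[str]) -> List[str]:
    """XMP dc:creator is already a sequence of strings; return it as a list."""
    return [name.strip() for name in dc_creator if name.strip()]


# Both calls give the caller the same kind of value back.
print(creators_from_id3_tope("John Doe/Jane Roe"))
print(creators_from_xmp(["John Doe", "Jane Roe"]))
```

Whatever the source format, the caller gets one answer to "what do I
get back from a method": a list with one entry per creator.
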
> 2) The goal of this group is to define an ontology consisting of a set of core properties for the description of media objects on the Web to which all the formats in our scope will be mapped.
>   

Sorry, again I disagree, at least slightly. My version would be:

2) The goal of this group is to define an ontology consisting of a set of core properties for the description of media objects on the Web to which all the formats that are implemented in the API will be mapped.


As I said before:
"who is planning to implement a mapping for an existing format? If 
nobody volunteers for some properties, we drop that part of the format - 
or even the complete format."
Or in other words: what becomes part of the ontology depends on 
volunteers for implementing the relevant parts of the API.

> The API is a means to transparently access a description of a media object in a format which the user does not need to know about when accessing the API.
>
>
> 3) Some proposals and important questions are:
> - Should we define return types? 
>   

I think this is mandatory. If our API returns, for things like
getCreateDate, values like
2002-01-01
Jan. 1st 2002
we put a high burden on implementers to clean this up. IMO the cleaning
up is our job.
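
A minimal sketch of what such cleaning up could look like, assuming the
API settles on an ISO 8601 date string as the return type. The function
name normalize_create_date and the list of accepted input formats are
hypothetical; they cover only the two example values above.

```python
import re
from datetime import datetime

# Input formats this sketch knows about (an assumption, not a spec).
# Note: %b matches month abbreviations of the current locale; "Jan"
# works in the default C/English locale.
_KNOWN_FORMATS = ("%Y-%m-%d", "%b. %d %Y")

# Matches ordinal suffixes directly after a digit, e.g. the "st" in "1st".
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def normalize_create_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if unrecognized."""
    cleaned = _ORDINAL.sub("", raw.strip())
    for fmt in _KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date value: {raw!r}")


print(normalize_create_date("2002-01-01"))     # 2002-01-01
print(normalize_create_date("Jan. 1st 2002"))  # 2002-01-01
```

With a defined return type, two implementations reading the same file
can be compared by simple string equality on the result.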

> - Can we use "domain" and "range" to structure the mismatches?
>   

IMO we can use whatever terminology we want in the ontology, as long as 
people see the API not just as a "spin off" of the more important 
ontology work. Please remember also that our test suite is likely
to consist of media files, to be used as input for our read-only
metadata access API. How do you want to test the interoperability
between API implementations if they do not need to agree on 2002-01-01
vs. "Jan. 1st 2002"?


>
> If we at least agree on how to specify the "content" of the return value, then perhaps we can start to work bottom-up wise!
>   

After all it depends on whether the Working Group wants to work 
bottom-up wise or not. After the f2f in December I had the impression 
that there is agreement on this, but reading "top-down" discussions like 
"do we expect people using our API to be able to deal with SKOS ?", I am 
afraid of strategic decisions like "we use SKOS for our ontology", which 
will lead to implementation problems in the API later and a long delay 
in the schedule of the Working Group. Eventually the Working Group needs
to decide whether it mainly wants to compete with or contribute to
approaches like
http://www.w3.org/2008/10/24-mediaann-minutes.html#item01
or not. Though my reading of the charter is that we have to compete with 
these, or otherwise we fail ...

Felix

>
> /Joakim
>
>
> -----Original Message-----
> From: Felix Sasaki [mailto:fsasaki@w3.org] 
> Sent: den 2 februari 2009 00:00
> To: Joakim Söderberg
> Cc: Pierre-Antoine Champin; public-media-annotation@w3.org
> Subject: Re: action 78 - Discussion about interoperability
>
> Joakim Söderberg wrote:
>   
>> Felix,
>> I also reacted to the statement that "domain" and "range" would imply RDF, which is not true.
>>   
>>     
>
> Fair enough.
>
>   
>> Further I believe that people are keen on thinking about the problems in terms of tools they master.
>>     
> I think my main point is: currently nobody masters mapping of media
> annotations in a working implementation. So we have no existing approaches
> and tools, but two approaches to move forward:
> 1) Think of a mapping architecture top-down
> 2) Think of properties we want to map bottom-up, and see what 
> architecture they require, and how they can be implemented in the tools 
> we are used to
> I am very much in favor of 2) and would propose to make the mapping 
> table much smaller by asking: who is planning to implement a mapping for 
> an existing format? If nobody volunteers for some properties, we drop 
> that part of the format - or even the complete format.
> If we continue to discuss in the style of 1), I am very worried that we 
> end up with an elaborate, but very hard to implement architecture.
>
> This is my personal opinion, but I think the co-chairs need to get
> consensus on the general approach ASAP, so that people go in the same direction.
>
> Felix
>
>
>   
>>  I myself have trouble seeing a solution without them. But maybe you can give an example?
>>
>> Best regards
>> Joakim
>>
>> -----Original Message-----
>> From: public-media-annotation-request@w3.org [mailto:public-media-annotation-request@w3.org] On Behalf Of Pierre-Antoine Champin
>> Sent: den 28 januari 2009 12:51
>> To: Felix Sasaki
>> Cc: public-media-annotation@w3.org
>> Subject: Re: action 78 - Discussion about interoperability
>>
>>
>> Felix,
>>
>> I understand perfectly your concerns about the need to implement soon,
>> in order to validate the spec, rather than implement too late and
>> invalidate the parts of the spec that turn out to be too hard to implement.
>>
>> However, I find it quite difficult to start to implement before we have
>> agreed a little more on some points. The mapping table captures an
>> agreement on "high-level" semantics. We have started to discuss the
>> syntax problems, especially in relation with req-r13, but have not
>> reached a consensus yet. What I think is still very unclear are the
>> "low-level" semantics features.
>>
>> And by the way, "domain" and "range" are not specific to RDF! It is
>> true, though, that they imply a formalization. However, I guess Tobias
>> advocates the point that having a formal ontology would make it easier
>> to implement, not more difficult. Of course, that would require from
>> implementers that they understand the formal ontology, which could
>> hinder acceptance. But on the other hand, I believe that it would reduce
>> the risks of having heterogeneous (hence not interoperable) implementations.
>>
>>   Pierre-Antoine
>>
>> Felix Sasaki wrote:
>>   
>>     
>>> Tobias Bürger wrote:
>>>     
>>>       
>>>> Dear all,
>>>>
>>>> as promised in the telecon, here my reply to Felix' mail. See comment
>>>> below
>>>>
>>>> Felix Sasaki wrote:
>>>>       
>>>>         
>>>>> Pierre-Antoine Champin wrote:
>>>>>         
>>>>>           
>>>>>> Hi,
>>>>>>
>>>>>> for action 78, I had to write a wiki page about some concerns I raised
>>>>>> during the last telecon about interoperability between mapped
>>>>>> properties. Since this is supposed to be matter for discussion rather
>>>>>> than a formal document, I think it is best to send it as a mail.
>>>>>>
>>>>>>
>>>>>> What triggered my concern was the mapping for Media RSS, between
>>>>>> ''dc:creator'' and ''dcterms:creator''. Just as a reminder, the Dublin
>>>>>> Core vocabulary has two versions: the legacy "elements" (usually
>>>>>> prefixed with ''dc'') and the "terms" (usually prefixed with
>>>>>> ''dcterms''). Each term is more specific than its corresponding
>>>>>> element,
>>>>>> as its values are more constrained. For example, ''dc:creator'' can
>>>>>> have
>>>>>> any type of value (including a plain string), while ''dcterms:creator''
>>>>>> must have a URI, which must denote an instance of ''dcterms:Agent''.
>>>>>> If we decide to specify the ontology only as prose
>>>>>> Let us consider the example of ''dc:creator'' with a sample of
>>>>>> mappings:
>>>>>>
>>>>>> * for XMP, its value is a sequence of strings, each string being the
>>>>>> name of an author.
>>>>>>
>>>>>> * for Media RSS, its value is either
>>>>>>   - a plain string,
>>>>>>   - an instance of ''foaf:Agent'' with at least a ''foaf:name'', or
>>>>>>   - an instance of ''vcard'' with at least a ''fn''.
>>>>>> Since they are using ''dcterms'', it must also be inferred to be a
>>>>>> ''dcterms:Agent'' (which contradicts the use of a plain string...). It
>>>>>> may represent only one ("the primary") creator.
>>>>>>
>>>>>> * for ID3, the value of TOPE is a string, where names are separated
>>>>>> by "/".
>>>>>>
>>>>>>
>>>>>> My point here is that, beyond the "high level" semantic links
>>>>>> identified
>>>>>> by the mapping table, there are some "low level" discrepancies that are
>>>>>> both semantic (e.g. representing one or several creators) and syntactic
>>>>>> (slash-separated string or structured sequence).
>>>>>>
>>>>>> Leaving these issues to the implementation will inevitably lead to
>>>>>> major
>>>>>> differences and a lack of interoperability. We could specify down to
>>>>>> the
>>>>>> syntactical level the mapping for each property in each format, but
>>>>>> what
>>>>>> about other formats ?
>>>>>>
>>>>>> I think a better way to limit the variability in implementations is to
>>>>>> specify precisely, for each property of our ontology, the expected
>>>>>> "low level" features of its value (and not only its "high level"
>>>>>> meaning) so that implementors know what they can keep from the original
>>>>>> metadata, and what they need to adapt (e.g. split ID3's TOPE field into
>>>>>> multiple values).
>>>>>>
>>>>>> This has to be done at least at the API level. But I guess this could
>>>>>> also be done to some extent at the ontology level (I do believe that
>>>>>> those "low level" features are *not only* syntactic), but that raises
>>>>>> again the problem of formally specifying the ontology or not.
>>>>>>
>>>>>> But the less specific we are in describing the ontology, the more
>>>>>> precise we will have to be in describing the API, in order to avoid
>>>>>> "low
>>>>>> level" semantic discrepancies.
>>>>>>   
>>>>>>           
>>>>>>             
>>>>> I agree very much with your analysis, Pierre-Antoine. +1 to have a
>>>>> very lightweight ontology and to be more precise in the API
>>>>> description. Also I am hoping very much that people will volunteer to
>>>>> actually test the mappings in toy implementations, no matter if
>>>>> relying on a complex ontology or a detailed API. No matter which way
>>>>> we go, let's test them now.
>>>>>         
>>>>>           
>>>> I guess the intention of the toy implementations should be to get a
>>>> deeper understanding of which types of mismatches between the
>>>> properties defined in the different formats might occur.
>>>>       
>>>>         
>>> No, not at all! The toy implementation or real implementation (whatever
>>> you can create) is necessary for us to move forward in the W3C process.
>>> My opinion is that we should work (toy or real) implementation driven:
>>> use only the properties which are actually implemented in the API. The
>>> others are dropped. E.g. everything is dropped from the mapping table
>>> which is not implemented, *before* we go to last call.
>>>
>>>
>>>     
>>>       
>>>> This might include data type mismatches, but also structural
>>>> mismatches as outlined by Pierre-Antoine above. Whether you derive them by
>>>> hard thinking or by prototypical implementations does not matter, as
>>>> long as we are aware of them, because we finally have to implement
>>>> them at some point in time.
>>>>       
>>>>         
>>> The crucial part is "some point in time". I have seen several working
>>> groups which developed very elaborate specs - but when it came to
>>> implementing them they ran into difficulties and had to revise the
>>> specs. In terms of W3C that means: going back to a normal working draft
>>> and probably losing half a year. That is what I want to avoid by asking you
>>> to work on implementations now.
>>>
>>>     
>>>       
>>>> Regarding the mapping, and more specifically from where we should map:
>>>> The mapping should be to our core ontology to whose semantics we
>>>> committed ourselves or will commit. So we will define what we allow as
>>>> the domain and range of a property.
>>>>       
>>>>         
>>> The mapping does not need to be defined in terms of range and
>>> properties. Please don't use RDF specific terminology - we have no
>>> agreement to restrict ourselves to this.
>>>
>>>
>>>     
>>>       
>>>> And I disagree with the last statement from Pierre-Antoine above: if we
>>>> describe the ontology less specifically, then we also do not need to be
>>>> more precise in the API. It has been my understanding that this group
>>>> defines an ontology consisting of a set of core properties for the
>>>> description of media objects on the Web to which all the formats in
>>>> our scope will be mapped. That said, if you describe the ontology
>>>> in a more lightweight way, meaning perhaps with less detail or a lower
>>>> level of specificity, then you also map to something not very specific.
>>>>       
>>>>         
>>> I think we could avoid these kinds of discussions by just starting
>>> implementing the mappings.
>>>
>>>
>>>     
>>>       
>>>> For me the API is a means to transparently access a description of a
>>>> media object in a format which I do not want to care about when
>>>> accessing the API. So should we define return types? Or should the
>>>> burden of identifying the return type be shifted to the user? (I guess
>>>> we had this discussion before but did not come to a conclusion....)
>>>>       
>>>>         
>>> We have a requirement for this:
>>> http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r13
>>>
>>> I think the current problem of the group is that there is an imbalance
>>> between the goal of defining a read-only API, and the participants who
>>> are mostly interested in an ontology, and also mostly in an RDF-based
>>> ontology. One solution to this imbalance is to get other people on board
>>> who are more interested in the API. I hope that this will happen soon.
>>>
>>> Felix
>>>     
>>>       
>>   
>>     
>
>
>
Received on Monday, 2 February 2009 12:12:34 UTC