action 78 - Discussion about interoperability from Pierre-Antoine Champin on 2009-01-23 (public-media-annotation@w3.org from January 2009)

From: Pierre-Antoine Champin <pchampin@liris.cnrs.fr>
Date: Fri, 23 Jan 2009 17:19:28 +0000
To: public-media-annotation@w3.org
Message-ID: <4979FC20.40406@liris.cnrs.fr>

Hi,

for action 78, I had to write a wiki page about some concerns I raised
during the last telecon about interoperability between mapped
properties. Since this is supposed to be a matter for discussion rather
than a formal document, I think it is best to send it as a mail.


What triggered my concern was the mapping for Media RSS, between
''dc:creator'' and ''dcterms:creator''. Just as a reminder, the Dublin
Core vocabulary has two versions: the legacy "elements" (usually
prefixed with ''dc'') and the "terms" (usually prefixed with
''dcterms''). Each term is more specific than its corresponding element,
as its values are more constrained. For example, ''dc:creator'' can have
any type of value (including a plain string), while ''dcterms:creator''
must have a URI, which must denote an instance of ''dcterms:Agent''.
If we decide to specify the ontology only as prose
Let us consider the example of ''dc:creator'' with a sample of mappings:

* for XMP, its value is a sequence of strings, each string being the
name of an author.

* for Media RDF, its value is either
  - a plain string,
  - an instance of ''foaf:Agent'' with at least a ''foaf:name'', or
  - an instance of ''vcard'' with at least a ''fn''.
Since they are using ''dcterms'', the value must also be inferred to be a
''dcterms:Agent'' (which contradicts the use of a plain string...). The
property may represent only one ("the primary") creator.

* for ID3, the value of TOPE is a string, where names are separated by "/".


My point here is that, beyond the "high level" semantic links identified
by the mapping table, there are some "low level" discrepancies that are
both semantic (e.g. representing one or several creators) and syntactic
(slash-separated string or structured sequence).

Leaving these issues to the implementation will inevitably lead to major
differences and a lack of interoperability. We could specify down to the
syntactical level the mapping for each property in each format, but what
about other formats?

I think a better way to limit the variability in implementations is to
specify precisely, for each property of our ontology, the expected
"low level" features of its value (and not only its "high level"
meaning), so that implementors know what they can keep from the original
metadata and what they need to adapt (e.g. split ID3's TOPE field into
multiple values).
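
To illustrate the kind of adaptation meant here, below is a minimal
Python sketch. It assumes, purely for the sake of the example, that the
ontology expects the creator property to be a list of name strings; that
target shape and the function names are hypothetical, not something the
group has agreed on.

```python
from typing import List, Sequence


def creators_from_xmp(value: Sequence[str]) -> List[str]:
    """XMP dc:creator is already a sequence of names: keep it, drop blanks."""
    return [name.strip() for name in value if name.strip()]


def creators_from_id3_tope(value: str) -> List[str]:
    """ID3 TOPE is one string with names separated by "/": split it."""
    return [name.strip() for name in value.split("/") if name.strip()]


def creators_from_plain_string(value: str) -> List[str]:
    """A single plain-string creator becomes a one-element list."""
    name = value.strip()
    return [name] if name else []


if __name__ == "__main__":
    print(creators_from_xmp(["Ada Lovelace", "Charles Babbage"]))
    print(creators_from_id3_tope("John Lennon/Paul McCartney"))
    print(creators_from_plain_string("Pierre-Antoine Champin"))
```

Even this naive split shows why the expected value needs to be
specified: a name that itself contains "/" would be cut in two, and
each implementation would have to guess how to handle it.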

This has to be done at least at the API level. But I guess this could
also be done to some extent at the ontology level (I do believe that
those "low level" features are *not only* syntactic), but that raises
again the problem of formally specifying the ontology or not.

But the less specific we are in describing the ontology, the more
precise we will have to be in describing the API, in order to avoid "low
level" semantic discrepancies.

 regards

  Pierre-Antoine

Received on Friday, 23 January 2009 17:20:15 UTC