Re: New Draft comments: textual bodies from Stian Soiland-Reyes on 2013-01-16 (public-openannotation@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 16 Jan 2013 12:24:19 +0000
To: Bernhard Haslhofer <bernhard.haslhofer@cornell.edu>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CAPRnXtkEkhrUnXfMhaTFvjwU8cE0VTgiKM1P3Mu--uC7mZYqPg@mail.gmail.com>
On Thu, Jan 10, 2013 at 3:08 PM, Bernhard Haslhofer
<bernhard.haslhofer@cornell.edu> wrote:
>> I like this conclusion. I am very uneasy with a property being both an
>> Object Property and Data Property, that makes it very tricky in an OWL
>> context and how to parse in implementations.
> Just out of curiosity: could you elaborate a bit on this? Why is it tricky in an
> OWL context and why is it tricky to parse in implementations?

Well, as Rinke showed, it implies punning between an Object Property
and a Data Property (illegal, at least for now, but often supported).
The only way to do that now would be to use an Annotation Property
instead, which does not specify any range. That would be semantically
misleading in this case, as the body is quite an important part of an
oa:Annotation, almost required.


Tools that parse an ontology/schema and generate classes, such as
Sesame's AliBaba and Elmo, would I presume simply not work well, as
they represent classes of resources as annotated Java interfaces and
use String/int etc. for literals.

Even using low-level regular RDF tools like rdflib and Jena could get
tricky, as you would have to first try to get the body as a literal
and catch an exception, or dispatch on an isLiteral() equivalent, and
then try to get it as a resource if not. If there is a mix of
resources and literals, it would get tricky both to retrieve them
and to represent them internally. Basically, implementers
would in many cases have to create the equivalent of a ContentInRDF
node in memory and replace it at the last minute. In dynamic
languages like Python doing SPARQL, you could have retrieved the body
"resource" five steps ago, passed it along, tried to execute
methods on it, and only then realized you got a string instead. (This
would hit even harder those developers who get used to retrieving
strings initially and then all of a sudden get these "advanced" bodies
that have URIs.)
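
As an illustration of the dispatch problem, here is a minimal Python
sketch. The Literal and IRI classes and the describe_body helper are
hypothetical stand-ins invented for this example (they are not the
rdflib or Jena API); they only show the branching that every consumer
of a mixed-range oa:hasBody would need.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal (not a real library class)."""
    value: str


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for an RDF resource identifier."""
    uri: str


Body = Union[Literal, IRI]


def describe_body(body: Body) -> str:
    """Dispatch on the node kind, as each consumer would have to do."""
    if isinstance(body, Literal):
        return "inline text: " + body.value
    if isinstance(body, IRI):
        return "resource to dereference: " + body.uri
    # A bare Python string that slipped through lands here instead of
    # failing five steps later.
    raise TypeError(f"unexpected body type: {type(body).__name__}")


bodies = [Literal("What a terrible example"), IRI("http://www.example.com/body1")]
descriptions = [describe_body(b) for b in bodies]
```

Without the explicit type check, code written for one kind of body
would only fail when it first meets the other kind.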


If we use the ContentInRDF solution (or even just a bnode with rdf:value),
then there is no need to dispatch on anything unless you are
particularly concerned about retrieving the body content, in which
case you would still need to handle the exceptional case of a resource
identifier you can't resolve.

As someone from the provenance community I also prefer having a node I
can attach additional metadata to. I can't attach anything to a
literal. To a ContentInRDF node I can still attach timestamps,
authorship information, format information, checksums, mirrors, links
to other formats, etc.
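
A rough Python sketch of that uniform access pattern follows. It
models triples as plain tuples of strings, which is an assumption made
only for illustration; the objects and body_texts helpers, the :anno2
annotation and the dc:format statement are likewise made up for this
example and come from no particular library or dataset.

```python
# Triples as (subject, predicate, object) tuples. Prefixed names and
# literal values are both plain strings here, purely for illustration.
triples = {
    (":tweet1", "rdf:type", "oa:Annotation"),
    (":tweet1", "oa:hasBody", "_:body1"),
    ("_:body1", "cnt:chars", "What a terrible example #oac"),
    ("_:body1", "dc:format", "text/plain"),  # assumed extra metadata on the node
    (":anno2", "rdf:type", "oa:Annotation"),
    (":anno2", "oa:hasBody", "<http://www.example.com/body.html>"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def body_texts(annotation):
    """Collect inline text from every body node that carries cnt:chars."""
    texts = []
    for body in objects(annotation, "oa:hasBody"):
        # The body is always a node, so no literal/resource dispatch is needed.
        texts.extend(objects(body, "cnt:chars"))
    return texts


tweet_texts = body_texts(":tweet1")
external_texts = body_texts(":anno2")  # body without inline content
```

The same lookup works whether the body is an inline-content node or an
external resource; the latter simply yields no cnt:chars value.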


> As far as I know, dcterms:creator is of type rdf:Property but the guidelines
> suggest to use it only with resources.

The range is dcterms:Agent - it would be tricky to make a literal be a
dcterms:Agent although I guess some still try..


> Maybe…but sometimes it is worth re-considering model design decisions after
> observing some real-world patterns, like the use of plain strings.

That I agree with. I don't want to overcomplicate things. That said, if
we only promote one particular use of ContentInRDF - or even just a
custom oa:Content with an rdf:value - then I don't think it should be
particularly complicated.


:tweet1 a oa:Annotation ;
  oa:hasTarget <http://www.example.com/> ;
  oa:hasBody [
    cnt:chars "@soilandreyes http://www.example.com/ What a terrible example #oac" ;
  ] ;
  pav:importedFrom <https://twitter.com/soilandreyes/> .


And thus I could instead be a bit more verbose about how and when I
got hold of that body:

  oa:hasBody [
    cnt:chars "@soilandreyes http://www.example.com/ What a terrible example #oac" ;
    pav:importedFrom <https://twitter.com/soilandreyes/status/291516466212241408> ;
    pav:importedOn "2012-01-16T12:08:13Z" ;
  ] ;

(Here I really wanted to use pav:retrievedFrom, but sadly Twitter has
no easy API to directly get only the tweet text ;) )


.. and if I used one of those URNs instead of a bnode, even other
people could do the above and more. It is a gradual road to open and
linked data.




> A major practical problem I experienced in the past was that it is really hard to compute hashes
> over RDF serializations containing bnodes. I mean, you can compute them, but they are pretty much
> useless because they are not comparable.

Yes, especially once you have more than one bnode referring to each other.
Even two bnodes stating exactly the same things could be different
resources.
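
A small Python sketch of why such hashes are not comparable: the two
N-Triples documents below describe the same one-triple graph and
differ only in the arbitrary bnode label a serializer happened to
pick. The documents and the digest helper are made up for this
illustration.

```python
import hashlib

# Same graph, serialized twice with different (arbitrary) bnode labels.
doc_a = '_:b0 <http://www.w3.org/2011/content#chars> "hello" .\n'
doc_b = '_:x1 <http://www.w3.org/2011/content#chars> "hello" .\n'


def digest(text: str) -> str:
    """SHA-256 over the raw serialization bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The graphs are equivalent, but the byte-level hashes are not equal.
hashes_differ = digest(doc_a) != digest(doc_b)
```

With a single bnode one could imagine relabelling before hashing, but
that gets much harder once several bnodes refer to each other.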



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Wednesday, 16 January 2013 12:25:09 UTC