Re: New Draft comments: textual bodies from Bernhard Haslhofer on 2013-01-09 (public-openannotation@w3.org from January 2013)

From: Bernhard Haslhofer <bernhard.haslhofer@cornell.edu>
Date: Wed, 9 Jan 2013 10:21:18 -0500
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <D8F2CCC77D1244D2B5270C4CAFA54154@gmail.com>
Hi,  

I have not yet finished reviewing the entire draft, but I guess I can already comment on this. My short answer is: I fully agree with Antoine's perspective and prefer oa:hasBody to be typed as rdf:Property, which gives the flexibility to attach both literals and resources. I also raised this concern before as part of the lessons we learned in our Maphub experiment.

I think if a body looks like a string, can be encoded as a string and should be interpreted as a string, then it is probably a string and we should not force people to represent it differently. Also, BNodes and/or UUID nodes are non-trivial and to some extent controversial concepts, and we should not force developers who want to represent really simple annotations in OA to deal with them unless required.
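
To make the difference concrete, here is a minimal Python sketch of the two representations, with triples written as plain tuples and prefixed names standing in for full URIs. The identifiers ex:anno1, ex:page1 and _:body1 and both helper functions are hypothetical, and which triples on the body node are required versus recommended is left to the spec; the rdf:type triple is included only for illustration.

```python
# Triples as (subject, predicate, object) tuples; prefixed names stand in for URIs.
# ANNO and TARGET are hypothetical identifiers used only for this illustration.
ANNO = "ex:anno1"
TARGET = "ex:page1"


def literal_body(text):
    """Annotation whose body is the string itself (the form proposed in this mail)."""
    return [
        (ANNO, "rdf:type", "oa:Annotation"),
        (ANNO, "oa:hasTarget", TARGET),
        (ANNO, "oa:hasBody", text),
    ]


def embedded_body(text, node="_:body1"):
    """Annotation whose body is a separate node that carries the text."""
    return [
        (ANNO, "rdf:type", "oa:Annotation"),
        (ANNO, "oa:hasTarget", TARGET),
        (ANNO, "oa:hasBody", node),
        (node, "rdf:type", "cnt:ContentAsText"),  # illustrative typing of the body node
        (node, "cnt:chars", text),
    ]


simple = literal_body("interesting!")
embedded = embedded_body("interesting!")

# Number of extra triples the embedded form needs in this sketch.
extra = len(embedded) - len(simple)
```

In the literal form there is no intermediate node to name, type or keep track of; in the embedded form the developer has to introduce one (here a blank node) before the text can be attached.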

One argument that prevented us from allowing string bodies in the past was that most annotation bodies won't be strings. However, in many OA prototypes I have seen so far, bodies ARE simple strings; in our own cookbook, too, almost half of the examples have bodies that are simple strings. So I think we should take this into account and reconsider this design decision.

The other argument is OWL-DL compatibility; and yes, if we want to maintain this then we can either introduce an additional short-cut property, continue with the current solution, or take Jacco's suggestion. However, before we do this, I would ask how many use cases require OWL-DL compatibility. In Maphub we don't need it and, again, it seems that the majority of annotation use cases do not rely on it. I certainly agree that some use cases will need it, but I question whether this should be the driving design motivation. I think in the end it comes down to the question of whether OA should facilitate the construction of a formal annotation knowledge base or the sharing of annotations (data) and linkage of Web resources. My personal preference is clearly the latter because I think it is more down-to-earth.

Maybe it is also worth mentioning that many DBPedia properties are typed as rdf:Property. Here is an example: http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can be a resource (if more info is available) or simply a string (if only the name is known). I mention this because people are using DBPedia data and it seems that they are generally happy with this design decision.

If people think that OWL-DL compatibility is a fundamental design requirement for the OA model, then I like Jacco's suggestion. However, I also want to point out that people might run into practical problems when we allow bnodes. It is, for instance, hard to compute hashes over annotation representations containing nodes that don't have names. Existing RDF libraries often assign internal ids to bnodes, but these change when an annotation moves from system A to system B.
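
The hashing problem can be illustrated with a small Python sketch. It assumes a deliberately naive hash over sorted, N-Triples-like lines; the function names, the bnode labels (_:b0, _:genid42) and the example.org URIs are all hypothetical, and real libraries will label bnodes differently.

```python
import hashlib


def naive_hash(triples):
    """Hash the sorted, N-Triples-like lines of a graph, node labels included."""
    lines = sorted(" ".join(t) + " ." for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def annotation(body_node):
    """The same two-triple annotation, parameterised by the label of its body node."""
    return [
        ("<http://example.org/anno1>", "oa:hasBody", body_node),
        (body_node, "cnt:chars", '"interesting!"'),
    ]


# Same graph, but each system assigned its own internal label to the blank node.
system_a = annotation("_:b0")
system_b = annotation("_:genid42")

# With a named body node, both systems serialise identical lines.
named_a = annotation("<http://example.org/body1>")
named_b = annotation("<http://example.org/body1>")

bnode_hashes_match = naive_hash(system_a) == naive_hash(system_b)
named_hashes_match = naive_hash(named_a) == naive_hash(named_b)
```

Here the two bnode versions describe the same annotation but hash differently, while the named versions agree. Making the bnode case stable would require some graph canonicalization step before hashing, which this sketch deliberately leaves out; a literal body avoids the issue because there is no node to label.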

As concrete steps I propose:
- to allow simple literals for oa:hasBody,
- to change the type of oa:hasBody to rdf:Property and update the corresponding examples in the spec, and
- to provide guidelines in the appendix explaining how people who require OWL-DL compatibility could transform OA data into a logically consistent knowledge base.
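
As a rough idea of what such a transformation guideline could look like, here is a minimal Python sketch that rewrites literal bodies into body nodes. Everything in it is an assumption made for illustration: the Literal marker class, the resourcify_bodies function, the "#bodyN" naming scheme for minted nodes and the example.org URIs are hypothetical and are not part of the OA draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a triple object as a plain string rather than a resource identifier."""
    value: str


def resourcify_bodies(triples):
    """Replace every literal oa:hasBody object with a node that carries the text."""
    out = []
    minted = 0
    for s, p, o in triples:
        if p == "oa:hasBody" and isinstance(o, Literal):
            minted += 1
            node = f"{s}#body{minted}"  # hypothetical naming scheme for minted nodes
            out.append((s, p, node))
            out.append((node, "rdf:type", "cnt:ContentAsText"))
            out.append((node, "cnt:chars", o))
        else:
            # Resource bodies and all other triples pass through unchanged.
            out.append((s, p, o))
    return out


data = [
    ("http://example.org/anno1", "oa:hasTarget", "http://example.org/page1"),
    ("http://example.org/anno1", "oa:hasBody", Literal("interesting!")),
    ("http://example.org/anno2", "oa:hasBody", "http://example.org/review.html"),
]
converted = resourcify_bodies(data)

# After conversion, oa:hasBody only ever points at resources.
only_resources = all(
    not isinstance(o, Literal) for _, p, o in converted if p == "oa:hasBody"
)
```

Applications that only share simple annotations would never need to run this step; only consumers that require a consistent OWL-DL knowledge base would apply it on import.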

Best,
Bernhard

------
Bernhard Haslhofer
Lecturer, Postdoctoral Associate
Cornell University, Information Science
301 College Avenue

bernhard.haslhofer@cornell.edu
http://www.cs.cornell.edu/~bh392


On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:

> Dear all,
>  
> I've refrained from putting in my earlier comments some more discussion-level issues focusing on http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>  
> "This model was chosen over having a literal as the Body directly for the following reasons:"
> I'm sorry, but I still don't buy most of the reasons. And I believe I won't be the only one...
> Going through individual points:
>  
>  
> - "Literals have no identity, and thus cannot be referenced. If the Body was a literal, it would be impossible to refer to it directly and this is considered to be important in many use cases."
> To me it's a positive point *not* to give identifiers to simple string literals. What does it mean, when you give an identifier to a string like "interesting!" or "I should read this"? And if the same string is assigned different literals? To me when you have to refer to a string from different places (statements), it means that you have already more than a string - it becomes a kind of document.
>  
>  
> - "It would be inconsistent with the rest of the model which allows any resource as a Body or Target, and thus would be a special case just for text in the Body."
> This one is better. But it is mitigated by the fact that in RDF literals are in fact resources, too (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to reasoning or syntax) for which properties with literals as objects are distinguished from properties with "fully-fledged" resources as objects. But they do not apply to all RDF-based models.
>  
>  
> - "While literals can have their language and datatype associated with them in RDF, there are other aspects of text that are important for interpretation that cannot be associated with a literal. Examples include directionality of the text and encoding, plus of course metadata such as authorship, date of creation and so forth, which would not be possible."
>  
> This is very true - though it would help the reader if you gave more info on what "directionality" means here.
> But this argument is not against allowing literals as bodies. It just says that in some cases, the bodies are sophisticated, document-like resources. Fair enough. But I will argue (and many others will) that many scenarios don't need this. And that it's not reasonable to impose on these latter scenarios the representation details that the former cases need. Caricaturing a bit, it looks as if we prevented string value attributes in object-oriented programming, on the basis that some texts deserve to be treated as objects.
>  
> Note that we faced a similar situation in SKOS, for documentation properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And the decision we made then is that these properties can be used either with simple literals or more complex resources. See
>  
>  
> - "If a server wished to extract the text and make it a resource with an HTTP URI, it would not be possible to assert equivalence or provenance."
>  
> I think it is the utter prerogative of annotation-producing applications, to decide whether the bodies they produce are worthy of specific provenance data or not. Is there a point in keeping track of whether someone "created" a string like 'I should read this' in the first place?
> On equivalence the argument is also not convincing: in fact literals come with equivalence conditions that are easy to get and already implemented. Trying to come up with equivalence relationships between "resourcified literals" is much harder, both for spec designers and application builders (if we let them handle the issue). While working on the SKOS-XL extension we tried to open the can of worms of equivalence/identity conditions. We quite quickly postponed the issue (http://www.w3.org/TR/skos-reference/#L5739).
>  
>  
> - "The cost of using ContentAsText is minimal: just one additional required triple over the literal case."
>  
> I quite agree with the principle, though this one additional triple means millions of additional ones -- I expect that cases of simple text annotation will be very very common.
> But I don't buy it in a context in which it is recommended to type the resource with a dcmitypes class, to type it as cnt:ContentAsText and to give its MIME type using dc:format. That's 4 triples, not 1. And many of them can be seen as of dubious added value (see earlier comments).
>  
>  
> Note that the two SKOS patterns mentioned above (documentation properties and SKOS-XL) could be used in OA to have simple text bodies co-exist with more complex ones, either in relative isolation (SKOS documentation pattern) or with a tighter correspondence (the SKOS-XL pattern allows switching from one pattern to the other).
> And I believe that relative isolation is not as bad as it looks. Applications that produce simple bodies can only be bothered by the prospect of having more complex data on these bodies. And applications requiring more complex data (say, provenance) would probably need some more complex procedure to generate it from the data produced by the simpler applications.
>  
>  
> Best,
>  
> Antoine
Received on Wednesday, 9 January 2013 15:21:47 UTC