New Draft comments: textual bodies from Antoine Isaac on 2013-01-06 (public-openannotation@w3.org from January 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sun, 6 Jan 2013 16:51:16 +0100
To: public-openannotation <public-openannotation@w3.org>
Message-ID: <50E99D74.3010409@few.vu.nl>

Dear all,

I've refrained from putting in my earlier comments some more discussion-level issues focusing on http://www.openannotation.org/spec/future/level1.html#BodyEmbed

"This model was chosen over having a literal as the Body directly for the following reasons:"
I'm sorry, but I still don't buy most of the reasons. And I believe I won't be the only one...
Going through individual points:

- "Literals have no identity, and thus cannot be referenced. If the Body was a literal, it would be impossible to refer to it directly and this is considered to be important in many use cases."
To me it's a positive point *no* to give identifiers to simple sting literals. What does it mean, when you give an identifier to a string like "interesting!" or "I should read this"? And if the same string is assigned different literals? To me when you have to refer to a string from different places (statements), it means that you have already more than a string - it becomes a kind of document.

- "It would be inconsistent with the rest of the model which allows any resource as a Body or Target, and thus would be a special case just for text in the Body."
This one is better. But it is mitigated by the fact that in RDF literals are in fact resources, too (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to reasoning or syntax for which properties with literals as objects are distinguished from properties with "fully-fledged" resources as objects. But they do not apply to all RDF-based models.

- "While literals can have their language and datatype associated with them in RDF, there are other aspects of text that are important for interpretation that cannot be associated with a literal. Examples include directionality of the text and encoding, plus of course metadata such as authorship, date of creation and so forth, which would not be possible."

This is very true - though it would help reader if you gave more info on what "directionality" means here.
But this argument is not against allowing literals as bodies. It just says that in some case, the bodies are sophisticated, document-like resources. Fair enough. But I will argue (and many others will) that many scenarios don't need this. And that it's not reasonable to impose on these latter scenarios the representation details that the former cases need. Caricaturing a bit, it looks as if we prevented string value attributes in object-oriented programming, on the basis that some texts deserve to be treated as objects.

Note that we faced a similar situation in SKOS, for documentation properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And the decision we made then is that these properties can be used either with simple literals or more complex resources. See

- "If a server wished to extract the text and make it a resource with an HTTP URI, it would not be possible to assert equivalence or provenance."

I think it is the utter prerogative of annotation-producing applications, to decide whether the bodies they produce are worthy of specific provenance data or not. Is there a point in keeping track of whether someone "created" a string like 'I should read this' in the first place?
On the equivalence the argument is also not convincing: in fact literals come with equivalence conditions that are easy to get and already implemented. Trying to come with equivalence relationships between "resourcified literals" is much harder, both for spec designers or application builders (if we let them handle the issue). While working with the SKOS-XL extension have tried to open the can of worms of equivalence/identity conditions. We quite quickly postponed the issue (http://www.w3.org/TR/skos-reference/#L5739).

- "The cost of using ContentAsText is minimal: just one additional required triple over the literal case."

I quite agree with the principle, though this one additional triples means millions of additional ones -- I expect that cases of simple text annotation will be very very common.
But I don't buy it in a context in which it is recommended to type the resource with a dcmitypes class, to type it as cnt:ContentAsClass and to give its MIME type using dc:format. That's 4 triples, not 1. And many of them can be seen as of dubious added value (see earlier comments)

Note that the two SKOS patterns mentioned above (documentation properties and SKOS-XL) could be used in OA to have simple text bodies co-exist with more complex ones, either in relative isolation (SKOS documentation pattern) and with a tighter correspondence (SKOS-XL pattern allows to switch from one pattern to the other).
And I believe that relative isolation is not as bad as it looks. Applications who produce simple bodies can only be bothered by the perspective of having more complex data on these bodies. And applications requiring more complex data (say, provenance) would probably need some more complex procedure to generate it from the data produced by the simpler applications.

Best,

Antoine

Received on Sunday, 6 January 2013 15:51:45 UTC