Re: New Draft comments: textual bodies from Antoine Isaac on 2013-01-10 (public-openannotation@w3.org from January 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 10 Jan 2013 23:01:23 +0100
To: <public-openannotation@w3.org>
Message-ID: <50EF3A33.1040205@few.vu.nl>
Hi,

I'm sorry to troll at such a level...

What Rob suggests goes clearly in the right direction.
Yet again I have problems with some of the argumentation. I can understand Rob and Bob's "the cost of a single model of a blank node with cnt:chars is considered less for developers than having to check if the object of oa:hasBody is a literal".
But "The typing and assignment of a MIME type to the text string (especially text/plain vs text/html) is very important for clients" does not convince me. One could easily default to processing plain literals as text/plain. And quite sneakily this raises the question of typed literals. Has nobody ever asked about using XML datatypes directly instead of resources that may be only awkwardly mapped to these datatypes?
(I ask the question out of a wish to make the spec future-proof: personally I would be quite relieved to hear this had been considered earlier, but was discarded for some valid reason...)
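
To make the "default to text/plain" idea concrete, here is a minimal sketch of what a consuming client might do if both forms were allowed. It is only an illustration under assumptions: the names EmbeddedBody, DEFAULT_MIME and body_text_and_mime are hypothetical, a plain Python string stands in for an RDF plain literal, and the dataclass stands in for a blank node carrying cnt:chars and an optional dc:format. Nothing here comes from the OA draft or from any RDF library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Assumed default applied to plain literal bodies (not mandated by any spec).
DEFAULT_MIME = "text/plain"


@dataclass(frozen=True)
class EmbeddedBody:
    """Hypothetical stand-in for a blank node with cnt:chars and dc:format."""
    chars: str
    mime_type: Optional[str] = None


# A body is either a plain literal (str) or an embedded resource.
Body = Union[str, EmbeddedBody]


def body_text_and_mime(body: Body) -> Tuple[str, str]:
    """Return (text, MIME type) for either kind of body."""
    if isinstance(body, str):
        # Plain literal: no explicit typing available, so fall back to the default.
        return body, DEFAULT_MIME
    # Embedded resource: use its declared format, or the default if none is given.
    return body.chars, body.mime_type or DEFAULT_MIME
```

The single isinstance check is the extra cost Rob and Bob refer to; whether it outweighs the additional triples of the blank node pattern is exactly the judgment call under discussion.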

Antoine


>
>
> To draw this to a close:
>
> * We will adopt the blank node with a single cnt:chars property method for when identity is considered unnecessary.
> * The rationale given in the document will be updated to include typing (as below), OWL-DL and integration, and to downplay the bullets that Antoine mentioned as less convincing in his original email.
> * The dctypes:Text additional class will be a MAY rather than a SHOULD
> * The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST, as this is easily able to be inferred from the presence of cnt:chars. Note that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
>
> This seems to solve the number of triples and explicit identity costs, while still maintaining OWL-DL compatibility and consistency in the model. The typing and assignment of a MIME type to the text string (especially text/plain vs text/html) is very important for clients to understand how to process the text content. Overall, the cost of a single model of a blank node with cnt:chars is considered less for developers than having to check if the object of oa:hasBody is a literal or a resource, and decide which of the two options to use.
>
> Rob
>
> On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com <mailto:morris.bob@gmail.com>> wrote:
>
>     A few reasons we prefer OA artifacts, including bodies, as objects:
>
>     1. In our FilteredPush design we require semantic notification of
>     events of publication to the annotation store. It is probably
>     difficult to do this if we are not in control of the tractability of
>     the SPARQL 1.1 SELECT queries that define the subscription
>     constraints. That is, we probably need to ensure that tractability
>     issues arise from the domain vocabulary, not from the vocabularies we
>     don't control.
>
>     2. More generally, if bodies can be strings, there still remains a
>     need to type bodies, often with more than one type, in order that
>     consuming application code can switch on the type. In the community
>     for which we build systems, it is very hard to get an agreement on use
>     of small controlled vocabularies at all, and it is anyway a
>     red herring for the type to be based on the particular wording of a
>     textual body. Thus, while body typing is not a deep concern of an
>     annotation producing human, it is a deep concern of consuming, and
>     sometimes even producing, software.
>
>     3. Humans, other than developers, have little or no need to code the
>     structure of annotations by hand, so the added complexity of a
>     URI-based body should be of little concern. In human-facing
>     annotation producers or consumers, the details of the injection of
>     the text strings for the body content will rarely impact human authors
>     of annotation content. It only impacts the code developers, and things
>     like CNT are not very difficult to program against in any platform
>     where RDF can be handled in the first place. And as to developers,
>     well hey, Emacs makes a pretty good RDF/N3 editor. :-)
>
>     All that said, it certainly would serve to type oa:hasBody simply as
>     rdf:Property and somehow introduce an OA profile that requires
>     oa:hasBody be an object property. But it would be a shame if
>     consuming software abounds that cannot even find and render the
>     strings containing the textual body content just because they are one
>     layer further down in something as simple as CNT or some similar
>     structure.
>
>     --Bob Morris
>
>     Robert A. Morris
>
>     Emeritus Professor of Computer Science
>     UMASS-Boston
>     100 Morrissey Blvd
>     Boston, MA 02125-3390
>
>     IT Staff
>     Filtered Push Project
>     Harvard University Herbaria
>     Harvard University
>
>     email: morris.bob@gmail.com <mailto:morris.bob@gmail.com>
>     web: http://efg.cs.umb.edu/
>     web: http://filteredpush.org/mw/FilteredPush
>     http://www.cs.umb.edu/~ram <http://www.cs.umb.edu/%7Eram>
>     ===
>     The content of this communication is made entirely on my
>     own behalf and in no way should be deemed to express
>     official positions of The University of Massachusetts at Boston or
>     Harvard University.
>
>
>     On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
>     <bernhard.haslhofer@cornell.edu <mailto:bernhard.haslhofer@cornell.edu>> wrote:
>      > Hi,
>      >
>      > I have not yet finished reviewing the entire draft, but I guess I can already comment on this. My short answer is: I fully agree with Antoine's perspective and prefer oa:hasBody to be typed as rdf:Property, which gives the flexibility to attach both literals and resources. I also expressed this concern before as part of the lessons we learned in our Maphub experiment.
>      >
>      > I think if a body looks like a string, can be encoded as a string and should be interpreted as a string, then it is probably a string and we should not force people to represent it differently. Also, BNodes and/or UUID nodes are non-trivial and to some extent controversial concepts and we should not force developers who want to represent really simple annotations in OA to get into this, if not required.
>      >
>      > One argument that prevented us from allowing string bodies in the past was that most annotation bodies won't be strings. However, in many OA prototypes I have seen so far, bodies ARE simple strings; also in our own cookbook examples almost half of the examples have bodies that are simple strings. So I think we should take this into account and reconsider this design decision.
>      >
>      > The other argument is OWL-DL compatibility; and yes, if we want to maintain this then we can either introduce an additional short-cut property, continue with the current solution, or take Jacco's suggestion. However, before we do this, I would ask how many use cases require OWL-DL compatibility. In Maphub we don't need it and again, it seems that the majority of annotation use cases do not rely on it. I certainly agree that some use cases will need it, but question whether this should be the driving design motivation. I think in the end it comes down to the question of whether OA should facilitate the construction of a formal annotation knowledge base or whether OA should facilitate sharing of annotations (data) and linkage of Web resources. My personal preference is clearly the latter because I think it is more down-to-earth.
>      >
>      > Maybe it is also worth mentioning that many DBPedia properties are typed as rdf:Property. Here is an example: http://dbpedia.org/page/The_Shining_(film) <http://dbpedia.org/page/The_Shining_%28film%29> … the producer of a movie can be a resource (if more info is available) or simply a string (if only the name is known). I mention this because people are using DBPedia data and it seems that people are generally happy with this design decision.
>      >
>      > If people think that OWL-DL compatibility is a fundamental design requirement for the OA model, then I like Jacco's suggestion. However, I also want to point out that people might run into practical problems when we allow bnodes. It is, for instance, hard to compute hashes over annotation representations containing nodes that don't have names. Existing RDF libraries often assign internal ids to bnodes, but they change when an annotation moves from system A to system B.
>      >
>      > As a concrete step I propose:
>      > - to allow simple literals for oa:hasBody and
>      > - to change the type of oa:hasBody to rdf:Property and update the corresponding examples in the spec
>      > - provide guidelines in the appendix explaining how people who require OWL-DL compatibility could transform OA data into a logically consistent knowledge base.
>      >
>      > Best,
>      > Bernhard
>      >
>      > ------
>      > Bernhard Haslhofer
>      > Lecturer, Postdoctoral Associate
>      > Cornell University, Information Science
>      > 301 College Avenue
>      >
>      > bernhard.haslhofer@cornell.edu <mailto:bernhard.haslhofer@cornell.edu>
>      > http://www.cs.cornell.edu/~bh392 <http://www.cs.cornell.edu/%7Ebh392>
>      >
>      >
>      > On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
>      >
>      >> Dear all,
>      >>
>      >> I've refrained from putting in my earlier comments some more discussion-level issues focusing on http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>      >>
>      >> "This model was chosen over having a literal as the Body directly for the following reasons:"
>      >> I'm sorry, but I still don't buy most of the reasons. And I believe I won't be the only one...
>      >> Going through individual points:
>      >>
>      >>
>      >> - "Literals have no identity, and thus cannot be referenced. If the Body was a literal, it would be impossible to refer to it directly and this is considered to be important in many use cases."
>      >> To me it's a positive point *not* to give identifiers to simple string literals. What does it mean, when you give an identifier to a string like "interesting!" or "I should read this"? And if the same string is assigned different literals? To me when you have to refer to a string from different places (statements), it means that you have already more than a string - it becomes a kind of document.
>      >>
>      >>
>      >> - "It would be inconsistent with the rest of the model which allows any resource as a Body or Target, and thus would be a special case just for text in the Body."
>      >> This one is better. But it is mitigated by the fact that in RDF literals are in fact resources, too (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to reasoning or syntax) for which properties with literals as objects are distinguished from properties with "fully-fledged" resources as objects. But they do not apply to all RDF-based models.
>      >>
>      >>
>      >> - "While literals can have their language and datatype associated with them in RDF, there are other aspects of text that are important for interpretation that cannot be associated with a literal. Examples include directionality of the text and encoding, plus of course metadata such as authorship, date of creation and so forth, which would not be possible."
>      >>
>      >> This is very true - though it would help the reader if you gave more info on what "directionality" means here.
>      >> But this argument is not against allowing literals as bodies. It just says that in some cases, the bodies are sophisticated, document-like resources. Fair enough. But I will argue (and many others will) that many scenarios don't need this. And that it's not reasonable to impose on these latter scenarios the representation details that the former cases need. Caricaturing a bit, it looks as if we prevented string value attributes in object-oriented programming, on the basis that some texts deserve to be treated as objects.
>      >>
>      >> Note that we faced a similar situation in SKOS, for documentation properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And the decision we made then is that these properties can be used either with simple literals or more complex resources. See
>      >>
>      >>
>      >> - "If a server wished to extract the text and make it a resource with an HTTP URI, it would not be possible to assert equivalence or provenance."
>      >>
>      >> I think it is the utter prerogative of annotation-producing applications, to decide whether the bodies they produce are worthy of specific provenance data or not. Is there a point in keeping track of whether someone "created" a string like 'I should read this' in the first place?
>      >> On equivalence, the argument is also not convincing: in fact literals come with equivalence conditions that are easy to get and already implemented. Trying to come up with equivalence relationships between "resourcified literals" is much harder, both for spec designers and application builders (if we let them handle the issue). While working on the SKOS-XL extension we tried to open the can of worms of equivalence/identity conditions. We quite quickly postponed the issue (http://www.w3.org/TR/skos-reference/#L5739).
>      >>
>      >>
>      >> - "The cost of using ContentAsText is minimal: just one additional required triple over the literal case."
>      >>
>      >> I quite agree with the principle, though this one additional triple means millions of additional ones -- I expect that cases of simple text annotation will be very very common.
>      >> But I don't buy it in a context in which it is recommended to type the resource with a dcmitypes class, to type it as cnt:ContentAsText and to give its MIME type using dc:format. That's 4 triples, not 1. And many of them can be seen as of dubious added value (see earlier comments).
>      >>
>      >>
>      >> Note that the two SKOS patterns mentioned above (documentation properties and SKOS-XL) could be used in OA to have simple text bodies co-exist with more complex ones, either in relative isolation (SKOS documentation pattern) or with a tighter correspondence (the SKOS-XL pattern allows switching from one pattern to the other).
>      >> And I believe that relative isolation is not as bad as it looks. Applications that produce simple bodies can only be bothered by the prospect of having more complex data on these bodies. And applications requiring more complex data (say, provenance) would probably need some more complex procedure to generate it from the data produced by the simpler applications.
>      >>
>      >>
>      >> Best,
>      >>
>      >> Antoine
>      >
>      >
>      >
>
>
>
>     --
>
>
Received on Thursday, 10 January 2013 22:01:53 UTC