Re: New Draft comments: textual bodies from Bernhard Haslhofer on 2013-01-10 (public-openannotation@w3.org from January 2013)

From: Bernhard Haslhofer <bernhard.haslhofer@cornell.edu>
Date: Thu, 10 Jan 2013 10:08:05 -0500
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <7424EDA4FEA047549D2934E231E54E28@gmail.com>
Hi,

> I like this conclusion. I am very uneasy with a property being both an
> Object Property and Data Property, that makes it very tricky in an OWL
> context and how to parse in implementations.
>  
Just out of curiosity: could you elaborate a bit on this? Why is it tricky in an
OWL context, and why is it tricky to parse in implementations?
  
>  
> One of the problems with the old Dublin Core Elements was that it was
> blurry about what to expect in the range - so you could find both of
> these styles in the wild:
>  
> :x dc:creator "John Doe" .
> :y dc:creator <http://example.com/JohnDoe> .
>  
> And this made it very inconsistent across implementations. DC Terms
> has fixed this, at some points at cost of verbosity or need to find
> reference URIs. (dc:format "image/jpeg" vs dcterms:format mime:jpeg).
>  
As far as I know, dcterms:creator is of type rdf:Property, but the guidelines
suggest using it only with resources.
>  
> Similarly I don't think that we should go "back in time" to encourage
> that kind of loose RDF.

Maybe…but sometimes it is worth reconsidering model design decisions after
observing some real-world patterns, like the use of plain strings.
>  
> We should not mandate much about exactly how it should be implemented
> with a bnode or UUID URNs, perhaps just leave a hint/good practice.
>  
> I understand one reason why we tried to recommend UUID URNs is to
> allow external meta-annotations; those get trickier for bnodes
> (especially as hasBody is not functional).
>  
A major practical problem I experienced in the past was that it is really hard to compute hashes
over RDF serializations containing bnodes. I mean, you can compute them, but they are pretty much
useless because they are not comparable across systems: each system is free to pick its own bnode labels.
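
To make the hashing problem concrete, here is a minimal sketch in Python. It only uses the
standard library and hand-written N-Triples strings; the annotation URI, the bnode labels
("b0", "genid42") and the helper names are made up for illustration, and the namespace URIs
are my assumption of what a real serialization would use.

```python
import hashlib

# Namespace URIs assumed for illustration only.
OA = "http://www.w3.org/ns/oa#"
CNT = "http://www.w3.org/2011/content#"


def serialize(bnode_label: str) -> str:
    """Return a sorted N-Triples serialization of one tiny annotation graph.

    The graph is always the same; only the blank node label differs,
    which is exactly what happens when two systems export the same data.
    """
    lines = [
        f'<http://example.org/anno1> <{OA}hasBody> _:{bnode_label} .',
        f'_:{bnode_label} <{CNT}chars> "interesting!" .',
    ]
    return "\n".join(sorted(lines)) + "\n"


def digest(text: str) -> str:
    """SHA-256 over the UTF-8 bytes of a serialization."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same graph, exported by two hypothetical systems with different labels.
system_a = serialize("b0")
system_b = serialize("genid42")

# The graphs are equivalent, but the naive hashes are not comparable.
print(digest(system_a) == digest(system_b))  # False
```

Getting comparable hashes would need some form of graph canonicalization before hashing,
which is considerably more work than hashing a serialization that contains no bnodes.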

Anyway, I still think that simple annotations involving non-complex bodies should require neither bnodes
nor UUID nodes.
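
For comparison, here is a rough sketch of the two body styles under discussion and of what a
consumer has to do for each. Triples are plain Python tuples rather than objects from an RDF
library; the `Literal` class, the `body_text` helper and the `ex:` / `_:body1` names are all
hypothetical and only serve the illustration.

```python
from typing import List, Optional, Tuple

OA_HAS_BODY = "oa:hasBody"
CNT_CHARS = "cnt:chars"


class Literal(str):
    """Marks a string as an RDF literal, as opposed to a URI or bnode label."""


Triple = Tuple[str, str, str]

# Style 1: the body is a plain literal (one triple).
literal_style: List[Triple] = [
    ("ex:anno1", OA_HAS_BODY, Literal("interesting!")),
]

# Style 2: the body is a blank node carrying cnt:chars (two triples,
# before any optional typing or format triples are added).
bnode_style: List[Triple] = [
    ("ex:anno1", OA_HAS_BODY, "_:body1"),
    ("_:body1", CNT_CHARS, Literal("interesting!")),
]


def body_text(triples: List[Triple], annotation: str) -> Optional[str]:
    """Return the textual body of an annotation in either style, if any."""
    for s, p, o in triples:
        if s == annotation and p == OA_HAS_BODY:
            if isinstance(o, Literal):
                return str(o)  # literal body: the text is right here
            for s2, p2, o2 in triples:
                if s2 == o and p2 == CNT_CHARS:
                    return str(o2)  # resource body: one hop further down
    return None


print(body_text(literal_style, "ex:anno1"))
print(body_text(bnode_style, "ex:anno1"))
```

A consumer that accepts both styles needs the literal-or-resource check shown above; a model
that allows only one style removes that branch, at the cost of the extra node in the simple case.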

Bernhard
>  
>  
> On Wed, Jan 9, 2013 at 7:15 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
> >  
> >  
> > To draw this to a close:
> >  
> > * We will adopt the blank node with a single cnt:chars property method for
> > when identity is considered unnecessary.
> > * The rationale given in the document will be updated to include typing (as
> > below), OWL-DL and integration, and to downplay the bullets that Antoine
> > mentioned as less convincing in his original email.
> > * The dctypes:Text additional class will be a MAY rather than a SHOULD
> > * The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST,
> > as this is easily able to be inferred from the presence of cnt:chars. Note
> > that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
> >  
> > This seems to solve the number of triples and explicit identity costs, while
> > still maintaining OWL-DL compatibility and consistency in the model. The
> > typing and assignment of a MIME type to the text string (especially
> > text/plain vs text/html) is very important for clients to understand how to
> > process the text content. Overall, the cost of a single model of a blank
> > node with cnt:chars is considered less for developers than having to check
> > if the object of oa:hasBody is a literal or a resource, and decide which of
> > the two options to use.
> >  
> > Rob
> >  
> >  
> > On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com> wrote:
> > >  
> > > A few reasons we prefer OA artifacts, including bodies, as objects:
> > >  
> > > 1. In our FilteredPush design we require semantic notification of
> > > events of publication to the annotation store. It is probably
> > > difficult to do this if we are not in control of the tractability of
> > > the SPARQL 1.1 SELECT queries we use to define the subscription
> > > constraints. That is, we probably need to ensure that tractability
> > > issues arise from the domain vocabulary, not from the vocabularies we
> > > don't control.
> > >  
> > > 2. More generally, if bodies can be strings, there still remains a
> > > need to type bodies, often with more than one type, in order that
> > > consuming application code can switch on the type. In the community
> > > for which we build systems, it is very hard to get an agreement on use
> > > of small controlled vocabularies at all, and it is anyway a
> > > red-herring for the type to be based on the particular wording of a
> > > textual body. Thus, while body typing is not a deep concern of an
> > > annotation producing human, it is a deep concern of consuming, and
> > > sometimes even producing, software.
> > >  
> > > 3. Humans, other than developers, have little or no need to code the
> > > structure of annotations by hand, so the added complexity of a
> > > URI-based body should be of little concern. In human-facing
> > > annotation producers or consumers, the details of the injection of
> > > the text strings for the body content will rarely impact human authors
> > > of annotation content. It only impacts the code developers, and things
> > > like CNT are not very difficult to program against in any platform
> > > where RDF can be handled in the first place. And as to developers,
> > > well hey, Emacs makes a pretty good RDF/N3 editor. :-)
> > >  
> > > All that said, it certainly would serve to type oa:hasBody simply as
> > > rdf:Property and somehow introduce an OA profile that requires
> > > oa:hasBody be an object property. But it would be a shame if
> > > consuming software abounds that cannot even find and render the
> > > textual content of a body just because it is one layer further
> > > down in something as simple as CNT or some similar
> > > structure.
> > >  
> > > --Bob Morris
> > >  
> > > Robert A. Morris
> > >  
> > > Emeritus Professor of Computer Science
> > > UMASS-Boston
> > > 100 Morrissey Blvd
> > > Boston, MA 02125-3390
> > >  
> > > IT Staff
> > > Filtered Push Project
> > > Harvard University Herbaria
> > > Harvard University
> > >  
> > > email: morris.bob@gmail.com
> > > web: http://efg.cs.umb.edu/
> > > web: http://filteredpush.org/mw/FilteredPush
> > > http://www.cs.umb.edu/~ram
> > > ===
> > > The content of this communication is made entirely on my
> > > own behalf and in no way should be deemed to express
> > > official positions of The University of Massachusetts at Boston or
> > > Harvard University.
> > >  
> > >  
> > > On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
> > > <bernhard.haslhofer@cornell.edu> wrote:
> > > > Hi,
> > > >  
> > > > I have not yet finished reviewing the entire draft, but I guess I can
> > > > already comment on this. My short answer is: I fully agree with Antoine's
> > > > perspective and prefer oa:hasBody to be typed as rdf:Property, which gives
> > > > the flexibility to attach both literals and resources. I also expressed this
> > > > concern before as part of the lessons we learned in our Maphub experiment.
> > > >  
> > > > I think if a body looks like a string, can be encoded as a string and
> > > > should be interpreted as a string, then it is probably a string and we
> > > > should not force people to represent it differently. Also, BNodes and/or
> > > > UUID nodes are non-trivial and to some extent controversial concepts and we
> > > > should not force developers who want to represent really simple annotations
> > > > in OA to get into this, if not required.
> > > >  
> > > > One argument that prevented us from allowing string bodies in the past
> > > > was that most annotation bodies won't be strings. However, in many OA
> > > > prototypes I have seen so far, bodies ARE simple strings; also, in our own
> > > > cookbook examples, almost half of the examples have bodies that are
> > > > simple strings. So I think we should take this into account and reconsider
> > > > this design decision.
> > > >  
> > > > The other argument is OWL-DL compatibility; and yes, if we want to
> > > > maintain this then we can either introduce an additional short-cut property,
> > > > continue with the current solution, or take Jacco's suggestion. However,
> > > > before we do this, I would ask how many use cases require OWL-DL
> > > > compatibility. In Maphub we don't need it and again, it seems that the
> > > > majority of annotation use cases do not rely on it. I certainly agree that
> > > > some use cases will need it, but I question whether this should be the driving
> > > > design motivation. I think in the end it comes down to the question of whether OA
> > > > should facilitate the construction of a formal annotation knowledge base or
> > > > whether OA should facilitate sharing of annotations (data) and linkage of Web
> > > > resources. My personal preference is clearly the latter because I think it
> > > > is more down-to-earth.
> > > >  
> > > > Maybe it is also worth mentioning that many DBpedia properties are typed
> > > > as rdf:Property. Here is an example:
> > > > http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can be
> > > > a resource (if more info is available) or simply a string (if only the name
> > > > is known). I mention this because people are using DBpedia data and it
> > > > seems that people are generally happy with this design decision.
> > > >  
> > > > If people think that OWL-DL compatibility is a fundamental design
> > > > requirement for the OA model, then I like Jacco's suggestion. However, I
> > > > also want to point out that people might run into practical problems when we
> > > > allow bnodes. It is, for instance, hard to compute hashes over annotation
> > > > representations containing nodes that don't have names. Existing RDF
> > > > libraries often assign internal ids to bnodes, but they change when an
> > > > annotation moves from system A to system B.
> > > >  
> > > > As a concrete step I propose:
> > > > - to allow simple literals for oa:hasBody and
> > > > - to change the type of oa:hasBody to rdf:Property and update the
> > > > corresponding examples in the spec
> > > > - provide guidelines in the appendix explaining how people who require
> > > > OWL-DL compatibility could transform OA data into a logically consistent
> > > > knowledge base.
> > > >  
> > > > Best,
> > > > Bernhard
> > > >  
> > > > ------
> > > > Bernhard Haslhofer
> > > > Lecturer, Postdoctoral Associate
> > > > Cornell University, Information Science
> > > > 301 College Avenue
> > > >  
> > > > bernhard.haslhofer@cornell.edu
> > > > http://www.cs.cornell.edu/~bh392
> > > >  
> > > >  
> > > > On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
> > > >  
> > > > > Dear all,
> > > > >  
> > > > > I've refrained from putting in my earlier comments some more
> > > > > discussion-level issues focusing on
> > > > > http://www.openannotation.org/spec/future/level1.html#BodyEmbed
> > > > >  
> > > > > "This model was chosen over having a literal as the Body directly for
> > > > > the following reasons:"
> > > > > I'm sorry, but I still don't buy most of the reasons. And I believe I
> > > > > won't be the only one...
> > > > > Going through individual points:
> > > > >  
> > > > >  
> > > > > - "Literals have no identity, and thus cannot be referenced. If the
> > > > > Body was a literal, it would be impossible to refer to it directly and this
> > > > > is considered to be important in many use cases."
> > > > > To me it's a positive point *not* to give identifiers to simple string
> > > > > literals. What does it mean, when you give an identifier to a string like
> > > > > "interesting!" or "I should read this"? And if the same string is assigned
> > > > > different literals? To me when you have to refer to a string from different
> > > > > places (statements), it means that you have already more than a string - it
> > > > > becomes a kind of document.
> > > > >  
> > > > >  
> > > > > - "It would be inconsistent with the rest of the model which allows any
> > > > > resource as a Body or Target, and thus would be a special case just for text
> > > > > in the Body."
> > > > > This one is better. But it is mitigated by the fact that in RDF
> > > > > literals are in fact resources, too
> > > > > (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to
> > > > > reasoning or syntax) for which properties with literals as objects are
> > > > > distinguished from properties with "fully-fledged" resources as objects. But
> > > > > they do not apply to all RDF-based models.
> > > > >  
> > > > >  
> > > > > - "While literals can have their language and datatype associated with
> > > > > them in RDF, there are other aspects of text that are important for
> > > > > interpretation that cannot be associated with a literal. Examples include
> > > > > directionality of the text and encoding, plus of course metadata such as
> > > > > authorship, date of creation and so forth, which would not be possible."
> > > > >  
> > > > > This is very true - though it would help the reader if you gave more info
> > > > > on what "directionality" means here.
> > > > > But this argument is not against allowing literals as bodies. It just
> > > > > says that in some cases, the bodies are sophisticated, document-like
> > > > > resources. Fair enough. But I will argue (and many others will) that many
> > > > > scenarios don't need this. And that it's not reasonable to impose on these
> > > > > latter scenarios the representation details that the former cases need.
> > > > > Caricaturing a bit, it looks as if we prevented string value attributes in
> > > > > object-oriented programming, on the basis that some texts deserve to be
> > > > > treated as objects.
> > > > >  
> > > > > Note that we faced a similar situation in SKOS, for documentation
> > > > > properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And
> > > > > the decision we made then is that these properties can be used either with
> > > > > simple literals or more complex resources. See
> > > > >  
> > > > >  
> > > > > - "If a server wished to extract the text and make it a resource with
> > > > > an HTTP URI, it would not be possible to assert equivalence or provenance."
> > > > >  
> > > > > I think it is the utter prerogative of annotation-producing
> > > > > applications, to decide whether the bodies they produce are worthy of
> > > > > specific provenance data or not. Is there a point in keeping track of
> > > > > whether someone "created" a string like 'I should read this' in the first
> > > > > place?
> > > > > On the equivalence the argument is also not convincing: in fact
> > > > > literals come with equivalence conditions that are easy to get and already
> > > > > implemented. Trying to come up with equivalence relationships between
> > > > > "resourcified literals" is much harder, both for spec designers and
> > > > > application builders (if we let them handle the issue). While working with
> > > > > the SKOS-XL extension we tried to open the can of worms of
> > > > > equivalence/identity conditions. We quite quickly postponed the issue
> > > > > (http://www.w3.org/TR/skos-reference/#L5739).
> > > > >  
> > > > >  
> > > > > - "The cost of using ContentAsText is minimal: just one additional
> > > > > required triple over the literal case."
> > > > >  
> > > > > I quite agree with the principle, though this one additional triple
> > > > > means millions of additional ones -- I expect that cases of simple text
> > > > > annotation will be very very common.
> > > > > But I don't buy it in a context in which it is recommended to type the
> > > > > resource with a dcmitypes class, to type it as cnt:ContentAsText and to
> > > > > give its MIME type using dc:format. That's 4 triples, not 1. And many of
> > > > > them can be seen as of dubious added value (see earlier comments)
> > > > >  
> > > > >  
> > > > > Note that the two SKOS patterns mentioned above (documentation
> > > > > properties and SKOS-XL) could be used in OA to have simple text bodies
> > > > > co-exist with more complex ones, either in relative isolation (SKOS
> > > > > documentation pattern) or with a tighter correspondence (SKOS-XL pattern
> > > > > allows switching from one pattern to the other).
> > > > > And I believe that relative isolation is not as bad as it looks.
> > > > > Applications that produce simple bodies can only be bothered by the
> > > > > perspective of having more complex data on these bodies. And applications
> > > > > requiring more complex data (say, provenance) would probably need some more
> > > > > complex procedure to generate it from the data produced by the simpler
> > > > > applications.
> > > > >  
> > > > >  
> > > > > Best,
> > > > >  
> > > > > Antoine
> > >  
> > >  
> > >  
> > > --
>  
>  
>  
> --  
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
Received on Thursday, 10 January 2013 15:08:40 UTC