Re: New Draft comments: textual bodies from Robert Sanderson on 2013-01-09 (public-openannotation@w3.org from January 2013)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Wed, 9 Jan 2013 12:15:19 -0700
To: Bob Morris <morris.bob@gmail.com>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CABevsUGPtrhx2zwhGbJfr1ESQ-zCn9+3oYmukEChV8quckiu_g@mail.gmail.com>
To draw this to a close:

* We will adopt the method of a blank node with a single cnt:chars property
for cases where identity is considered unnecessary.
* The rationale given in the document will be updated to include typing (as
below), OWL-DL and integration, and to downplay the bullets that Antoine
mentioned as less convincing in his original email.
* The dctypes:Text additional class will be a MAY rather than a SHOULD
* The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST,
as this is easily able to be inferred from the presence of cnt:chars.  Note
that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
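
A minimal sketch of the pattern these bullets describe, offered as an
illustration rather than spec text. Triples are modelled as plain Python
tuples of prefixed-name strings; the labels "_:body1" and "ex:anno1" and
both function names are hypothetical, and the use of dc:format for the
MIME type is taken from the discussion quoted below, not from this summary.

```python
# Triples are modelled as plain (subject, predicate, object) string tuples.
# "_:body1" and "ex:anno1" are hypothetical labels used only for illustration.

def text_body_triples(annotation, text, mime_type="text/plain", bnode="_:body1"):
    """Build the triples for a textual body carried by a blank node."""
    return [
        (annotation, "oa:hasBody", bnode),
        (bnode, "cnt:chars", text),                # the text itself
        (bnode, "rdf:type", "cnt:ContentAsText"),  # SHOULD; can be inferred
        (bnode, "dc:format", mime_type),           # e.g. text/plain vs text/html
    ]


def infer_content_class(triples, node):
    """Infer the cnt: class of a node from the content property it carries."""
    predicates = {p for s, p, _ in triples if s == node}
    if "cnt:chars" in predicates:
        return "cnt:ContentAsText"
    if "cnt:bytes" in predicates:
        return "cnt:ContentAsBase64"
    return None


if __name__ == "__main__":
    for triple in text_body_triples("ex:anno1", "interesting!"):
        print(triple)
```

The second function shows why the explicit class can be a SHOULD: a
consumer that sees cnt:chars or cnt:bytes can work out the class itself.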

This seems to solve the number of triples and explicit identity costs,
while still maintaining OWL-DL compatibility and consistency in the model.
The typing and assignment of a MIME type to the text string (especially
text/plain vs text/html) is very important for clients to understand how to
process the text content. Overall, the cost of a single model of a blank
node with cnt:chars is considered lower for developers than having to check
whether the object of oa:hasBody is a literal or a resource, and decide which
of the two options to use.
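
To make the developer-cost comparison concrete, here is a rough sketch of
the two consumer code paths, under stated assumptions: triples are plain
Python tuples, the Literal class is a hypothetical stand-in for an RDF
literal, plain strings stand for nodes, and all function names are invented
for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal; plain strings stand for nodes."""
    value: str


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def body_text_single_model(triples, annotation):
    """Body is always a node: follow oa:hasBody, then read cnt:chars."""
    for body in objects(triples, annotation, "oa:hasBody"):
        for chars in objects(triples, body, "cnt:chars"):
            return chars.value
    return None


def body_text_two_options(triples, annotation):
    """Body may be a literal or a node: every consumer needs both branches."""
    for body in objects(triples, annotation, "oa:hasBody"):
        if isinstance(body, Literal):
            return body.value
        for chars in objects(triples, body, "cnt:chars"):
            return chars.value
    return None
```

With a single model the consumer has one path to follow; with two options
the extra branch has to be written and tested in every consuming client.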

Rob

On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com> wrote:

> A few reasons we prefer OA artifacts, including bodies, as objects:
>
> 1. In our FilteredPush design we require semantic notification of
> events of publication to the annotation store.  It is probably
> difficult to do this if we are not in control of the tractability of
> the SPARQL 1.1 SELECT queries that define the subscription
> constraints. That is, we probably need to ensure that tractability
> issues arise from the domain vocabulary, not from the vocabularies we
> don't control.
>
> 2. More generally, if bodies can be strings, there still remains a
> need to type bodies, often with more than one type, in order that
> consuming application code can switch on the type.  In the community
> for which we build systems, it is very hard to get an agreement on use
> of small controlled vocabularies at all, and it is anyway a
> red herring for the type to be based on the particular wording of a
> textual body. Thus, while body typing is not a deep concern of an
> annotation producing human, it is a deep concern of consuming, and
> sometimes even producing, software.
>
> 3. Humans, other than developers, have little or no need to code the
> structure of annotations by hand, so the added complexity of a
> URI-based body should be of little concern.  In human-facing
> annotation producers or consumers, the details of the injection of
> the text strings for the body content will rarely impact human authors
> of annotation content. It only impacts the code developers, and things
> like CNT are not very difficult to program against in any platform
> where RDF can be handled in the first place. And as to developers,
> well hey, Emacs makes a pretty good RDF/N3 editor.  :-)
>
> All that said,  it certainly would serve to type oa:hasBody  simply as
> rdf:Property and somehow introduce an OA profile that requires
> oa:hasBody be an object property.  But it would be a shame if
> consuming software abounds that cannot even find and render the
> strings holding the textual body content just because they are one
> layer further down in something as simple as CNT or some similar
> structure.
>
> --Bob Morris
>
> Robert A. Morris
>
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
>
> IT Staff
> Filtered Push Project
> Harvard University Herbaria
> Harvard University
>
> email: morris.bob@gmail.com
> web: http://efg.cs.umb.edu/
> web: http://filteredpush.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> ===
> The content of this communication is made entirely on my
> own behalf and in no way should be deemed to express
> official positions of The University of Massachusetts at Boston or
> Harvard University.
>
>
> On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
> <bernhard.haslhofer@cornell.edu> wrote:
> > Hi,
> >
> > I am not yet finished reviewing the entire draft, but I guess I can
> > already comment on this. My short answer is: I fully agree with Antoine's
> > perspective and prefer oa:hasBody to be typed as rdf:Property, which gives
> > the flexibility to attach both literals and resources. I also expressed
> > this concern before as part of the lessons we learned in our Maphub
> > experiment.
> >
> > I think if a body looks like a string, can be encoded as a string and
> > should be interpreted as a string, then it is probably a string and we
> > should not force people to represent it differently. Also, BNodes and/or
> > UUID nodes are non-trivial and to some extent controversial concepts and we
> > should not force developers who want to represent really simple annotations
> > in OA to get into this, if not required.
> >
> > One argument that prevented us from allowing string bodies in the past
> > was that most annotation bodies won't be strings. However, in many OA
> > prototypes I have seen so far, bodies ARE simple strings; also in our own
> > cookbook examples almost half of the examples have bodies that are
> > simple strings. So I think we should take this into account and reconsider
> > this design decision.
> >
> > The other argument is OWL-DL compatibility; and yes, if we want to
> > maintain this then we can either introduce an additional short-cut
> > property, continue with the current solution, or take Jacco's suggestion.
> > However, before we do this, I would ask how many use cases require OWL-DL
> > compatibility. In Maphub we don't need it and again, it seems that the
> > majority of annotation use cases do not rely on it. I certainly agree
> > that some use cases will need it, but question whether this should be the
> > driving design motivation. I think in the end it comes down to the
> > question of whether OA should facilitate the construction of a formal
> > annotation knowledge base or whether OA should facilitate sharing of
> > annotations (data) and linkage of Web resources. My personal preference
> > is clearly the latter because I think it is more down-to-earth.
> >
> > Maybe it is also worth mentioning that many DBPedia properties are typed
> > as rdf:Property. Here is an example:
> > http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can
> > be a resource (if more info is available) or simply a string (if only the
> > name is known). I mention this because people are using DBPedia data
> > and it seems that people are generally happy with this design decision.
> >
> > If people think that OWL-DL compatibility is a fundamental design
> > requirement for the OA model, then I like Jacco's suggestion. However, I
> > also want to point out that people might run into practical problems when
> > we allow bnodes. It is, for instance, hard to compute hashes over
> > annotation representations containing nodes that don't have names. Existing
> > RDF libraries often assign internal ids to bnodes, but they change when an
> > annotation moves from system A to system B.
> >
> > As a concrete step I propose:
> > - to allow simple literals for oa:hasBody and
> > - to change the type of oa:hasBody to rdf:Property and update the
> > corresponding examples in the spec
> > - provide guidelines in the appendix explaining how people who require
> > OWL-DL compatibility could transform OA data into a logically consistent
> > knowledge base.
> >
> > Best,
> > Bernhard
> >
> > ------
> > Bernhard Haslhofer
> > Lecturer, Postdoctoral Associate
> > Cornell University, Information Science
> > 301 College Avenue
> >
> > bernhard.haslhofer@cornell.edu (mailto:bernhard.haslhofer@cornell.edu)
> > http://www.cs.cornell.edu/~bh392
> >
> >
> > On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
> >
> >> Dear all,
> >>
> >> I've refrained from putting in my earlier comments some more
> >> discussion-level issues focusing on
> >> http://www.openannotation.org/spec/future/level1.html#BodyEmbed
> >>
> >> "This model was chosen over having a literal as the Body directly for
> >> the following reasons:"
> >> I'm sorry, but I still don't buy most of the reasons. And I believe I
> >> won't be the only one...
> >> Going through individual points:
> >>
> >>
> >> - "Literals have no identity, and thus cannot be referenced. If the
> >> Body was a literal, it would be impossible to refer to it directly and this
> >> is considered to be important in many use cases."
> >> To me it's a positive point *not* to give identifiers to simple string
> >> literals. What does it mean, when you give an identifier to a string like
> >> "interesting!" or "I should read this"? And if the same string is assigned
> >> different literals? To me when you have to refer to a string from different
> >> places (statements), it means that you have already more than a string - it
> >> becomes a kind of document.
> >>
> >>
> >> - "It would be inconsistent with the rest of the model which allows any
> >> resource as a Body or Target, and thus would be a special case just for
> >> text in the Body."
> >> This one is better. But it is mitigated by the fact that in RDF
> >> literals are in fact resources, too (
> >> http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related
> >> to reasoning or syntax) for which properties with literals as objects are
> >> distinguished from properties with "fully-fledged" resources as objects.
> >> But they do not apply to all RDF-based models.
> >>
> >>
> >> - "While literals can have their language and datatype associated with
> >> them in RDF, there are other aspects of text that are important for
> >> interpretation that cannot be associated with a literal. Examples include
> >> directionality of the text and encoding, plus of course metadata such as
> >> authorship, date of creation and so forth, which would not be possible."
> >>
> >> This is very true - though it would help the reader if you gave more info
> >> on what "directionality" means here.
> >> But this argument is not against allowing literals as bodies. It just
> >> says that in some cases, the bodies are sophisticated, document-like
> >> resources. Fair enough. But I will argue (and many others will) that many
> >> scenarios don't need this. And that it's not reasonable to impose on these
> >> latter scenarios the representation details that the former cases need.
> >> Caricaturing a bit, it looks as if we prevented string value attributes in
> >> object-oriented programming, on the basis that some texts deserve to be
> >> treated as objects.
> >>
> >> Note that we faced a similar situation in SKOS, for documentation
> >> properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation).
> >> And the decision we made then is that these properties can be used either
> >> with simple literals or more complex resources. See
> >>
> >>
> >> - "If a server wished to extract the text and make it a resource with
> >> an HTTP URI, it would not be possible to assert equivalence or provenance."
> >>
> >> I think it is the utter prerogative of annotation-producing
> >> applications to decide whether the bodies they produce are worthy of
> >> specific provenance data or not. Is there a point in keeping track of
> >> whether someone "created" a string like 'I should read this' in the first
> >> place?
> >> On the equivalence the argument is also not convincing: in fact
> >> literals come with equivalence conditions that are easy to get and already
> >> implemented. Trying to come up with equivalence relationships between
> >> "resourcified literals" is much harder, both for spec designers and
> >> application builders (if we let them handle the issue). While working on
> >> the SKOS-XL extension we tried to open the can of worms of
> >> equivalence/identity conditions. We quite quickly postponed the issue (
> >> http://www.w3.org/TR/skos-reference/#L5739).
> >>
> >>
> >> - "The cost of using ContentAsText is minimal: just one additional
> >> required triple over the literal case."
> >>
> >> I quite agree with the principle, though this one additional triple
> >> means millions of additional ones -- I expect that cases of simple text
> >> annotation will be very very common.
> >> But I don't buy it in a context in which it is recommended to type the
> >> resource with a dcmitypes class, to type it as cnt:ContentAsClass and to
> >> give its MIME type using dc:format. That's 4 triples, not 1. And many of
> >> them can be seen as of dubious added value (see earlier comments)
> >>
> >>
> >> Note that the two SKOS patterns mentioned above (documentation
> >> properties and SKOS-XL) could be used in OA to have simple text bodies
> >> co-exist with more complex ones, either in relative isolation (SKOS
> >> documentation pattern) or with a tighter correspondence (SKOS-XL pattern
> >> allows switching from one pattern to the other).
> >> And I believe that relative isolation is not as bad as it looks.
> >> Applications that produce simple bodies can only be bothered by the
> >> prospect of having more complex data on these bodies. And applications
> >> requiring more complex data (say, provenance) would probably need some more
> >> complex procedure to generate it from the data produced by the simpler
> >> applications.
> >>
> >>
> >> Best,
> >>
> >> Antoine
Received on Wednesday, 9 January 2013 19:15:48 UTC