Re: New Draft comments: textual bodies from Stian Soiland-Reyes on 2013-01-10 (public-openannotation@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 10 Jan 2013 12:06:17 +0000
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Bob Morris <morris.bob@gmail.com>, public-openannotation <public-openannotation@w3.org>
Message-ID: <CAPRnXtk_MKL2xFYmPwkeu0wZQ=4ehzZojrRB-bwpSSdJ5mU3Ug@mail.gmail.com>
I like this conclusion. I am very uneasy with a property being both an
Object Property and a Data Property; that makes it very tricky in an OWL
context, and tricky to parse in implementations.

One of the problems with the old Dublin Core Elements was that it was
blurry about what to expect in the range - so you could find both of
these styles in the wild:

:x  dc:creator "John Doe" .
:y  dc:creator <http://example.com/JohnDoe> .

And this made it very inconsistent across implementations. DC Terms
has fixed this, in some places at the cost of verbosity or the need to
find reference URIs (dc:format "image/jpeg" vs dcterms:format mime:jpeg).
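
A minimal Python sketch of what the mixed range costs a consumer: every
reader of dc:creator has to branch on the kind of object it finds. The
Literal and IRI classes and the creator_label function are hypothetical
stand-ins for whatever an RDF library provides, not a real API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal such as "John Doe"."""
    value: str


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for an RDF resource identified by a URI."""
    uri: str


Node = Union[Literal, IRI]


def creator_label(obj: Node) -> str:
    """Return something displayable for the object of dc:creator."""
    if isinstance(obj, Literal):
        # Style 1: the name is given inline as a string.
        return obj.value
    # Style 2: only a URI is given. A real client would have to
    # dereference it or look up a label elsewhere; here we just echo it.
    return f"<{obj.uri}>"


print(creator_label(Literal("John Doe")))
print(creator_label(IRI("http://example.com/JohnDoe")))
```

With a single allowed style the isinstance branch disappears, which is
the consistency argument in a nutshell.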

Similarly I don't think that we should go "back in time" to encourage
that kind of loose RDF.

We should not mandate much about exactly how it should be implemented,
whether with a bnode or a UUID URN; perhaps just leave a hint or good
practice.

I understand that one reason we tried to recommend UUID URNs is to
allow external meta-annotations; those get trickier with bnodes
(especially as hasBody is not functional).
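
A rough sketch of why a UUID URN helps with meta-annotations, assuming
triples are modelled as plain Python tuples. The function names and the
ex: annotation identifiers are made up for illustration; only
uuid.uuid4().urn is a real standard-library call.

```python
import uuid
from typing import List, Tuple

# Triples are plain (subject, predicate, object) tuples in this sketch.
Triple = Tuple[str, str, str]


def text_body_with_urn(anno: str, text: str) -> Tuple[str, List[Triple]]:
    """Attach a textual body identified by a freshly minted UUID URN."""
    body = uuid.uuid4().urn  # a "urn:uuid:..." string, usable anywhere
    triples = [
        (anno, "oa:hasBody", body),
        (body, "cnt:chars", text),
    ]
    return body, triples


def meta_annotation(meta_anno: str, target: str) -> List[Triple]:
    """A second, external annotation that targets an existing body."""
    return [(meta_anno, "oa:hasTarget", target)]


body, triples = text_body_with_urn("ex:anno1", "interesting!")
# A third party can refer to the body by its URN. With a bnode there
# would be no stable name to put in the target position.
meta = meta_annotation("ex:anno2", body)
print(triples)
print(meta)
```

Since hasBody is not functional, an annotation may carry several bodies,
so "the body of ex:anno1" is not enough to pick one out without a name.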


On Wed, Jan 9, 2013 at 7:15 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>
>
> To draw this to a close:
>
> * We will adopt the blank node with a single cnt:chars property method for
> when identity is considered unnecessary.
> * The rationale given in the document will be updated to include typing (as
> below), OWL-DL and integration, and to downplay the bullets that Antoine
> mentioned as less convincing in his original email.
> * The dctypes:Text additional class will be a MAY rather than a SHOULD
> * The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST,
> as this is easily able to be inferred from the presence of cnt:chars.  Note
> that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
>
> This seems to solve the number of triples and explicit identity costs, while
> still maintaining OWL-DL compatibility and consistency in the model.  The
> typing and assignment of a MIME type to the text string (especially
> text/plain vs text/html) is very important for clients to understand how to
> process the text content. Overall, the cost of a single model of a blank
> node with cnt:chars is considered less for developers than having to check
> if the object of oa:hasBody is a literal or a resource, and decide which of
> the two options to use.
>
> Rob
>
>
> On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com> wrote:
>>
>> A few reasons we prefer OA artifacts, including bodies, as objects:
>>
>> 1. In our FilteredPush design we require semantic notification of
>> events of publication to the annotation store.  It is probably
>> difficult to do this if we are not in control of the tractability of
>> the SPARQL 1.1 SELECT queries that we use to define the subscription
>> constraints. That is, we probably need to ensure that tractability
>> issues arise from the domain vocabulary, not from the vocabularies we
>> don't control.
>>
>> 2. More generally, if bodies can be strings, there still remains a
>> need to type bodies, often with more than one type, in order that
>> consuming application code can switch on the type.  In the community
>> for which we build systems, it is very hard to get an agreement on use
>> of small controlled vocabularies at all, and it is anyway a
>> red herring for the type to be based on the particular wording of a
>> textual body. Thus, while body typing is not a deep concern of an
>> annotation producing human, it is a deep concern of consuming, and
>> sometimes even producing, software.
>>
>> 3. Humans, other than developers, have little or no need to code the
>> structure of annotations by hand, so the added complexity of a
>> URI-based body should be of little concern.  In human-facing
>> annotation producers or consumers, the details of the injection of
>> the text strings for the body content will rarely impact human authors
>> of annotation content. It only impacts the code developers, and things
>> like CNT are not very difficult to program against in any platform
>> where RDF can be handled in the first place. And as to developers,
>> well hey, Emacs makes a pretty good RDF/N3 editor.  :-)
>>
>> All that said, it certainly would serve to type oa:hasBody simply as
>> rdf:Property and somehow introduce an OA profile that requires
>> oa:hasBody to be an object property.  But it would be a shame if
>> consuming software abounds that cannot even find and render the
>> strings holding the textual body content just because they are one
>> layer further down in something as simple as CNT or some similar
>> structure.
>>
>> --Bob Morris
>>
>> Robert A. Morris
>>
>> Emeritus Professor  of Computer Science
>> UMASS-Boston
>> 100 Morrissey Blvd
>> Boston, MA 02125-3390
>>
>> IT Staff
>> Filtered Push Project
>> Harvard University Herbaria
>> Harvard University
>>
>> email: morris.bob@gmail.com
>> web: http://efg.cs.umb.edu/
>> web: http://filteredpush.org/mw/FilteredPush
>> http://www.cs.umb.edu/~ram
>> ===
>> The content of this communication is made entirely on my
>> own behalf and in no way should be deemed to express
>> official positions of The University of Massachusetts at Boston or
>> Harvard University.
>>
>>
>> On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
>> <bernhard.haslhofer@cornell.edu> wrote:
>> > Hi,
>> >
>> > I am not yet finished reviewing the entire draft, but I guess I can
>> > already comment on this. My short answer is: I fully agree with Antoine's
>> > perspective and prefer oa:hasBody to be typed as rdf:Property, which gives
>> > the flexibility to attach both literals and resources. I also expressed this
>> > concern before as part of the lessons we learned in our Maphub experiment.
>> >
>> > I think if a body looks like a string, can be encoded as a string and
>> > should be interpreted as a string, then it is probably a string and we
>> > should not force people to represent it differently. Also, BNodes and/or
>> > UUID nodes are non-trivial and to some extent controversial concepts and we
>> > should not force developers who want to represent really simple annotations
>> > in OA to get into this, if not required.
>> >
>> > One argument that prevented us from allowing string bodies in the past
>> > was that most annotation bodies won't be strings. However, in many OA
>> > prototypes I have seen so far, bodies ARE simple strings; also in our own
>> > cookbook examples almost half of the examples have bodies that are
>> > simple strings. So I think we should take this into account and reconsider
>> > this design decision.
>> >
>> > The other argument is OWL-DL compatibility; and yes, if we want to
>> > maintain this then we can either introduce an additional short-cut property,
>> > continue with the current solution, or take Jacco's suggestion. However,
>> > before we do this, I would ask how many use cases require OWL-DL
>> > compatibility. In Maphub we don't need it and, again, it seems that the
>> > majority of annotation use cases do not rely on it. I certainly agree that
>> > some use cases will need it, but I question whether this should be the
>> > driving design motivation. I think in the end it comes down to the question
>> > of whether OA should facilitate the construction of a formal annotation
>> > knowledge base or the sharing of annotations (data) and linkage of Web
>> > resources. My personal preference is clearly the latter because I think it
>> > is more down-to-earth.
>> >
>> > Maybe it is also worth mentioning that many DBPedia properties are typed
>> > as rdf:Property. Here is an example:
>> > http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can be
>> > a resource (if more info is available) or simply a string (if only the name
>> > is known). I am mentioning this because people are using DBPedia data and it
>> > seems that people are generally happy with this design decision.
>> >
>> > If people think that OWL-DL compatibility is a fundamental design
>> > requirement for the OA model, then I like Jacco's suggestion. However, I
>> > also want to point out that people might run into practical problems when we
>> > allow bnodes. It is, for instance, hard to compute hashes over annotation
>> > representations containing nodes that don't have names. Existing RDF
>> > libraries often assign internal ids to bnodes, but they change when an
>> > annotation moves from system A to system B.
>> >
>> > As a concrete step I propose:
>> > - to allow simple literals for oa:hasBody and
>> > - to change the type of oa:hasBody to rdf:Property and update the
>> > corresponding examples in the spec
>> > - provide guidelines in the appendix explaining how people who require
>> > OWL-DL compatibility could transform OA data into a logically consistent
>> > knowledge base.
>> >
>> > Best,
>> > Bernhard
>> >
>> > ------
>> > Bernhard Haslhofer
>> > Lecturer, Postdoctoral Associate
>> > Cornell University, Information Science
>> > 301 College Avenue
>> >
>> > bernhard.haslhofer@cornell.edu (mailto:bernhard.haslhofer@cornell.edu)
>> > http://www.cs.cornell.edu/~bh392
>> >
>> >
>> > On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
>> >
>> >> Dear all,
>> >>
>> >> I've refrained from putting in my earlier comments some more
>> >> discussion-level issues focusing on
>> >> http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>> >>
>> >> "This model was chosen over having a literal as the Body directly for
>> >> the following reasons:"
>> >> I'm sorry, but I still don't buy most of the reasons. And I believe I
>> >> won't be the only one...
>> >> Going through individual points:
>> >>
>> >>
>> >> - "Literals have no identity, and thus cannot be referenced. If the
>> >> Body was a literal, it would be impossible to refer to it directly and this
>> >> is considered to be important in many use cases."
>> >> To me it's a positive point *not* to give identifiers to simple string
>> >> literals. What does it mean, when you give an identifier to a string like
>> >> "interesting!" or "I should read this"? And if the same string is assigned
>> >> different literals? To me when you have to refer to a string from different
>> >> places (statements), it means that you have already more than a string - it
>> >> becomes a kind of document.
>> >>
>> >>
>> >> - "It would be inconsistent with the rest of the model which allows any
>> >> resource as a Body or Target, and thus would be a special case just for text
>> >> in the Body."
>> >> This one is better. But it is mitigated by the fact that in RDF
>> >> literals are in fact resources, too
>> >> (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to
>> >> reasoning or syntax) for which properties with literals as objects are
>> >> distinguished from properties with "fully-fledged" resources as objects. But
>> >> they do not apply to all RDF-based models.
>> >>
>> >>
>> >> - "While literals can have their language and datatype associated with
>> >> them in RDF, there are other aspects of text that are important for
>> >> interpretation that cannot be associated with a literal. Examples include
>> >> directionality of the text and encoding, plus of course metadata such as
>> >> authorship, date of creation and so forth, which would not be possible."
>> >>
>> >> This is very true - though it would help the reader if you gave more info
>> >> on what "directionality" means here.
>> >> But this argument is not against allowing literals as bodies. It just
>> >> says that in some cases, the bodies are sophisticated, document-like
>> >> resources. Fair enough. But I will argue (and many others will) that many
>> >> scenarios don't need this. And that it's not reasonable to impose on these
>> >> latter scenarios the representation details that the former cases need.
>> >> Caricaturing a bit, it looks as if we prevented string value attributes in
>> >> object-oriented programming, on the basis that some texts deserve to be
>> >> treated as objects.
>> >>
>> >> Note that we faced a similar situation in SKOS, for documentation
>> >> properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And
>> >> the decision we made then is that these properties can be used either with
>> >> simple literals or more complex resources. See
>> >>
>> >>
>> >> - "If a server wished to extract the text and make it a resource with
>> >> an HTTP URI, it would not be possible to assert equivalence or provenance."
>> >>
>> >> I think it is the utter prerogative of annotation-producing
>> >> applications, to decide whether the bodies they produce are worthy of
>> >> specific provenance data or not. Is there a point in keeping track of
>> >> whether someone "created" a string like 'I should read this' in the first
>> >> place?
>> >> On the equivalence the argument is also not convincing: in fact
>> >> literals come with equivalence conditions that are easy to get and already
>> >> implemented. Trying to come up with equivalence relationships between
>> >> "resourcified literals" is much harder, both for spec designers and
>> >> application builders (if we let them handle the issue). While working on
>> >> the SKOS-XL extension we tried to open the can of worms of
>> >> equivalence/identity conditions. We quite quickly postponed the issue
>> >> (http://www.w3.org/TR/skos-reference/#L5739).
>> >>
>> >>
>> >> - "The cost of using ContentAsText is minimal: just one additional
>> >> required triple over the literal case."
>> >>
>> >> I quite agree with the principle, though this one additional triple
>> >> means millions of additional ones -- I expect that cases of simple text
>> >> annotation will be very very common.
>> >> But I don't buy it in a context in which it is recommended to type the
>> >> resource with a dcmitypes class, to type it as cnt:ContentAsText and to
>> >> give its MIME type using dc:format. That's 4 triples, not 1. And many of
>> >> them can be seen as of dubious added value (see earlier comments)
>> >>
>> >>
>> >> Note that the two SKOS patterns mentioned above (documentation
>> >> properties and SKOS-XL) could be used in OA to have simple text bodies
>> >> co-exist with more complex ones, either in relative isolation (SKOS
>> >> documentation pattern) or with a tighter correspondence (the SKOS-XL
>> >> pattern allows switching from one pattern to the other).
>> >> And I believe that relative isolation is not as bad as it looks.
>> >> Applications that produce simple bodies can only be bothered by the
>> >> prospect of having more complex data on these bodies. And applications
>> >> requiring more complex data (say, provenance) would probably need some more
>> >> complex procedure to generate it from the data produced by the simpler
>> >> applications.
>> >>
>> >>
>> >> Best,
>> >>
>> >> Antoine



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Thursday, 10 January 2013 12:07:05 UTC