
Re: New Draft comments: textual bodies

From: Bob Morris <morris.bob@gmail.com>
Date: Wed, 9 Jan 2013 14:01:20 -0500
Message-ID: <CADUi7O6DStOqGDyKhZ9ZunEJ25npFeykzo4V7t0pDemweYYbGQ@mail.gmail.com>
To: Bernhard Haslhofer <bernhard.haslhofer@cornell.edu>
Cc: Antoine Isaac <aisaac@few.vu.nl>, public-openannotation <public-openannotation@w3.org>
A few reasons we prefer OA artifacts, including bodies, as objects:

1. In our FilteredPush design we require semantic notification of
publication events in the annotation store.  This is probably
difficult to do if we are not in control of the tractability of
the SPARQL 1.1 SELECT queries that define the subscription
constraints. That is, we probably need to ensure that tractability
issues arise from the domain vocabulary, not from vocabularies we
don't control.

2. More generally, even if bodies can be strings, there remains a
need to type bodies, often with more than one type, so that
consuming application code can switch on the type.  In the community
for which we build systems, it is very hard to get agreement on the use
of small controlled vocabularies at all, and it is anyway a
red herring for the type to be based on the particular wording of a
textual body. Thus, while body typing is not a deep concern of a
human producing annotations, it is a deep concern of consuming, and
sometimes even producing, software.
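To make point 2 concrete, here is a minimal Python sketch (not FilteredPush code) of consuming software switching on body types. It assumes the annotation is held as a set of (subject, predicate, object) IRI triples; the `fp:Determination` class and the handler names are hypothetical. It only works because the body is a node: an RDF literal cannot be the subject of an rdf:type triple, so a string body could not carry these types.

```python
from typing import Callable

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"
DCTYPES_TEXT = "http://purl.org/dc/dcmitype/Text"
FP_DETERMINATION = "http://example.org/fp#Determination"  # hypothetical domain type

Triple = tuple[str, str, str]


def body_types(graph: set[Triple], body: str) -> set[str]:
    """Every rdf:type asserted for a body node; a body may carry several."""
    return {o for (s, p, o) in graph if s == body and p == RDF_TYPE}


def dispatch_bodies(graph: set[Triple], annotation: str,
                    handlers: dict[str, Callable[[str], str]]) -> list[str]:
    """Call the handler registered for each type of each body of `annotation`."""
    results = []
    bodies = sorted(o for (s, p, o) in graph if s == annotation and p == OA_HAS_BODY)
    for body in bodies:
        for rdf_type in sorted(body_types(graph, body)):
            handler = handlers.get(rdf_type)
            if handler is not None:  # types with no handler are ignored
                results.append(handler(body))
    return results


graph = {
    ("http://example.org/anno/1", OA_HAS_BODY, "http://example.org/body/1"),
    ("http://example.org/body/1", RDF_TYPE, DCTYPES_TEXT),
    ("http://example.org/body/1", RDF_TYPE, FP_DETERMINATION),
}
handlers = {
    DCTYPES_TEXT: lambda b: f"render text of {b}",
    FP_DETERMINATION: lambda b: f"queue determination {b}",
}
print(dispatch_bodies(graph, "http://example.org/anno/1", handlers))
```

The body here has two types, and each one triggers its own handler, which is the "more than one type" case described above.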

3. Humans, other than developers, have little or no need to code the
structure of annotations by hand, so the added complexity of a
URI-based body should be of little concern.  In human-facing
annotation producers or consumers, the details of the injection of
the text strings for the body content will rarely impact human authors
of annotation content. It only impacts the code developers, and things
like CNT are not very difficult to program against in any platform
where RDF can be handled in the first place. And as to developers,
well hey, Emacs makes a pretty good RDF/N3 editor.  :-)
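On the producing side, "programming against CNT" can amount to a small rewrite step. The Python sketch below is an illustration, not part of any OA tooling. It lifts literal oa:hasBody values into cnt:ContentAsText resources and adds the dctypes:Text type and dc:format that Antoine's quoted comments say the draft recommends. The triple encoding (literals written as double-quoted strings) and the urn:uuid identifier policy are assumptions.

```python
import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"
CNT_CHARS = "http://www.w3.org/2011/content#chars"
CNT_CONTENT_AS_TEXT = "http://www.w3.org/2011/content#ContentAsText"
DCTYPES_TEXT = "http://purl.org/dc/dcmitype/Text"
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"

Triple = tuple[str, str, str]


def is_literal(term: str) -> bool:
    """Assumed encoding: literals are wrapped in double quotes, N-Triples style."""
    return term.startswith('"')


def mint_uuid_urn() -> str:
    """Hypothetical identifier policy: a fresh urn:uuid for every lifted body."""
    return f"urn:uuid:{uuid.uuid4()}"


def lift_literal_bodies(triples: set[Triple], mint=mint_uuid_urn) -> set[Triple]:
    """Replace each literal oa:hasBody value with a cnt:ContentAsText resource."""
    lifted: set[Triple] = set()
    for s, p, o in triples:
        if p == OA_HAS_BODY and is_literal(o):
            body = mint()
            lifted |= {
                (s, OA_HAS_BODY, body),
                (body, RDF_TYPE, DCTYPES_TEXT),
                (body, RDF_TYPE, CNT_CONTENT_AS_TEXT),
                (body, CNT_CHARS, o),
                (body, DC_FORMAT, '"text/plain"'),
            }
        else:
            lifted.add((s, p, o))  # everything else passes through unchanged
    return lifted


simple = {("http://example.org/anno/1", OA_HAS_BODY, '"I should read this"')}
lifted = lift_literal_bodies(simple)
print(len(simple), len(lifted))  # 1 5
```

The literal form needs one triple per body and the lifted form five. That difference of four is the count Antoine objects to in the quoted thread, and it is the price paid for a body that can be typed and referenced.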

All that said, it would certainly work to type oa:hasBody simply as
rdf:Property and somehow introduce an OA profile that requires
oa:hasBody to be an object property.  But it would be a shame if
consuming software abounds that cannot even find and render the
textual body content just because it is one layer further down in
something as simple as CNT or some similar structure.
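A consumer that tolerates both shapes, a literal body and a body node carrying cnt:chars, is short. Here is a Python sketch using a simplified triple encoding in which literals are double-quoted strings. This is an assumption made for illustration, not any particular RDF library's API, and it ignores escapes, language tags and datatypes.

```python
from typing import Optional

OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"
CNT_CHARS = "http://www.w3.org/2011/content#chars"

Triple = tuple[str, str, str]


def literal_value(term: str) -> Optional[str]:
    """Lexical form of a plain quoted literal, or None for IRIs and blank nodes."""
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return term[1:-1]
    return None


def body_texts(graph: set[Triple], annotation: str) -> list[str]:
    """Text of every body, whether given as a literal or one layer down via cnt:chars."""
    texts = []
    for s, p, o in graph:
        if s != annotation or p != OA_HAS_BODY:
            continue
        direct = literal_value(o)
        if direct is not None:
            texts.append(direct)  # the body is the string itself
            continue
        for s2, p2, o2 in graph:  # the body is a node: look one layer down
            if s2 == o and p2 == CNT_CHARS:
                chars = literal_value(o2)
                if chars is not None:
                    texts.append(chars)
    return sorted(texts)


graph = {
    ("http://example.org/anno/1", OA_HAS_BODY, '"interesting!"'),
    ("http://example.org/anno/2", OA_HAS_BODY, "_:b1"),
    ("_:b1", CNT_CHARS, '"I should read this"'),
}
print(body_texts(graph, "http://example.org/anno/1"))  # ['interesting!']
print(body_texts(graph, "http://example.org/anno/2"))  # ['I should read this']
```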

--Bob Morris

Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390

IT Staff
Filtered Push Project
Harvard University Herbaria
Harvard University

email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://filteredpush.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or
Harvard University.


On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
<bernhard.haslhofer@cornell.edu> wrote:
> Hi,
>
> I have not yet finished reviewing the entire draft, but I guess I can already comment on this. My short answer is: I fully agree with Antoine's perspective and prefer oa:hasBody to be typed as rdf:Property, which gives the flexibility to attach both literals and resources. I also expressed this concern before as part of the lessons we learned in our Maphub experiment.
>
> I think if a body looks like a string, can be encoded as a string and should be interpreted as a string, then it is probably a string and we should not force people to represent it differently. Also, BNodes and/or UUID nodes are non-trivial and to some extent controversial concepts, and we should not force developers who want to represent really simple annotations in OA to get into this unless it is required.
>
> One argument that prevented us from allowing string bodies in the past was that most annotation bodies won't be strings. However, in many OA prototypes I have seen so far, bodies ARE simple strings, and in our own cookbook almost half of the examples have bodies that are simple strings. So I think we should take this into account and reconsider this design decision.
>
> The other argument is OWL-DL compatibility; and yes, if we want to maintain this, then we can either introduce an additional shortcut property, continue with the current solution, or take Jacco's suggestion. However, before we do this, I would ask how many use cases require OWL-DL compatibility. In Maphub we don't need it and, again, it seems that the majority of annotation use cases do not rely on it. I certainly agree that some use cases will need it, but I question whether this should be the driving design motivation. I think in the end it comes down to the question of whether OA should facilitate the construction of a formal annotation knowledge base or the sharing of annotations (data) and the linkage of Web resources. My personal preference is clearly the latter because I think it is more down-to-earth.
>
> Maybe it is also worth mentioning that many DBPedia properties are typed as rdf:Property. Here is an example: http://dbpedia.org/page/The_Shining_(film) The producer of a movie can be a resource (if more info is available) or simply a string (if only the name is known). I mention this because people are using DBPedia data and they seem generally happy with this design decision.
>
> If people think that OWL-DL compatibility is a fundamental design requirement for the OA model, then I like Jacco's suggestion. However, I also want to point out that people might run into practical problems when we allow bnodes. It is, for instance, hard to compute hashes over annotation representations containing nodes that don't have names. Existing RDF libraries often assign internal ids to bnodes, but these ids change when an annotation moves from system A to system B.
>
> As a concrete step I propose:
> - to allow simple literals as values of oa:hasBody,
> - to change the type of oa:hasBody to rdf:Property and update the corresponding examples in the spec, and
> - to provide guidelines in the appendix explaining how people who require OWL-DL compatibility could transform OA data into a logically consistent knowledge base.
>
> Best,
> Bernhard
>
> ------
> Bernhard Haslhofer
> Lecturer, Postdoctoral Associate
> Cornell University, Information Science
> 301 College Avenue
>
> bernhard.haslhofer@cornell.edu (mailto:bernhard.haslhofer@cornell.edu)
> http://www.cs.cornell.edu/~bh392
>
>
> On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
>
>> Dear all,
>>
>> I refrained from including in my earlier comments some more discussion-level issues concerning http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>>
>> "This model was chosen over having a literal as the Body directly for the following reasons:"
>> I'm sorry, but I still don't buy most of the reasons. And I believe I won't be the only one...
>> Going through individual points:
>>
>>
>> - "Literals have no identity, and thus cannot be referenced. If the Body was a literal, it would be impossible to refer to it directly and this is considered to be important in many use cases."
>> To me it's a positive point *not* to give identifiers to simple string literals. What does it mean when you give an identifier to a string like "interesting!" or "I should read this"? And what if the same string is assigned different identifiers? To me, when you have to refer to a string from different places (statements), it means that you already have more than a string - it becomes a kind of document.
>>
>>
>> - "It would be inconsistent with the rest of the model which allows any resource as a Body or Target, and thus would be a special case just for text in the Body."
>> This one is better. But it is mitigated by the fact that in RDF literals are in fact resources, too (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to reasoning or syntax) for which properties with literals as objects are distinguished from properties with "fully-fledged" resources as objects. But they do not apply to all RDF-based models.
>>
>>
>> - "While literals can have their language and datatype associated with them in RDF, there are other aspects of text that are important for interpretation that cannot be associated with a literal. Examples include directionality of the text and encoding, plus of course metadata such as authorship, date of creation and so forth, which would not be possible."
>>
>> This is very true - though it would help the reader if you gave more info on what "directionality" means here.
>> But this argument is not against allowing literals as bodies. It just says that in some cases, the bodies are sophisticated, document-like resources. Fair enough. But I will argue (and many others will) that many scenarios don't need this, and that it's not reasonable to impose on these latter scenarios the representation details that the former cases need. Caricaturing a bit, it is as if we banned string-valued attributes in object-oriented programming on the basis that some texts deserve to be treated as objects.
>>
>> Note that we faced a similar situation in SKOS, for documentation properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And the decision we made then is that these properties can be used either with simple literals or more complex resources. See
>>
>>
>> - "If a server wished to extract the text and make it a resource with an HTTP URI, it would not be possible to assert equivalence or provenance."
>>
>> I think it is the utter prerogative of annotation-producing applications to decide whether the bodies they produce are worthy of specific provenance data or not. Is there a point in keeping track of whether someone "created" a string like 'I should read this' in the first place?
>> On equivalence, the argument is also not convincing: in fact literals come with equivalence conditions that are easy to get and already implemented. Trying to come up with equivalence relationships between "resourcified literals" is much harder, both for spec designers and for application builders (if we let them handle the issue). When working on the SKOS-XL extension we tried to open the can of worms of equivalence/identity conditions. We quite quickly postponed the issue (http://www.w3.org/TR/skos-reference/#L5739).
>>
>>
>> - "The cost of using ContentAsText is minimal: just one additional required triple over the literal case."
>>
>> I quite agree with the principle, though this one additional triple means millions of additional ones -- I expect that cases of simple text annotation will be very, very common.
>> But I don't buy it in a context in which it is recommended to type the resource with a dcmitypes class, to type it as cnt:ContentAsText and to give its MIME type using dc:format. That's 4 additional triples, not 1. And many of them can be seen as being of dubious added value (see earlier comments).
>>
>>
>> Note that the two SKOS patterns mentioned above (documentation properties and SKOS-XL) could be used in OA to have simple text bodies co-exist with more complex ones, either in relative isolation (SKOS documentation pattern) or with a tighter correspondence (the SKOS-XL pattern allows switching from one pattern to the other).
>> And I believe that relative isolation is not as bad as it looks. Applications that produce simple bodies would only be bothered by the prospect of having more complex data on these bodies. And applications requiring more complex data (say, provenance) would probably need some more complex procedure to generate it from the data produced by the simpler applications.
>>
>>
>> Best,
>>
>> Antoine
>
>
>
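Bernhard's point about hashing annotations that contain blank nodes can be reproduced in a few lines. The Python sketch below is illustrative only; the triples, labels and hashing scheme are assumptions. It hashes the same annotation as serialized by two systems that chose different bnode labels, then shows a crude workaround that holds only for shallow, tree-shaped bodies. General solutions require RDF graph canonicalization, which is considerably harder, whereas minting URIs for bodies sidesteps the problem because named nodes keep their names.

```python
import hashlib

Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def naive_hash(triples: set[Triple]) -> str:
    """SHA-256 over sorted N-Triples-like lines; blank node labels are hashed verbatim."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def path_labelled_hash(triples: set[Triple]) -> str:
    """Hash after renaming each bnode after the (subject, predicate) pointing at it.

    ASSUMPTION: every bnode is the object of exactly one triple whose subject is
    a named node. This is NOT general graph canonicalization: two bnodes hanging
    off the same subject and predicate would collide.
    """
    new_label = {}
    for s, p, o in triples:
        if is_bnode(o) and not is_bnode(s):
            digest = hashlib.sha256(f"{s} {p}".encode("utf-8")).hexdigest()[:16]
            new_label[o] = f"_:c{digest}"

    def relabel(term: str) -> str:
        return new_label.get(term, term)

    return naive_hash({(relabel(s), p, relabel(o)) for s, p, o in triples})


ANNO = "<http://example.org/anno/1>"
HAS_BODY = "<http://www.w3.org/ns/oa#hasBody>"
CHARS = "<http://www.w3.org/2011/content#chars>"

system_a = {(ANNO, HAS_BODY, "_:b0"), ("_:b0", CHARS, '"interesting!"')}
# The same annotation after system B's RDF library assigned its own bnode label.
system_b = {(ANNO, HAS_BODY, "_:genid42"), ("_:genid42", CHARS, '"interesting!"')}

print(naive_hash(system_a) == naive_hash(system_b))                  # False
print(path_labelled_hash(system_a) == path_labelled_hash(system_b))  # True
```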



--
Received on Wednesday, 9 January 2013 19:01:48 GMT
