Re: New Draft comments: textual bodies from Antoine Isaac on 2013-01-10 (public-openannotation@w3.org from January 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 10 Jan 2013 22:45:28 +0100
To: <public-openannotation@w3.org>
Message-ID: <50EF3678.5060807@few.vu.nl>
Hi Stian,

For info, the old Dublin Core is still not dead. In the Europeana project (dealing with cultural heritage data) we use it, precisely because it allows flexibility: many data providers just want to provide what they can.
And as a matter of fact we use the old DC vocabulary because we're respectful of specifications: in our domain, very many of those who use properties like dcterms:creator use them with simple literals.
The more elaborate pattern (with a resource) serves rather as a guideline for encouraging those who are willing to put more resources into producing richer data. We would never think of reading it as a hard constraint! For some domains the barrier to implementation has to be seriously lowered.

For info, schema.org has the same kind of approach. Some properties advertise classes of resources as their range, but the guidelines explicitly say that they can be used with simple literals instead.
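
To illustrate, here is a minimal Turtle sketch of the two usages side by side. The ex: resources and the name are made up for the example, and schema:author is just one property picked for illustration:

```turtle
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .

# Simple pattern: the value is a plain literal.
ex:book1 schema:author "Jane Doe" .

# Richer pattern: the value is a resource of the advertised class.
ex:book2 schema:author ex:janeDoe .
ex:janeDoe a schema:Person ;
    schema:name "Jane Doe" .
```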

Antoine


> I like this conclusion. I am very uneasy with a property being both an
> Object Property and Data Property, that makes it very tricky in an OWL
> context and how to parse in implementations.
>
> One of the problems with the old Dublin Core Elements was that it was
> blurry about what to expect in the range - so you could find both of
> these styles in the wild:
>
> :x  dc:creator "John Doe" .
> :y  dc:creator <http://example.com/JohnDoe> .
>
> And this made it very inconsistent across implementations. DC Terms
> has fixed this, at some points at the cost of verbosity or the need to
> find reference URIs (dc:format "image/jpeg" vs dcterms:format mime:jpeg).
>
> Similarly I don't think that we should go "back in time" to encourage
> that kind of loose RDF.
>
> We should not mandate much about exactly how it should be implemented
> with a bnode or UUID URNs, perhaps just leave a hint/good practice.
>
> I understand one reason why we tried to recommend UUID URNs is to
> allow external meta-annotations, those get trickier for Bnodes
> (especially as hasBody is not functional).
>
>
> On Wed, Jan 9, 2013 at 7:15 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>>
>>
>> To draw this to a close:
>>
>> * We will adopt the blank node with a single cnt:chars property method for
>> when identity is considered unnecessary.
>> * The rationale given in the document will be updated to include typing (as
>> below), OWL-DL and integration, and to downplay the bullets that Antoine
>> mentioned as less convincing in his original email.
>> * The dctypes:Text additional class will be a MAY rather than a SHOULD
>> * The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST,
>> as this is easily able to be inferred from the presence of cnt:chars.  Note
>> that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
>>
>> This seems to solve the number of triples and explicit identity costs, while
>> still maintaining OWL-DL compatibility and consistency in the model.  The
>> typing and assignment of a MIME type to the text string (especially
>> text/plain vs text/html) is very important for clients to understand how to
>> process the text content. Overall, the cost of a single model of a blank
>> node with cnt:chars is considered less for developers than having to check
>> if the object of oa:hasBody is a literal or a resource, and decide which of
>> the two options to use.
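>>
>> As a rough sketch of what the single model could look like in Turtle (the
>> ex: URIs and the text are made up for illustration, and the oa: namespace
>> URI is an assumption; the first annotation is the minimal blank node form,
>> the second adds the SHOULD-level typing):
>>
>> ```turtle
>> @prefix oa:  <http://www.w3.org/ns/oa#> .
>> @prefix cnt: <http://www.w3.org/2011/content#> .
>> @prefix ex:  <http://example.org/> .
>>
>> # Minimal form: a blank node body with a single cnt:chars property.
>> ex:anno1 a oa:Annotation ;
>>     oa:hasTarget ex:page1 ;
>>     oa:hasBody [ cnt:chars "I should read this" ] .
>>
>> # Same body with the cnt:ContentAsText type stated explicitly.
>> ex:anno2 a oa:Annotation ;
>>     oa:hasTarget ex:page1 ;
>>     oa:hasBody [
>>         a cnt:ContentAsText ;
>>         cnt:chars "I should read this"
>>     ] .
>> ```
>>
>> In both cases the object of oa:hasBody is a resource, so a client never
>> has to branch on literal vs resource.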
>>
>> Rob
>>
>>
>> On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com> wrote:
>>>
>>> A few reasons we prefer OA artifacts, including bodies, as objects:
>>>
>>> 1. In our FilteredPush design we require semantic notification of
>>> events of publication to the annotation store.  It is probably
>>> difficult to do this if we are not in control of the tractability of
>>> the SPARQL 1.1 SELECT queries we use to define the subscription
>>> constraints. That is, we probably need to ensure that tractability
>>> issues arise from the domain vocabulary, not from the vocabularies we
>>> don't control.
>>>
>>> 2. More generally, if bodies can be strings, there still remains a
>>> need to type bodies, often with more than one type, in order that
>>> consuming application code can switch on the type.  In the community
>>> for which we build systems, it is very hard to get an agreement on use
>>> of small controlled vocabularies at all, and it is anyway a
>>> red herring for the type to be based on the particular wording of a
>>> textual body. Thus, while body typing is not a deep concern of an
>>> annotation producing human, it is a deep concern of consuming, and
>>> sometimes even producing, software.
>>>
>>> 3. Humans, other than developers, have little or no need to code the
>>> structure of annotations by hand, so the added complexity of a
>>> URI-based body should be of little concern.  In human-facing
>>> annotation producers or consumers, the details of the injection of
>>> the text strings for the body content will rarely impact human authors
>>> of annotation content. It only impacts the code developers, and things
>>> like CNT are not very difficult to program against in any platform
>>> where RDF can be handled in the first place. And as to developers,
>>> well hey, Emacs makes a pretty good RDF/N3 editor.  :-)
>>>
>>> All that said,  it certainly would serve to type oa:hasBody  simply as
>>> rdf:Property and somehow introduce an OA profile that requires
>>> oa:hasBody be an object property.  But it would be a shame if
>>> consuming software abounds that cannot even find and render the
>>> strings holding the textual body content just because they are one
>>> layer further down in something as simple as CNT or some similar
>>> structure.
>>>
>>> --Bob Morris
>>>
>>> Robert A. Morris
>>>
>>> Emeritus Professor  of Computer Science
>>> UMASS-Boston
>>> 100 Morrissey Blvd
>>> Boston, MA 02125-3390
>>>
>>> IT Staff
>>> Filtered Push Project
>>> Harvard University Herbaria
>>> Harvard University
>>>
>>> email: morris.bob@gmail.com
>>> web: http://efg.cs.umb.edu/
>>> web: http://filteredpush.org/mw/FilteredPush
>>> http://www.cs.umb.edu/~ram
>>> ===
>>> The content of this communication is made entirely on my
>>> own behalf and in no way should be deemed to express
>>> official positions of The University of Massachusetts at Boston or
>>> Harvard University.
>>>
>>>
>>> On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
>>> <bernhard.haslhofer@cornell.edu>  wrote:
>>>> Hi,
>>>>
>>>> I am not yet finished with reviewing the entire draft, but I guess I can
>>>> already comment on this. My short answer is: I fully agree with Antoine's
>>>> perspective and prefer oa:hasBody to be typed as rdf:Property, which gives
>>>> the flexibility to attach both literals and resources. I also expressed this
>>>> concern before as part of the lessons we learned in our Maphub experiment.
>>>>
>>>> I think if a body looks like a string, can be encoded as a string and
>>>> should be interpreted as a string, then it is probably a string and we
>>>> should not force people to represent it differently. Also, BNodes and/or
>>>> UUID nodes are non-trivial and to some extent controversial concepts and we
>>>> should not force developers who want to represent really simple annotations
>>>> in OA to get into this, if not required.
>>>>
>>>> One argument that prevented us from allowing string bodies in the past
>>>> was that most annotation bodies won't be strings. However, in many OA
>>>> prototypes I have seen so far, bodies ARE simple strings; also in our own
>>>> cookbook examples almost half of the examples have bodies, which are also
>>>> simple strings. So I think we should take this into account and reconsider
>>>> this design decision.
>>>>
>>>> The other argument is OWL-DL compatibility; and yes, if we want to
>>>> maintain this then we can either introduce an additional short-cut property,
>>>> continue with the current solution, or take Jacco's suggestion. However,
>>>> before we do this, I would ask how many use cases require OWL-DL
>>>> compatibility. In Maphub we don't need it and again, it seems that the
>>>> majority of annotation use cases does not rely on it. I certainly agree that
>>>> some use cases will need it, but question if this should be the driving
>>>> design motivation. I think at the end it comes down to the question, if OA
>>>> should facilitate the construction of a formal annotation knowledge base or
>>>> if OA should facilitate sharing of annotations (data) and linkage of Web
>>>> resources. My personal preference is clearly the latter because I think it
>>>> is more down-to-earth.
>>>>
>>>> Maybe it is also worth mentioning that many DBPedia properties are typed
>>>> as rdf:Property. Here is an example:
>>>> http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can be
>>>> a resource (if more info is available) or simply a string (if only the name
>>>> is known). I am mentioning this because people are using DBPedia data and it
>>>> seems that people are generally happy with this design decision.
>>>>
>>>> If people think that OWL-DL compatibility is a fundamental design
>>>> requirement for the OA model, then I like Jacco's suggestion. However, I
>>>> also want to point out that people might run into practical problems when we
>>>> allow bnodes. It is, for instance, hard to compute hashes over annotation
>>>> representations containing nodes that don't have names. Existing RDF
>>>> libraries often assign internal ids to bnodes, but they change when an
>>>> annotation moves from system A to system B.
>>>>
>>>> As a concrete step I propose:
>>>> - to allow simple literals for oa:hasBody and
>>>> - to change the type of oa:hasBody to rdf:Property and update the
>>>> corresponding examples in the spec
>>>> - provide guidelines in the appendix explaining how people who require
>>>> OWL-DL compatibility could transform OA data into a logically consistent
>>>> knowledge base.
>>>>
>>>> Best,
>>>> Bernhard
>>>>
>>>> ------
>>>> Bernhard Haslhofer
>>>> Lecturer, Postdoctoral Associate
>>>> Cornell University, Information Science
>>>> 301 College Avenue
>>>>
>>>> bernhard.haslhofer@cornell.edu (mailto:bernhard.haslhofer@cornell.edu)
>>>> http://www.cs.cornell.edu/~bh392
>>>>
>>>>
>>>> On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I've refrained from putting in my earlier comments some more
>>>>> discussion-level issues focusing on
>>>>> http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>>>>>
>>>>> "This model was chosen over having a literal as the Body directly for
>>>>> the following reasons:"
>>>>> I'm sorry, but I still don't buy most of the reasons. And I believe I
>>>>> won't be the only one...
>>>>> Going through individual points:
>>>>>
>>>>>
>>>>> - "Literals have no identity, and thus cannot be referenced. If the
>>>>> Body was a literal, it would be impossible to refer to it directly and this
>>>>> is considered to be important in many use cases."
>>>>> To me it's a positive point *not* to give identifiers to simple string
>>>>> literals. What does it mean, when you give an identifier to a string like
>>>>> "interesting!" or "I should read this"? And if the same string is assigned
>>>>> different identifiers? To me when you have to refer to a string from different
>>>>> places (statements), it means that you have already more than a string - it
>>>>> becomes a kind of document.
>>>>>
>>>>>
>>>>> - "It would be inconsistent with the rest of the model which allows any
>>>>> resource as a Body or Target, and thus would be a special case just for text
>>>>> in the Body."
>>>>> This one is better. But it is mitigated by the fact that in RDF
>>>>> literals are in fact resources, too
>>>>> (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to
>>>>> reasoning or syntax) for which properties with literals as objects are
>>>>> distinguished from properties with "fully-fledged" resources as objects. But
>>>>> they do not apply to all RDF-based models.
>>>>>
>>>>>
>>>>> - "While literals can have their language and datatype associated with
>>>>> them in RDF, there are other aspects of text that are important for
>>>>> interpretation that cannot be associated with a literal. Examples include
>>>>> directionality of the text and encoding, plus of course metadata such as
>>>>> authorship, date of creation and so forth, which would not be possible."
>>>>>
>>>>> This is very true - though it would help the reader if you gave more info
>>>>> on what "directionality" means here.
>>>>> But this argument is not against allowing literals as bodies. It just
>>>>> says that in some cases, the bodies are sophisticated, document-like
>>>>> resources. Fair enough. But I will argue (and many others will) that many
>>>>> scenarios don't need this. And that it's not reasonable to impose on these
>>>>> latter scenarios the representation details that the former cases need.
>>>>> Caricaturing a bit, it looks as if we prevented string value attributes in
>>>>> object-oriented programming, on the basis that some texts deserve to be
>>>>> treated as objects.
>>>>>
>>>>> Note that we faced a similar situation in SKOS, for documentation
>>>>> properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And
>>>>> the decision we made then is that these properties can be used either with
>>>>> simple literals or more complex resources. See
>>>>>
>>>>>
>>>>> - "If a server wished to extract the text and make it a resource with
>>>>> an HTTP URI, it would not be possible to assert equivalence or provenance."
>>>>>
>>>>> I think it is the utter prerogative of annotation-producing
>>>>> applications, to decide whether the bodies they produce are worthy of
>>>>> specific provenance data or not. Is there a point in keeping track of
>>>>> whether someone "created" a string like 'I should read this' in the first
>>>>> place?
>>>>> On the equivalence the argument is also not convincing: in fact
>>>>> literals come with equivalence conditions that are easy to get and already
>>>>> implemented. Trying to come up with equivalence relationships between
>>>>> "resourcified literals" is much harder, both for spec designers and
>>>>> application builders (if we let them handle the issue). While working with
>>>>> the SKOS-XL extension we have tried to open the can of worms of
>>>>> equivalence/identity conditions. We quite quickly postponed the issue
>>>>> (http://www.w3.org/TR/skos-reference/#L5739).
>>>>>
>>>>>
>>>>> - "The cost of using ContentAsText is minimal: just one additional
>>>>> required triple over the literal case."
>>>>>
>>>>> I quite agree with the principle, though this one additional triple
>>>>> means millions of additional ones -- I expect that cases of simple text
>>>>> annotation will be very very common.
>>>>> But I don't buy it in a context in which it is recommended to type the
>>>>> resource with a dcmitypes class, to type it as cnt:ContentAsText and to
>>>>> give its MIME type using dc:format. That's 4 triples, not 1. And many of
>>>>> them can be seen as of dubious added value (see earlier comments).
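>>>>>
>>>>> To make the count concrete, a sketch in Turtle. The ex: URIs and the
>>>>> text are made up, the oa: namespace URI is an assumption, and the
>>>>> literal-body form is the alternative argued for here, not something
>>>>> the draft allows:
>>>>>
>>>>> ```turtle
>>>>> @prefix oa:      <http://www.w3.org/ns/oa#> .
>>>>> @prefix cnt:     <http://www.w3.org/2011/content#> .
>>>>> @prefix dc:      <http://purl.org/dc/elements/1.1/> .
>>>>> @prefix dctypes: <http://purl.org/dc/dcmitype/> .
>>>>> @prefix ex:      <http://example.org/> .
>>>>>
>>>>> # Literal body: the oa:hasBody triple is all there is.
>>>>> ex:anno1 oa:hasBody "I should read this" .
>>>>>
>>>>> # Embedded body with every recommended statement: the oa:hasBody
>>>>> # triple plus four triples on the body node.
>>>>> ex:anno2 oa:hasBody _:body .
>>>>> _:body a dctypes:Text ;                 # 1
>>>>>     a cnt:ContentAsText ;               # 2
>>>>>     dc:format "text/plain" ;            # 3
>>>>>     cnt:chars "I should read this" .    # 4
>>>>> ```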
>>>>>
>>>>>
>>>>> Note that the two SKOS patterns mentioned above (documentation
>>>>> properties and SKOS-XL) could be used in OA to have simple text bodies
>>>>> co-exist with more complex ones, either in relative isolation (SKOS
>>>>> documentation pattern) or with a tighter correspondence (SKOS-XL pattern
>>>>> allows switching from one pattern to the other).
>>>>> And I believe that relative isolation is not as bad as it looks.
>>>>> Applications that produce simple bodies can only be bothered by the
>>>>> perspective of having more complex data on these bodies. And applications
>>>>> requiring more complex data (say, provenance) would probably need some more
>>>>> complex procedure to generate it from the data produced by the simpler
>>>>> applications.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Antoine
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>
>
>
>
Received on Thursday, 10 January 2013 21:45:59 UTC