- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Thu, 10 Jan 2013 22:45:28 +0100
- To: <public-openannotation@w3.org>
Hi Stian,

For info, the old Dublin Core is still not dead. In the Europeana project (dealing with cultural heritage data) we use it, precisely because it allows flexibility: many data providers just want to provide what they can. And as a matter of fact we use the old DC vocabulary because we're respectful of specifications: in our domain many, many, many of those who use properties like dcterms:creator use them with simple literals. The more elaborate pattern (with a resource) serves rather as a guideline for encouraging those who are willing to put more resources into producing richer data. We would never think of reading it as a hard constraint! For some domains the barrier to implementation has to be seriously lowered.

For info, schema.org has the same kind of approach. Some properties advertise classes of resources as their range, but the guidelines explicitly say that they can be used with simple literals instead.

Antoine

> I like this conclusion. I am very uneasy with a property being both an
> Object Property and a Data Property, as that makes it very tricky in an OWL
> context and in how to parse it in implementations.
>
> One of the problems with the old Dublin Core Elements was that it was
> blurry about what to expect in the range, so you could find both of
> these styles in the wild:
>
> :x dc:creator "John Doe" .
> :y dc:creator <http://example.com/JohnDoe> .
>
> And this made it very inconsistent across implementations. DC Terms
> has fixed this, at some points at the cost of verbosity or of the need to find
> reference URIs (dc:format "image/jpeg" vs dcterms:format mime:jpeg).
>
> Similarly, I don't think that we should go "back in time" to encourage
> that kind of loose RDF.
>
> We should not mandate much about exactly how it should be implemented
> with a bnode or UUID URNs; perhaps just leave a hint/good practice.
>
> I understand one reason why we tried to recommend UUID URNs is to
> allow external meta-annotations; those get trickier for bnodes
> (especially as hasBody is not functional).
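A minimal Python sketch, not taken from the thread, of the consumer-side branching that the two dc:creator styles above force on every client. The `IRI` and `Literal` classes and the `names` lookup table are hypothetical stand-ins; a real client would use an RDF library's term types and dereference or query the creator resource instead.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical minimal stand-ins for RDF terms; a real application would
# use the term classes of its RDF library.
@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

Term = Union[IRI, Literal]

DC_CREATOR = IRI("http://purl.org/dc/elements/1.1/creator")

# The two styles from the message, as (subject, predicate, object) triples.
graph = [
    (IRI("http://example.com/x"), DC_CREATOR, Literal("John Doe")),
    (IRI("http://example.com/y"), DC_CREATOR, IRI("http://example.com/JohnDoe")),
]

# Hypothetical lookup table standing in for dereferencing the creator
# resource (or querying it) to obtain a display name.
names = {IRI("http://example.com/JohnDoe"): "John Doe"}

def creator_label(obj: Term) -> str:
    """Return a display name for a dc:creator value of either shape."""
    if isinstance(obj, Literal):
        return obj.value               # literal style: the value is the name
    return names.get(obj, obj.value)   # resource style: one more lookup needed

labels = {s.value: creator_label(o) for s, p, o in graph if p == DC_CREATOR}
print(labels)
```

Every consumer has to carry this `isinstance` branch, which is the developer cost Rob's conclusion below weighs against the extra triple of a single resource-based model.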
>
> On Wed, Jan 9, 2013 at 7:15 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>>
>> To draw this to a close:
>>
>> * We will adopt the blank node with a single cnt:chars property method for
>> when identity is considered unnecessary.
>> * The rationale given in the document will be updated to include typing (as
>> below), OWL-DL and integration, and to downplay the bullets that Antoine
>> mentioned as less convincing in his original email.
>> * The dctypes:Text additional class will be a MAY rather than a SHOULD.
>> * The assignment of cnt:ContentAsText will be a SHOULD rather than a MUST,
>> as this can easily be inferred from the presence of cnt:chars. Note
>> that cnt:ContentAsBase64 uses cnt:bytes, so the two are distinguishable.
>>
>> This seems to address the costs of the number of triples and of explicit
>> identity, while still maintaining OWL-DL compatibility and consistency in the
>> model. The typing and assignment of a MIME type to the text string (especially
>> text/plain vs text/html) is very important for clients to understand how to
>> process the text content. Overall, the cost of a single model of a blank
>> node with cnt:chars is considered less for developers than having to check
>> if the object of oa:hasBody is a literal or a resource, and decide which of
>> the two options to use.
>>
>> Rob
>>
>> On Wed, Jan 9, 2013 at 12:01 PM, Bob Morris <morris.bob@gmail.com> wrote:
>>>
>>> A few reasons we prefer OA artifacts, including bodies, as objects:
>>>
>>> 1. In our FilteredPush design we require semantic notification of
>>> events of publication to the annotation store. It is probably
>>> difficult to do this if we are not in control of the tractability of
>>> our SPARQL 1.1 SELECT queries that define the subscription
>>> constraints. That is, we probably need to ensure that tractability
>>> issues arise from the domain vocabulary, not from the vocabularies we
>>> don't control.
>>>
>>> 2. More generally, if bodies can be strings, there still remains a
>>> need to type bodies, often with more than one type, so that
>>> consuming application code can switch on the type. In the community
>>> for which we build systems, it is very hard to get agreement on the use
>>> of small controlled vocabularies at all, and it is in any case a
>>> red herring for the type to be based on the particular wording of a
>>> textual body. Thus, while body typing is not a deep concern of an
>>> annotation-producing human, it is a deep concern of consuming, and
>>> sometimes even producing, software.
>>>
>>> 3. Humans, other than developers, have little or no need to code the
>>> structure of annotations by hand, so the added complexity of a
>>> URI-based body should be of little concern. In human-facing
>>> annotation producers or consumers, the details of the injection of
>>> the text strings for the body content will rarely impact human authors
>>> of annotation content. It only impacts the code developers, and things
>>> like CNT are not very difficult to program against in any platform
>>> where RDF can be handled in the first place. And as to developers,
>>> well hey, Emacs makes a pretty good RDF/N3 editor. :-)
>>>
>>> All that said, it certainly would serve to type oa:hasBody simply as
>>> rdf:Property and somehow introduce an OA profile that requires
>>> oa:hasBody to be an object property. But it would be a shame if
>>> consuming software abounds that cannot even find and render the
>>> textual body content just because it is one layer further down in
>>> something as simple as CNT or some similar structure.
>>>
>>> --Bob Morris
>>>
>>> Robert A. Morris
>>>
>>> Emeritus Professor of Computer Science
>>> UMASS-Boston
>>> 100 Morrissey Blvd
>>> Boston, MA 02125-3390
>>>
>>> IT Staff
>>> Filtered Push Project
>>> Harvard University Herbaria
>>> Harvard University
>>>
>>> email: morris.bob@gmail.com
>>> web: http://efg.cs.umb.edu/
>>> web: http://filteredpush.org/mw/FilteredPush
>>> http://www.cs.umb.edu/~ram
>>> ===
>>> The content of this communication is made entirely on my
>>> own behalf and in no way should be deemed to express
>>> official positions of The University of Massachusetts at Boston or
>>> Harvard University.
>>>
>>> On Wed, Jan 9, 2013 at 10:21 AM, Bernhard Haslhofer
>>> <bernhard.haslhofer@cornell.edu> wrote:
>>>> Hi,
>>>>
>>>> I have not yet finished reviewing the entire draft, but I guess I can
>>>> already comment on this. My short answer is: I fully agree with Antoine's
>>>> perspective and prefer oa:hasBody to be typed as rdf:Property, which gives
>>>> the flexibility to attach both literals and resources. I also expressed this
>>>> concern before as part of the lessons we learned in our Maphub experiment.
>>>>
>>>> I think if a body looks like a string, can be encoded as a string and
>>>> should be interpreted as a string, then it is probably a string and we
>>>> should not force people to represent it differently. Also, bnodes and/or
>>>> UUID nodes are non-trivial and to some extent controversial concepts, and we
>>>> should not force developers who want to represent really simple annotations
>>>> in OA to get into this, if not required.
>>>>
>>>> One argument that prevented us from allowing string bodies in the past
>>>> was that most annotation bodies won't be strings. However, in many OA
>>>> prototypes I have seen so far, bodies ARE simple strings; also, in our own
>>>> cookbook examples almost half of the examples have bodies that are
>>>> simple strings. So I think we should take this into account and reconsider
>>>> this design decision.
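Two points from Rob's and Bob's messages above can be made concrete in a few lines: cnt:ContentAsText can be recovered from the presence of cnt:chars (which is why it can be relaxed to a SHOULD), and reaching text that sits "one layer further down" is a single extra hop. This is a minimal Python sketch under a simplified encoding (plain string triples with prefixed names); the `ex:` identifiers are hypothetical, dc:format is omitted for brevity, and a real client would use an RDF library or a SPARQL query. It is an illustration, not part of the OA specification.

```python
# Triples are (subject, predicate, object) tuples of plain strings: prefixed
# names such as "cnt:chars" stand in for full IRIs, "_:" marks a blank node,
# and double-quoted strings are literals. This encoding is a simplification.
RDF_TYPE = "rdf:type"

# Content property -> class it implies, following Rob's summary:
# cnt:chars goes with cnt:ContentAsText, cnt:bytes with cnt:ContentAsBase64.
IMPLIED_CLASS = {
    "cnt:chars": "cnt:ContentAsText",
    "cnt:bytes": "cnt:ContentAsBase64",
}

def infer_content_types(triples):
    """Return the triples plus any rdf:type implied by cnt:chars / cnt:bytes."""
    seen = set(triples)
    result = list(triples)
    for s, p, _ in triples:
        cls = IMPLIED_CLASS.get(p)
        if cls is not None and (s, RDF_TYPE, cls) not in seen:
            seen.add((s, RDF_TYPE, cls))
            result.append((s, RDF_TYPE, cls))
    return result

def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def body_texts(triples, annotation):
    """Text of every body of `annotation` that carries cnt:chars.

    The text sits one layer down (annotation -> body node -> cnt:chars), and
    oa:hasBody is not functional, so there may be several bodies.
    """
    texts = []
    for body in objects(triples, annotation, "oa:hasBody"):
        for chars in objects(triples, body, "cnt:chars"):
            texts.append(chars[1:-1])  # strip the surrounding quotes
    return texts

# An untyped blank node body with only cnt:chars, plus a second, non-text
# body that the text reader simply skips.
graph = [
    ("ex:anno1", "oa:hasBody", "_:b1"),
    ("_:b1", "cnt:chars", '"I should read this"'),
    ("ex:anno1", "oa:hasBody", "ex:image1"),
    ("ex:anno1", "oa:hasTarget", "ex:page1"),
]

full = infer_content_types(graph)
print([t for t in full if t[1] == RDF_TYPE])  # [('_:b1', 'rdf:type', 'cnt:ContentAsText')]
print(body_texts(full, "ex:anno1"))           # ['I should read this']
```

Because cnt:chars and cnt:bytes never overlap, the inference is unambiguous, and running it a second time adds nothing.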
>>>>
>>>> The other argument is OWL-DL compatibility; and yes, if we want to
>>>> maintain this then we can either introduce an additional shortcut property,
>>>> continue with the current solution, or take Jacco's suggestion. However,
>>>> before we do this, I would ask how many use cases require OWL-DL
>>>> compatibility. In Maphub we don't need it and, again, it seems that the
>>>> majority of annotation use cases do not rely on it. I certainly agree that
>>>> some use cases will need it, but I question whether this should be the driving
>>>> design motivation. I think in the end it comes down to the question of whether
>>>> OA should facilitate the construction of a formal annotation knowledge base or
>>>> the sharing of annotations (data) and the linkage of Web resources. My
>>>> personal preference is clearly the latter because I think it is more
>>>> down-to-earth.
>>>>
>>>> Maybe it is also worth mentioning that many DBpedia properties are typed
>>>> as rdf:Property. Here is an example:
>>>> http://dbpedia.org/page/The_Shining_(film) … the producer of a movie can be
>>>> a resource (if more info is available) or simply a string (if only the name
>>>> is known). I mention this because people are using DBpedia data and it
>>>> seems that people are generally happy with this design decision.
>>>>
>>>> If people think that OWL-DL compatibility is a fundamental design
>>>> requirement for the OA model, then I like Jacco's suggestion. However, I
>>>> also want to point out that people might run into practical problems when we
>>>> allow bnodes. It is, for instance, hard to compute hashes over annotation
>>>> representations containing nodes that don't have names. Existing RDF
>>>> libraries often assign internal ids to bnodes, but these change when an
>>>> annotation moves from system A to system B.
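Bernhard's hashing concern is easy to reproduce. This is a minimal Python sketch, assuming a simplified string-triple encoding (prefixed names, `_:` blank nodes, double-quoted literals) and a hypothetical `ex:anno1` annotation: a naive digest changes when the blank node body is relabeled in transit, while hashing after erasing the label works only because there is exactly one blank node. Graphs with several blank nodes need a proper RDF canonicalization algorithm, which this sketch does not implement.

```python
import hashlib

def is_bnode(term):
    return term.startswith("_:")

def naive_hash(triples):
    """SHA-256 over sorted N-Triples-like lines, blank node labels included as-is."""
    lines = sorted(" ".join(triple) + " ." for triple in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def single_bnode_hash(triples):
    """Hash that ignores the blank node's label.

    Only valid when the graph has at most one blank node; general graphs need
    a full RDF canonicalization algorithm rather than this shortcut.
    """
    labels = {term for triple in triples for term in triple if is_bnode(term)}
    if len(labels) > 1:
        raise ValueError("more than one blank node: use a real canonicalization algorithm")
    renamed = [tuple("_:c0" if is_bnode(term) else term for term in triple)
               for triple in triples]
    return naive_hash(renamed)

def annotation_with_body_label(label):
    # The same annotation as two systems might serialize it, differing only in
    # the internal label their RDF library assigned to the blank node body.
    return [
        ("ex:anno1", "oa:hasBody", label),
        (label, "cnt:chars", '"I should read this"'),
    ]

on_a = annotation_with_body_label("_:b0")
on_b = annotation_with_body_label("_:genid42")
print(naive_hash(on_a) == naive_hash(on_b))                # False
print(single_bnode_hash(on_a) == single_bnode_hash(on_b))  # True
```

Naming the body with a UUID URN instead, as discussed earlier in the thread, sidesteps the problem, because the name travels with the data.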
>>>>
>>>> As a concrete step I propose:
>>>> - to allow simple literals for oa:hasBody,
>>>> - to change the type of oa:hasBody to rdf:Property and update the
>>>> corresponding examples in the spec, and
>>>> - to provide guidelines in the appendix explaining how people who require
>>>> OWL-DL compatibility could transform OA data into a logically consistent
>>>> knowledge base.
>>>>
>>>> Best,
>>>> Bernhard
>>>>
>>>> ------
>>>> Bernhard Haslhofer
>>>> Lecturer, Postdoctoral Associate
>>>> Cornell University, Information Science
>>>> 301 College Avenue
>>>>
>>>> bernhard.haslhofer@cornell.edu (mailto:bernhard.haslhofer@cornell.edu)
>>>> http://www.cs.cornell.edu/~bh392
>>>>
>>>> On Sunday, January 6, 2013 at 10:51 AM, Antoine Isaac wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I held back from my earlier comments some more discussion-level
>>>>> issues, focusing on
>>>>> http://www.openannotation.org/spec/future/level1.html#BodyEmbed
>>>>>
>>>>> "This model was chosen over having a literal as the Body directly for
>>>>> the following reasons:"
>>>>> I'm sorry, but I still don't buy most of the reasons. And I believe I
>>>>> won't be the only one...
>>>>> Going through individual points:
>>>>>
>>>>> - "Literals have no identity, and thus cannot be referenced. If the
>>>>> Body was a literal, it would be impossible to refer to it directly and this
>>>>> is considered to be important in many use cases."
>>>>> To me it's a positive point *not* to give identifiers to simple string
>>>>> literals. What does it mean when you give an identifier to a string like
>>>>> "interesting!" or "I should read this"? And what if the same string is
>>>>> assigned different identifiers? To me, when you have to refer to a string
>>>>> from different places (statements), it means that you already have more
>>>>> than a string: it becomes a kind of document.
>>>>>
>>>>> - "It would be inconsistent with the rest of the model which allows any
>>>>> resource as a Body or Target, and thus would be a special case just for text
>>>>> in the Body."
>>>>> This one is better. But it is mitigated by the fact that in RDF
>>>>> literals are in fact resources, too
>>>>> (http://www.w3.org/TR/rdf-schema/#ch_literal). There are reasons (related to
>>>>> reasoning or syntax) for which properties with literals as objects are
>>>>> distinguished from properties with "fully-fledged" resources as objects. But
>>>>> they do not apply to all RDF-based models.
>>>>>
>>>>> - "While literals can have their language and datatype associated with
>>>>> them in RDF, there are other aspects of text that are important for
>>>>> interpretation that cannot be associated with a literal. Examples include
>>>>> directionality of the text and encoding, plus of course metadata such as
>>>>> authorship, date of creation and so forth, which would not be possible."
>>>>>
>>>>> This is very true, though it would help readers if you gave more info
>>>>> on what "directionality" means here.
>>>>> But this argument is not against allowing literals as bodies. It just
>>>>> says that in some cases the bodies are sophisticated, document-like
>>>>> resources. Fair enough. But I will argue (and many others will) that many
>>>>> scenarios don't need this, and that it's not reasonable to impose on these
>>>>> latter scenarios the representation details that the former cases need.
>>>>> Caricaturing a bit, it looks as if we prohibited string-valued attributes in
>>>>> object-oriented programming on the basis that some texts deserve to be
>>>>> treated as objects.
>>>>>
>>>>> Note that we faced a similar situation in SKOS, for documentation
>>>>> properties (http://www.w3.org/TR/skos-primer/#secadvanceddocumentation). And
>>>>> the decision we made then is that these properties can be used either with
>>>>> simple literals or with more complex resources.
>>>>> See
>>>>>
>>>>> - "If a server wished to extract the text and make it a resource with
>>>>> an HTTP URI, it would not be possible to assert equivalence or provenance."
>>>>>
>>>>> I think it is the utter prerogative of annotation-producing
>>>>> applications to decide whether the bodies they produce are worthy of
>>>>> specific provenance data or not. Is there a point in keeping track of
>>>>> whether someone "created" a string like 'I should read this' in the first
>>>>> place?
>>>>> On equivalence, the argument is also not convincing: in fact
>>>>> literals come with equivalence conditions that are easy to get and already
>>>>> implemented. Trying to come up with equivalence relationships between
>>>>> "resourcified literals" is much harder, both for spec designers and for
>>>>> application builders (if we let them handle the issue). While working on
>>>>> the SKOS-XL extension we tried to open the can of worms of
>>>>> equivalence/identity conditions. We quite quickly postponed the issue
>>>>> (http://www.w3.org/TR/skos-reference/#L5739).
>>>>>
>>>>> - "The cost of using ContentAsText is minimal: just one additional
>>>>> required triple over the literal case."
>>>>>
>>>>> I quite agree with the principle, though this one additional triple
>>>>> means millions of additional ones: I expect that cases of simple text
>>>>> annotation will be very, very common.
>>>>> But I don't buy it in a context in which it is recommended to type the
>>>>> resource with a dcmitypes class, to type it as cnt:ContentAsText and to
>>>>> give its MIME type using dc:format. That's 4 additional triples, not 1.
And many of >>>>> them can be seen as of dubious added value (see earlier comments) >>>>> >>>>> >>>>> Note that the two SKOS patterns mentioned above (documentation >>>>> properties and SKOS-XL) could be used in OA to have simple text bodies >>>>> co-exist with more complex ones, either in relative isolation (SKOS >>>>> documentation pattern) and with a tighter correspondence (SKOS-XL pattern >>>>> allows to switch from one pattern to the other). >>>>> And I believe that relative isolation is not as bad as it looks. >>>>> Applications who produce simple bodies can only be bothered by the >>>>> perspective of having more complex data on these bodies. And applications >>>>> requiring more complex data (say, provenance) would probably need some more >>>>> complex procedure to generate it from the data produced by the simpler >>>>> applications. >>>>> >>>>> >>>>> Best, >>>>> >>>>> Antoine >>>> >>>> >>>> >>> >>> >>> >>> -- >>> >> > > >
Received on Thursday, 10 January 2013 21:45:59 UTC