- From: Antoine Isaac <aisaac@few.vu.nl>
- Date: Mon, 4 Feb 2013 13:59:53 +0100
- To: <public-openannotation@w3.org>
Hi Stian,

Indeed, there's not much CNT could do to constrain the use of cnt:chars; it may simply be difficult to write a formal spec of what would qualify as "content" in an RDF environment. It just requires that users "get it right", just like many other elements in OA or elsewhere (OA motivations, for a start).

Now, if we don't trust CNT to be used right, nothing prevents us from coining a new (sub)property to replace cnt:chars.

Antoine

> I know this is taking it a bit off on an edge. I am primarily just
> worried about having implied semantics based on the presence or not of
> a property which is not even ours. Such usage would mainly sound
> stupid in the examples we make up, but it is not disallowed by other
> specifications, and I don't think we can mandate how other
> vocabularies should be used on non-OA resources.
>
>
> On Mon, Feb 4, 2013 at 11:16 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>> Hi Stian,
>>
>> All this is leading us into deep ontological thinking...
>> The baseline is that Content in RDF is for "Content", i.e. just encoding of
>> stuff, the content of a file. When somebody with no knowledge of biology
>> types "GATTTTTTTTTTACA" it's not a nucleotide sequence, it's a string. The T
>> there has as much semantics as the t in "Stian".
>>
>> Even if a nucleotide sequence may not need to refer to molecules to be
>> operational, bioinformaticians still assume something more than a string of
>> literals. You're expected to do something with it that has certain
>> semantics, even if they are low-level: i.e., the main splitting level is the
>> one of individual symbols (letters), you can't have an X in it, etc.
>>
>> As you say the string represents the sequence, and that still hints at a
>> quite important difference in level. The value of cnt:chars does not
>> represent content, it is the content.
>>
>> Antoine
>>
>>
>>
>>> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com>
>>> wrote:
>>>
>>>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>>>> there's no confusion as to whether to dereference it or not.
>>>
>>>
>>> No, here we are lucky in that dbpedia.org is playing by the rules.
>>>
>>>> Using documents as *semantic* tags is simply bad modeling. Do you
>>>> mean the document or the semantic concept (eg my home page or me).
>>>> Surely this has been discussed long enough in other contexts that we
>>>> don't have to rehash it here?
>>>
>>>
>>> Of course. I am not saying that it is not bad modelling. I am just
>>> trying to say you would find this in the wild, and it would not be
>>> against the current specifications for HTTP, HTML, RDF, etc.
>>>
>>> In particular you would find hash-URIs like
>>> <http://example.com/aDocument.rdf#concept> - now is that covered by
>>> not recommended "the URI of a document"? That is unclear by the
>>> current wording.
>>>
>>> Also you would find examples like <http://omim.org/entry/104760> by
>>> Paolo, of course here the omim.org site is 'innocent' in that they
>>> never intended to mint a semantic concept. That should not preclude
>>> users of OA from using it as such.
>>>
>>>> But to assert that a non information resource, the city of Paris, has
>>>> content is clearly wrong.
>>>
>>>
>>> I agree that would be silly for Paris. But we don't know what other
>>> users of other concepts have done using Content-in-RDF, which is
>>> another specification. There is nothing in the Content-in-RDF spec
>>> that would not allow it to be used as such. cnt:Content does not mandate
>>> that the resource is an information resource.
>>>
>>>> The cnt:Content class is an overarching class for any content that could
>>>> be found on the Web, in an Intranet or in local storage media, for example.
>>>> It is recommended always to use one of its subclasses.
>>>> There is no restriction within the vocabulary scope on what can be
>>>> represented with this class: textual content, XML files, binary files
>>>> (e.g., images or movies), etc.
>>>
>>>
>>>
>>>
>>>>> For instance,
>>>>> semantic tags identifying genome sequences might very well be
>>>>> including the actual genome sequence (like "GATTATTATATATATAGATTACA"
>>>>> as cnt:chars).
>>>>
>>>> And that too would be wrong. The biological genome in the real world
>>>> does not contain a string of characters in UTF-8 like that.
>>>
>>>
>>> No, but they are commonly represented as such. Just like a person's
>>> name is not a string of characters in UTF-8. A nucleotide sequence is
>>> the primary representation that they are recognized as. I asked two
>>> bioinformaticians separately:
>>>
>>>
>>> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
>>> [10:35:55] Kristina Hettne: a DNA sequence
>>>
>>>
>>> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>>>
>>>
>>> So just like you would call "Paris" a city (or the name of a city),
>>> they would identify it as a sequence, and that's the abstraction level
>>> they work on, not on particular molecules inside a cell found inside a
>>> particular organism in this lab.
>>>
>>>
>>>
>>>
>>>> From Content-in-RDF:
>>>
>>>
>>>> cnt:chars
>>>> The character sequence of the given content.
>>>
>>>
>>>
>>> So I think there is nothing stopping anyone from doing:
>>>
>>>
>>> <http://example.com/gene/1337> a :NucleotideSequence ;
>>>     :sequence "GATTTTTTTTTTACA" .
>>>
>>> :sequence a owl:DatatypeProperty ;
>>>     rdfs:subPropertyOf cnt:chars ;
>>>     rdfs:domain :NucleotideSequence .
>>>
>>> Their reason for using cnt:chars here could be that a GATC letter
>>> transcription of a genome sequence is the primary representation of
>>> the abstract concept of a nucleotide sequence in the field.
>>>
>>>
>>>
>>> But now I (who we can pretend did not write the above) can't use
>>> <http://example.com/gene/1337> as an OA semantic tag, because it
>>> happens to have an (implied) cnt:chars property, and I would be
>>> seeming to say that the user has tagged "GATTTTTTTTTTACA" as a text.
>>> The example.com guys should not be required to read the OA specs to
>>> prevent this, they just follow Content-in-RDF.
>>>
>>>
>>>> Yes, but that particular plague makes everything practically unusable.
>>>> Does this specific resource have a state? I don't know! How many
>>>> targets are there for the Annotation? I don't know, there could be
>>>> others that I don't know about! Does this Annotation have a body? I
>>>> don't know, please just let me get on with my job! etc. :)
>>>
>>>
>>> I know, we don't want to go there. However it is one thing to go from
>>> "unspecific to specific" (as in adding state), another to totally
>>> change the semantic "if unspecified, it's X, otherwise it's Y (which
>>> is not Y!)".
>>>
>>>
>>>> <anno1> a oa:Annotation ;
>>>>     oa:hasSemanticTag <composite1> ;
>>>>     oa:hasTarget <target1> .
>>>>
>>>> <composite1> isn't intended as a semantic tag. But if we allow any URI
>>>> to be used as a tag, nothing prevents someone from saying it is. So
>>>> already we have trouble.
>>>
>>>
>>> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
>>> very misleading. So we would have to disallow both Composite and
>>> Specific Resource indirections in my proposal, which would make it
>>> a very special case.
>>>
>>>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>>>> from. The semantics of Composite are that all of the items are
>>>> required, which is what the publisher wants to convey.
>>>> Except textualbody isn't a tag. Nor is composite1. This is the same
>>>> argument as against a new predicate for literals as bodies.
>>>
>>>
>>> If you want to annotate that I would propose that as an independent
>>> provenance statement (<composite1>/<anno1> pav:importedFrom
>>> <textualbody1>), and not conflate it into the very same annotation.
>>>
>>> If you are trying to say that the user typed in the <textualbody1> as
>>> an annotation on <target1>, and the system has subsequently found
>>> some semantic tag in the <textualbody1>, then I would try to do the
>>> second step as a second annotation <anno2> with targets both
>>> <textualbody1> and <target1> (with an optional provenance trace of
>>> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1> )
>>>
>>>
>>>> If there's a solution that allows a mix of body types, I would be
>>>> overjoyed! But I can't see how to do that without introducing any of:
>>>> 1. a node in between (as current spec for documents); 2. a class or
>>>> other property (as current spec for non documents); or 3. a new
>>>> predicate (that gets us in trouble)
>>>
>>>
>>> I like the suggestion in your next email, which is to subclass/type a
>>> SpecificResource for this purpose. This solves nicely the problems
>>> above, and also avoids introducing a new, independent concept. It
>>> does structurally mean that we have to split or move the Tagging
>>> section.
>>>
>>> Perhaps, counter to my previous reply, the best solution would be a
>>> split. Let the Tagging section stay where it is - textual tagging is a
>>> quite primary type of annotation we should support at "level 1".
>>> Semantic tagging is a more advanced feature, and can be presented with
>>> the specifiers as a new section 3.6 - a specialization of the level 1
>>> tagging. The first section will then just say "For semantic tagging,
>>> see section X.X."
>>>
>>>
>>
>>
>
>
>
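
For reference, the two-annotation alternative that Stian describes in the quoted message above could be written out roughly as follows. This is only a sketch of one possible reading, not something anyone in the thread posted: the prefix IRIs and the example.org base are assumptions, the use of oa:hasBody on <anno1> is a guess at how the user's typed text would be attached, and oa:hasSemanticTag is the predicate under discussion in this thread rather than a settled term.

```turtle
# Assumed namespaces and a hypothetical base IRI, for illustration only.
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .
@base <http://example.org/> .

# Step 1: the user's own textual annotation on the target
# (oa:hasBody here is an assumption about how the text is attached).
<anno1> a oa:Annotation ;
    oa:hasBody <textualbody1> ;
    oa:hasTarget <target1> .

# Step 2: a separate annotation for the tag the system found in the text.
# It targets both the text it was extracted from and the original target,
# and carries the optional provenance trace mentioned in the message.
<anno2> a oa:Annotation ;
    oa:hasSemanticTag <semantictag1> ;
    oa:hasTarget <textualbody1>, <target1> ;
    pav:importedFrom <textualbody1> ;
    pav:derivedFrom <anno1> .
```

Read this way, <textualbody1> never appears in the object position of oa:hasSemanticTag, so neither it nor a Composite wrapping it ends up being treated as a tag; the link between the tag and the text is carried by the second annotation's targets and the PAV statements instead.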
Received on Monday, 4 February 2013 13:00:23 UTC