- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Mon, 4 Feb 2013 12:13:04 +0000
- To: Antoine Isaac <aisaac@few.vu.nl>
- Cc: public-openannotation@w3.org
I know this is taking it a bit out on an edge. I am primarily just worried about having implied semantics based on the presence or absence of a property which is not even ours. Such usage may mainly sound stupid in the examples we make up, but it is not disallowed by other specifications, and I don't think we can mandate how other vocabularies should be used on non-OA resources.

On Mon, Feb 4, 2013 at 11:16 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
> Hi Stian,
>
> All this is leading us into deep ontological thinking...
> The baseline is that Content in RDF is for "Content", i.e. just encoding of
> stuff, the content of a file. When somebody with no knowledge of biology
> types "GATTTTTTTTTTACA" it's not a nucleotide sequence, it's a string. The T
> there has as much semantics as the t in "Stian".
>
> Even if a nucleotide sequence may not need to refer to molecules to be
> operational, bioinformaticians still assume something more than a string of
> literals. You're expected to do something with it that has certain
> semantics, even if they are low-level: i.e., the main splitting level is the
> one of individual symbols (letters), you can't have an X in it, etc.
>
> As you say the string represents the sequence, and that still hints at a
> quite important difference in level. The value of cnt:chars does not
> represent content, it is the content.
>
> Antoine
>
>
>
>> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com>
>> wrote:
>>
>>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>>> there's no confusion as to whether to dereference it or not.
>>
>>
>> No, here we are lucky in that dbpedia.org is playing by the rules.
>>
>>> Using documents as *semantic* tags is simply bad modeling. Do you
>>> mean the document or the semantic concept (eg my home page or me).
>>> Surely this has been discussed long enough in other contexts that we
>>> don't have to rehash it here?
>>
>>
>> Of course.
>> I am not saying that it is not bad modelling. I am just
>> trying to say you would find this in the wild, and it would not be
>> against the current specifications for HTTP, HTML, RDF, etc.
>>
>> In particular you would find hash-URIs like
>> <http://example.com/aDocument.rdf#concept> - now is that covered by
>> not recommended "the URI of a document"? That is unclear by the
>> current wording.
>>
>> Also you would find examples like <http://omim.org/entry/104760> by
>> Paolo, of course here the omim.org site is 'innocent' in that they
>> never intended to mint a semantic concept. That should not preclude
>> users of OA from using it as such.
>>
>>> But to assert that a non information resource, the city of Paris, has
>>> content is clearly wrong.
>>
>>
>> I agree that would be silly for Paris. But we don't know what other
>> users of other concepts have done using Content-in-RDF, which is
>> another specification. There is nothing in the Content-in-RDF spec
>> that would not allow it to be used as such. cnt:Content does not mandate
>> that the resource is an information resource.
>>
>>> The cnt:Content class is an overarching class for any content that could
>>> be found on the Web, in an Intranet or in local storage media, for example.
>>> It is recommended always to use one of its subclasses. There is no
>>> restriction within the vocabulary scope on what can be represented with this
>>> class: textual content, XML files, binary files (e.g., images or movies),
>>> etc.
>>
>>
>>
>>
>>>> For instance,
>>>> semantic tags identifying genome sequences might very well be
>>>> including the actual genome sequence (like "GATTATTATATATATAGATTACA")
>>>> as cnt:chars.
>>>
>>> And that too would be wrong. The biological genome in the real world
>>> does not contain a string of characters in UTF-8 like that.
>>
>>
>> No, but they are commonly represented as such. Just like a person's
>> name is not a string of characters in UTF-8.
>> A nucleotide sequence is
>> the primary representation that they are recognized as. I asked two
>> bioinformaticians separately:
>>
>>
>> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
>> GATTTTTTTTTTTTTTTACCCACACACACA
>> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
>> [10:35:55] Kristina Hettne: a DNA sequence
>>
>>
>> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
>> GATTTTTTTTTTTTTTTACCCACACACACA
>> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>>
>>
>> So just like you would call "Paris" a city (or the name of a city),
>> they would identify it as a sequence, and that's the abstraction level
>> they work on, not on particular molecules inside a cell found inside a
>> particular organism in this lab.
>>
>>
>>
>>
>>> From Content-in-RDF:
>>
>>
>>> cnt:chars
>>> The character sequence of the given content.
>>
>>
>>
>> So I think there is nothing stopping anyone from doing:
>>
>>
>> <http://example.com/gene/1337> a :NucleotideSequence ;
>>     :sequence "GATTTTTTTTTTACA" .
>>
>> :sequence a owl:DatatypeProperty ;
>>     rdfs:subPropertyOf cnt:chars ;
>>     rdfs:domain :NucleotideSequence .
>>
>> Their reason for using cnt:chars here could be that a GATC letter
>> transcription of a genome sequence is the primary representation of
>> the abstract concept of a nucleotide sequence in the field.
>>
>>
>>
>> But now I (who we can pretend did not write the above) can't use
>> <http://example.com/gene/1337> as an OA semantic tag, because it
>> happens to have an (implied) cnt:chars property, and I would
>> seem to be saying that the user has tagged "GATTTTTTTTTTACA" as a text.
>> The example.com guys should not be required to read the OA specs to
>> prevent this, they just follow Content-in-RDF.
>>
>>
>>> Yes, but that particular plague makes everything practically unusable.
>>> Does this specific resource have a state? I don't know! How many
>>> targets are there for the Annotation?
>>> I don't know, there could be
>>> others that I don't know about! Does this Annotation have a body? I
>>> don't know, please just let me get on with my job! etc. :)
>>
>>
>> I know, we don't want to go there. However it is one thing to go from
>> "unspecific to specific" (as in adding state), another to totally
>> change the semantic "if unspecified, it's X, otherwise it's Y (which
>> is not Y!)".
>>
>>
>>> <anno1> a oa:Annotation ;
>>>     oa:hasSemanticTag <composite1> ;
>>>     oa:hasTarget <target1> .
>>>
>>> <composite1> isn't intended as a semantic tag. But if we allow any URI
>>> to be used as a tag, nothing prevents someone from saying it is. So
>>> already we have trouble.
>>
>>
>> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
>> very misleading. So we would have to disallow both Composite and
>> Specific Resource indirections in my proposal, which would make it
>> a very special case.
>>
>>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>>> from. The semantics of Composite are that all of the items are
>>> required, which is what the publisher wants to convey.
>>> Except textualbody isn't a tag. Nor is composite1. This is the same
>>> argument as against a new predicate for literals as bodies.
>>
>>
>> If you want to annotate that I would propose that as an independent
>> provenance statement (<composite1>/<anno1> pav:importedFrom
>> <textualbody1>), and not conflate it into the very same annotation.
>>
>> If you are trying to say that the user typed in the <textualbody1> as
>> an annotation on <target1>, and the system has subsequently found
>> some semantic tag in the <textualbody1>, then I would try to do the
>> second step as a second annotation <anno2> with targets both
>> <textualbody1> and <target1> (with an optional provenance trace of
>> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1> )
>>
>>
>>> If there's a solution that allows a mix of body types, I would be
>>> overjoyed!
>>> But I can't see how to do that without introducing any of:
>>> 1. a node in between (as current spec for documents); 2. a class or
>>> other property (as current spec for non documents); or 3. a new
>>> predicate (that gets us in trouble)
>>
>>
>> I like the suggestion in your next email, which is to subclass/type a
>> SpecificResource for this purpose. This solves nicely the problems
>> above, and also avoids introducing a new, independent concept. It
>> does structurally mean that we have to split or move the Tagging
>> section.
>>
>> Perhaps - counter to my previous reply - the best solution would be a
>> split. Let the Tagging section stay where it is - textual tagging is a
>> quite primary type of annotation we should support at "level 1".
>> Semantic tagging is a more advanced feature, and can be presented with
>> the specifiers as a new section 3.6 - a specialization of the level 1
>> tagging. The first section will then just say "For semantic tagging;
>> see section X.X."
>>
>>
>
>

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
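
The "(implied) cnt:chars" point in the quoted gene example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples mirror the quoted Turtle, but the shortened `ex:gene/1337` identifier, the string-tuple graph, and the helper names `super_properties` and `implied_values` are hypothetical and come from neither the OA nor the Content-in-RDF specification. It shows that a consumer applying rdfs:subPropertyOf entailment sees a cnt:chars value that the publisher never asserted directly.

```python
# Hypothetical mini-graph: plain string tuples standing in for RDF triples.
CNT_CHARS = "cnt:chars"
SUBPROP = "rdfs:subPropertyOf"

triples = {
    ("ex:gene/1337", "rdf:type", ":NucleotideSequence"),
    ("ex:gene/1337", ":sequence", "GATTTTTTTTTTACA"),
    (":sequence", "rdf:type", "owl:DatatypeProperty"),
    (":sequence", SUBPROP, CNT_CHARS),
    (":sequence", "rdfs:domain", ":NucleotideSequence"),
}


def super_properties(prop, graph):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    seen = {prop}
    stack = [prop]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBPROP and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def implied_values(resource, prop, graph):
    """Values of prop on resource, asserted directly or via a subproperty."""
    return {
        o
        for s, p, o in graph
        if s == resource and prop in super_properties(p, graph)
    }


# What the publisher literally wrote versus what an entailment-aware
# consumer would conclude about the tag resource.
explicit = {o for s, p, o in triples if s == "ex:gene/1337" and p == CNT_CHARS}
implied = implied_values("ex:gene/1337", CNT_CHARS, triples)

print(explicit)  # set()
print(implied)   # {'GATTTTTTTTTTACA'}
```

A rule of the form "a body with cnt:chars is textual content" would therefore fire on this resource only for consumers that apply the entailment, which is the ambiguity the thread is worried about.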
Received on Monday, 4 February 2013 12:13:51 UTC