- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Mon, 4 Feb 2013 10:18:42 -0700
- To: Antoine Isaac <aisaac@few.vu.nl>
- Cc: public-openannotation@w3.org
To try to pull the threads together ...

Issue: If there is a document that an annotator wants to use as a semantic tag, it is not possible to say that the document is an oa:Tag, as that information is specific to the Annotation.

Use cases: Many use cases, especially in bioinformatics.

Severity: Difficult to determine, and somewhat mitigated by the (unanimous?) consensus that it is bad modeling, and against the architecture of the WWW, to have a URI identify both a concept and a document at the same time. Severe enough in the communities that need it that it would be great if it were addressed.

Current: The spec does not say exactly how to solve the problem, but recommends minting a new URI for the tag and relating it "somehow" to the document. It also has a single oa:Tag class, and relies on the presence or absence of cnt:chars to tell textual tags from semantic ones.

First, regarding oa:Tag versus oa:SemanticTag:

* The open world assumption means that the absence of cnt:chars means "we don't know if it's a semantic tag or not".
* cnt:chars is not our predicate, so it is not for us to associate additional semantics with its presence or absence.
* If you get an HTTP URI that calls itself a tag and has cnt:chars, it's unclear what to do.

Thus the proposal is to have a subclass, oa:SemanticTag, to avoid these situations.

There are several implicit proposals as to the model, all of which further clarify the current recommendation:

1. (Rob) Use a Specific Resource with an oa:SemanticTag class. The object of oa:hasSource is then the document.
   Objection from Antoine: this is abusing Specific Resources.

2. (Antoine) Use an oa:SemanticTag class, with foaf:isPrimaryTopicOf.
   Objection from Rob: it's inverse functional, so the same document couldn't be used for different semantic concepts. As the URI for the tag resource is likely going to be a UUID or a blank node, this could have unfortunate repercussions: a reasoner would conclude that any two tag resources pointing at the same document are one and the same.

3. (Rob) Use an oa:SemanticTag class, with foaf:page. This is the same as 2., but with a looser predicate that isn't inverse functional.

Thanks all!
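As a rough illustration (not text from the draft spec), the current textual tag and the three candidate models might look like this in Turtle. The namespace IRIs are the ones published for OA, Content in RDF and FOAF; oa:SemanticTag is only a proposed class at this point, and the resource names (<anno0>, <tag1>, ...) are hypothetical. Option 2 is written with foaf:isPrimaryTopicOf, the FOAF property that is declared inverse functional.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix cnt:  <http://www.w3.org/2011/content#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Current draft: a textual tag is an oa:Tag whose text is given with cnt:chars.
<anno0> a oa:Annotation ;
    oa:hasBody <tag0> ;
    oa:hasTarget <target1> .
<tag0> a oa:Tag, cnt:ContentAsText ;
    cnt:chars "paris" .

# Option 1 (hypothetical shape): the semantic tag is a Specific Resource
# whose oa:hasSource is the document being used as the tag.
<anno1> a oa:Annotation ;
    oa:hasBody <tag1> ;
    oa:hasTarget <target1> .
<tag1> a oa:SemanticTag, oa:SpecificResource ;
    oa:hasSource <http://omim.org/entry/104760> .

# Option 2: a new tag resource (used as oa:hasBody like <tag1>) linked to
# the document with foaf:isPrimaryTopicOf, which FOAF declares inverse functional.
<tag2> a oa:SemanticTag ;
    foaf:isPrimaryTopicOf <http://omim.org/entry/104760> .

# Option 3: as option 2, but with foaf:page, which carries no such constraint.
<tag3> a oa:SemanticTag ;
    foaf:page <http://omim.org/entry/104760> .
```

The objection to option 2 follows from OWL semantics: if <tag2> and a second tag node both point at the same OMIM page via an inverse functional property, a reasoner infers that the two nodes are owl:sameAs. foaf:page has no such constraint, so under option 3 distinct tags stay distinct.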
Please correct and add to this if I misunderstood or misrepresented anything :)

Rob

On Mon, Feb 4, 2013 at 5:59 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
> Hi Stian,
>
> Indeed there's not much way CNT could constrain the use of cnt:chars, maybe
> it's difficult to write a formal spec of what would qualify as "content" in
> an RDF environment. It just requires that users would "get it right"--just
> as many other elements in OA or elsewhere (OA motivations, for a start).
>
> Now, if we don't trust CNT to be used right, nothing prevents us from
> coining a new (sub)property to replace cnt:chars.
>
> Antoine
>
>> I know this is taking it a bit of on an edge. I am primarily just
>> worried about having implied semantics based on the presence or not of
>> a property which is not even ours. That such usage would mainly sound
>> stupid in the examples we make up, they are not disallowed by other
>> specifications, and I don't think we can mandate how other
>> vocabularies should be used on non-OA resources.
>>
>> On Mon, Feb 4, 2013 at 11:16 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>> Hi Stian,
>>>
>>> All this is leading us into deep ontological thinking...
>>> The baseline is that Content in RDF is for "Content", ie. just encoding of
>>> stuff, the content of a file. When somebody with no knowledge of biology
>>> types "GATTTTTTTTTTACA" it's not a nucleotide sequence, it's a string. The T
>>> there has as much semantics as the t in "Stian".
>>>
>>> Even if a nucleotide sequence may not need to refer to molecules to be
>>> operational, bioinformaticians still assume something more than a string of
>>> literals. You're expected to do something with it that has certain
>>> semantics, even if they are low-level: ie., the main splitting level is the
>>> one of individual symbols (letters), you can't have an X in it, etc.
>>>
>>> As you say the string represents the sequence, and that still hints at a
>>> quite important difference in level.
>>> the value of cnt:chars does not represent content, it is the content.
>>>
>>> Antoine
>>>
>>>> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com>
>>>> wrote:
>>>>
>>>>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>>>>> there's no confusion as to whether to dereference it or not.
>>>>
>>>> No, here we are lucky in that dbpedia.org is playing by the rules.
>>>>
>>>>> Using documents as *semantic* tags is simply bad modeling. Do you
>>>>> mean the document or the semantic concept (eg my home page or me).
>>>>> Surely this has been discussed long enough in other contexts that we
>>>>> don't have to rehash it here?
>>>>
>>>> Of course. I am not saying that it is not bad modelling. I am just
>>>> trying to say you would find this in the wild, and it would not be
>>>> against the current specifications for HTTP, HTML, RDF, etc.
>>>>
>>>> In particular you would find hash-URIs like
>>>> <http://example.com/aDocument.rdf#concept> - now is that covered by
>>>> not recommended "the URI of a document"? That is unclear by the
>>>> current wording.
>>>>
>>>> Also you would find examples like <http://omim.org/entry/104760> by
>>>> Paolo, of course here the omim.org site is 'innocent' in that they
>>>> never intended to mint a semantic concept. That should not preclude
>>>> users of OA to use it as such.
>>>>
>>>>> But to assert that a non information resource, the city of Paris, has
>>>>> content is clearly wrong.
>>>>
>>>> I agree that would be silly for Paris. But we don't know what other
>>>> users of other concepts have done using Content-in-RDF, which is
>>>> another specification. There is nothing in the Content-in-RDF spec
>>>> that would not allow it to be used as such. cnt:Content does not mandate
>>>> that the resource is an information resource.
>>>>
>>>>> The cnt:Content class is an overarching class for any content that could
>>>>> be found on the Web, in an Intranet or in local storage media, for example.
>>>>> It is recommended always to use one of its subclasses. There is no
>>>>> restriction within the vocabulary scope on what can be represented with this
>>>>> class: textual content, XML files, binary files (e.g., images or movies),
>>>>> etc.
>>>>
>>>>>> For instance,
>>>>>> semantic tags identifying genome sequences might very well be
>>>>>> including the actual genome sequence (like "GATTATTATATATATAGATTACA"
>>>>>> as cnt:chars.
>>>>>
>>>>> And that too would be wrong. The biological genome in the real world
>>>>> does not contain a string of characters in UTF-8 like that.
>>>>
>>>> No, but they are commonly represented as such. Just like a person's
>>>> name is not a string of characters in UTF-8. A nucleotide sequence is
>>>> the primary representation that they are recognized as. I asked two
>>>> bioinformaticians separately:
>>>>
>>>> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
>>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>>> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
>>>> [10:35:55] Kristina Hettne: a DNA sequence
>>>>
>>>> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
>>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>>> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>>>>
>>>> So just like you would call "Paris" a city (or the name of a city),
>>>> they would identify it as a sequence, and that's the abstraction level
>>>> they work on, not on particular molecules inside a cell found inside a
>>>> particular organism in this lab.
>>>>
>>>> From Content-in-RDF:
>>>>
>>>>> cnt:chars
>>>>> The character sequence of the given content.
>>>>
>>>> So I think there is nothing stopping anyone from doing:
>>>>
>>>>   <http://example.com/gene/1337> a :NucleotideSequence ;
>>>>       :sequence "GATTTTTTTTTTACA" .
>>>>
>>>>   :sequence a owl:DatatypeProperty ;
>>>>       rdfs:subPropertyOf cnt:chars ;
>>>>       rdfs:domain :NucleotideSequence .
>>>>
>>>> Their reason for using cnt:chars here could be that a GATC letter
>>>> transcription of a genome sequence is the primary representation of
>>>> the abstract concept of a nucleotide sequence in the field.
>>>>
>>>> But now I (who we can pretend did not write the above) can't use
>>>> <http://example.com/gene/1337> as an OA semantic tag, because it
>>>> happens to have an (implied) cnt:chars property, and I would be
>>>> seeming to say that the user has tagged "GATTTTTTTTTTACA" as a text.
>>>> The example.com guys should not be required to read the OA specs to
>>>> prevent this, they just follow Content-in-RDF.
>>>>
>>>>> Yes, but that particular plague makes everything practically unusable.
>>>>> Does this specific resource have a state? I don't know! How many
>>>>> targets are there for the Annotation? I don't know, there could be
>>>>> others that I don't know about! Does this Annotation have a body? I
>>>>> don't know, please just let me get on with my job! etc. :)
>>>>
>>>> I know, we don't want to go there. However it is one thing to go from
>>>> "unspecific to specific" (as in adding state), another to totally
>>>> change the semantic "if unspecified, it's X, otherwise it's Y (which
>>>> is not Y!)".
>>>>
>>>>> <anno1> a oa:Annotation ;
>>>>>     oa:hasSemanticTag <composite1> ;
>>>>>     oa:hasTarget <target1> .
>>>>>
>>>>> <composite1> isn't intended as a semantic tag. But if we allow any URI
>>>>> to be used as a tag, nothing prevents someone from saying it is. So
>>>>> already we have trouble.
>>>>
>>>> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
>>>> very misleading.
>>>> So we would have to disallow both Composite and
>>>> Specific Resource indirections in my proposal, which would make it
>>>> a very special case.
>>>>
>>>>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>>>>> from. The semantics of Composite are that all of the items are
>>>>> required, which is what the publisher wants to convey.
>>>>> Except textualbody isn't a tag. Nor is composite1. This is the same
>>>>> argument as against a new predicate for literals as bodies.
>>>>
>>>> If you want to annotate that, I would propose that as an independent
>>>> provenance statement (<composite1>/<anno1> pav:importedFrom
>>>> <textualbody1>), and not conflate it into the very same annotation.
>>>>
>>>> If you are trying to say that the user typed in the <textualbody1> as
>>>> an annotation on <target1>, and the system has subsequently found
>>>> some semantic tag in the <textualbody1>, then I would try to do the
>>>> second step as a second annotation <anno2> with targets both
>>>> <textualbody1> and <target1> (with an optional provenance trace of
>>>> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1>).
>>>>
>>>>> If there's a solution that allows a mix of body types, I would be
>>>>> overjoyed! But I can't see how to do that without introducing any of:
>>>>> 1. a node in between (as current spec for documents); 2. a class or
>>>>> other property (as current spec for non documents); or 3. a new
>>>>> predicate (that gets us in trouble)
>>>>
>>>> I like the suggestion in your next email, which is to subclass/type a
>>>> SpecificResource for this purpose. This solves nicely the problems
>>>> above, and also avoids introducing a new, independent concept. It
>>>> does structurally mean that we have to split or move the Tagging
>>>> section.
>>>>
>>>> Perhaps, counter to my previous reply, the best solution would be a
>>>> split.
>>>> Let the Tagging section stay where it is - textual tagging is a
>>>> quite primary type of annotation we should support at "level 1".
>>>> Semantic tagging is a more advanced feature, and can be presented with
>>>> the specifiers as a new section 3.6 - a specialization of the level 1
>>>> tagging. The first section will then just say "For semantic tagging;
>>>> see section X.X."
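Stian's alternative to the Composite body can be sketched in Turtle roughly as follows. The resource names are the ones used in the thread, the PAV namespace IRI is assumed, and the exact shape is an interpretation of his prose rather than a pattern the group has agreed on.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .

# Step 1: the user's own annotation, with the text they typed as the body.
<anno1> a oa:Annotation ;
    oa:hasBody <textualbody1> ;
    oa:hasTarget <target1> .

# Step 2: a separate annotation for the semantic tag the system found in
# that text; it targets both the typed text and the original target.
<anno2> a oa:Annotation ;
    oa:hasBody <semantictag1> ;
    oa:hasTarget <textualbody1>, <target1> ;
    # Optional provenance trace linking step 2 back to step 1.
    pav:importedFrom <textualbody1> ;
    pav:derivedFrom <anno1> .
```

Keeping the extraction as its own annotation means neither annotation has to mix a textual body and a semantic tag, which is the difficulty the Composite example ran into.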
Received on Monday, 4 February 2013 17:19:09 UTC