Re: Last Ultimate Final Call :) from Antoine Isaac on 2013-02-04 (public-openannotation@w3.org from February 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Mon, 4 Feb 2013 13:59:53 +0100
To: <public-openannotation@w3.org>
Message-ID: <510FB0C9.4010907@few.vu.nl>
Hi Stian,

Indeed, there is not much CNT could do to constrain the use of cnt:chars; it may simply be difficult to write a formal spec of what qualifies as "content" in an RDF environment. It just requires that users "get it right", just as with many other elements in OA or elsewhere (OA motivations, for a start).

Now, if we don't trust CNT to be used right, nothing prevents us from coining a new (sub)property to replace cnt:chars.
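
As a rough sketch only, such a sub-property might be declared along these lines. The ex: namespace and the name ex:bodyChars are made up for illustration and are not part of OA or CNT; the cnt: namespace URI is given as I understand it from Content-in-RDF.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix cnt:  <http://www.w3.org/2011/content#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Hypothetical property for the character content of an annotation body.
ex:bodyChars a owl:DatatypeProperty ;
    rdfs:subPropertyOf cnt:chars ;
    rdfs:comment "Assumed name for illustration; not defined by OA or CNT." .
```

With rdfs:subPropertyOf, any ex:bodyChars statement still entails a cnt:chars statement, but not the other way round, so a third party's use of cnt:chars on its own resources would not by itself imply the narrower property.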

Antoine


> I know this is taking it a bit to the edge. I am primarily just
> worried about having implied semantics based on the presence or not of
> a property which is not even ours.  Although such usage would mainly sound
> stupid in the examples we make up, it is not disallowed by other
> specifications, and I don't think we can mandate how other
> vocabularies should be used on non-OA resources.
>
>
> On Mon, Feb 4, 2013 at 11:16 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>> Hi Stian,
>>
>> All this is leading us into deep ontological thinking...
>> The baseline is that Content in RDF is for "Content", ie. just encoding of
>> stuff, the content of a file. When somebody with no knowledge of biology
>> types "GATTTTTTTTTTACA" it's not a nucleotide sequence, it's a string. The T
>> there has as much semantics as the t in "Stian".
>>
>> Even if a nucleotide sequence may not need to refer to molecules to be
>> operational, bioinformaticians still assume something more than a string of
>> literals. You're expected to do something with it that has certain
>> semantics, even if they are low-level: ie., the main splitting level is the
>> one of individual symbols (letters), you can't have an X in it, etc.
>>
>> As you say the string represents the sequence, and that still hints at a
>> quite important difference in level. The value of cnt:chars does not
>> represent content, it is the content.
>>
>> Antoine
>>
>>
>>
>>> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com>
>>> wrote:
>>>
>>>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>>>> there's no confusion as to whether to dereference it or not.
>>>
>>>
>>> No, here we are lucky in that dbpedia.org is playing by the rules.
>>>
>>>> Using documents as *semantic* tags is simply bad modeling.  Do you
>>>> mean the document or the semantic concept (eg my home page or me).
>>>> Surely this has been discussed long enough in other contexts that we
>>>> don't have to rehash it here?
>>>
>>>
>>> Of course. I am not saying that it is not bad modelling. I am just
>>> trying to say you would find this in the wild, and it would not be
>>> against the current specifications for HTTP, HTML, RDF, etc.
>>>
>>> In particular you would find hash-URIs like
>>> <http://example.com/aDocument.rdf#concept> - now is that covered by
>>> not recommended "the URI of a document"? That is unclear by the
>>> current wording.
>>>
>>> Also you would find examples like <http://omim.org/entry/104760> by
>>> Paolo, of course here the omim.org site is 'innocent' in that they
>>> never intended to mint a semantic concept. That should not preclude
>>> users of OA to use it as such.
>>>
>>>> But to assert that a non information resource, the city of Paris, has
>>>> content is clearly wrong.
>>>
>>>
>>> I agree that would be silly for Paris. But we don't know what other
>>> users of other concepts have done using Content-in-RDF, which is
>>> another specification. There is nothing in the Content-in-RDF spec
>>> that would prevent it from being used as such. cnt:Content does not mandate
>>> that the resource is an information resource.
>>>
>>>> The cnt:Content class is an overarching class for any content that could
>>>> be found on the Web, in an Intranet or in local storage media, for example.
>>>> It is recommended always to use one of its subclasses. There is no
>>>> restriction within the vocabulary scope on what can be represented with this
>>>> class: textual content, XML files, binary files (e.g., images or movies),
>>>> etc.
>>>
>>>
>>>
>>>
>>>>> For instance,
>>>>> semantic tags identifying genome sequences might very well be
>>>>> including the actual genome sequence (like "GATTATTATATATATAGATTACA")
>>>>> as cnt:chars.
>>>>
>>>> And that too would be wrong.  The biological genome in the real world
>>>> does not contain a string of characters in UTF-8 like that.
>>>
>>>
>>> No, but they are commonly represented as such.  Just like a person's
>>> name is not a string of characters in UTF-8. A nucleotide sequence is
>>> the primary representation that they are recognized as. I asked two
>>> bioinformaticians separately:
>>>
>>>
>>> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
>>> [10:35:55] Kristina Hettne: a DNA sequence
>>>
>>>
>>> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
>>> GATTTTTTTTTTTTTTTACCCACACACACA
>>> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>>>
>>>
>>> So just like you would call "Paris" a city (or the name of a city),
>>> they would identify it as a sequence, and that's the abstraction level
>>> they work on, not on particular molecules inside a cell found inside a
>>> particular organism in this lab.
>>>
>>>
>>>
>>>
>>>>  From Content-in-RDF:
>>>
>>>
>>>> cnt:chars
>>>> The character sequence of the given content.
>>>
>>>
>>>
>>> So I think there is nothing stopping anyone from doing:
>>>
>>>
>>> <http://example.com/gene/1337> a :NucleotideSequence ;
>>>       :sequence "GATTTTTTTTTTACA" .
>>>
>>> :sequence a owl:DatatypeProperty ;
>>>       rdfs:subPropertyOf cnt:chars ;
>>>       rdfs:domain :NucleotideSequence .
>>>
>>> Their reason for using cnt:chars here could be that a GATC letter
>>> transcription of a genome sequence is the primary representation of
>>> the abstract concept of a nucleotide sequence in the field.
>>>
>>>
>>>
>>> But now I (who we can pretend did not write the above) can't use
>>> <http://example.com/gene/1337> as an OA semantic tag, because it
>>> happens to have an (implied) cnt:chars property, and I would be
>>> seeming to say that the user has tagged "GATTTTTTTTTTACA" as a text.
>>> The example.com guys should not be required to read the OA specs to
>>> prevent this, they just follow Content-in-RDF.
>>>
>>>
>>>> Yes, but that particular plague makes everything practically unusable.
>>>>    Does this specific resource have a state? I don't know! How many
>>>> targets are there for the Annotation? I don't know, there could be
>>>> others that I don't know about! Does this Annotation have a body? I
>>>> don't know, please just let me get on with my job! etc. :)
>>>
>>>
>>> I know, we don't want to go there. However it is one thing to go from
>>> "unspecific to specific" (as in adding state), another to totally
>>> change the semantic "if unspecified, it's X, otherwise it's Y (which
>>> is not Y!)".
>>>
>>>
>>>> <anno1> a oa:Annotation ;
>>>>     oa:hasSemanticTag <composite1> ;
>>>>     oa:hasTarget <target1> .
>>>>
>>>> <composite1> isn't intended as a semantic tag. But if we allow any URI
>>>> to be used as a tag, nothing prevents someone from saying it is. So
>>>> already we have trouble.
>>>
>>>
>>> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
>>> very misleading. So we would have to disallow both Composite and
>>> Specific Resource indirections in my proposal, which would make it
>>> a very special case.
>>>
>>>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>>>> from.  The semantics of Composite are that all of the items are
>>>> required, which is what the publisher wants to convey.
>>>> Except textualbody isn't a tag. Nor is composite1.  This is the same
>>>> argument as against a new predicate for literals as bodies.
>>>
>>>
>>> If you want to annotate that, I would propose it as an independent
>>> provenance statement (<composite1>/<anno1> pav:importedFrom
>>> <textualbody1>), and not conflate it into the very same annotation.
>>>
>>> If you are trying to say that the user typed in the <textualbody1> as
>>> an annotation on <target1>, and the system has subsequently found
>>> some semantic tag in the <textualbody1>, then I would try to do the
>>> second step as a second annotation <anno2> with targets both
>>> <textualbody1> and <target1> (with an optional provenance trace of
>>> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1>)
>>>
>>>
>>>> If there's a solution that allows a mix of body types, I would be
>>>> overjoyed!  But I can't see how to do that without introducing any of:
>>>> 1. a node in between (as current spec for documents); 2. a class or
>>>> other property (as current spec for non documents); or 3. a new
>>>> predicate (that gets us in trouble)
>>>
>>>
>>> I like the suggestion in your next email, which is to subclass/type a
>>> SpecificResource for this purpose. This solves nicely the problems
>>> above, and also avoids introducing a new, independent concept.  It
>>> does structurally mean that we have to split or move the Tagging
>>> section.
>>>
>>> Perhaps, counter to my previous reply, the best solution would be a
>>> split. Let the Tagging section stay where it is - textual tagging is a
>>> quite primary type of annotation we should support at "level 1".
>>> Semantic tagging is a more advanced feature, and can be presented with
>>> the specifiers as a new section 3.6 - a specialization of the level 1
>>> tagging.  The first section will then just say "For semantic tagging,
>>> see section X.X."
>>>
>>>
>>
>>
>
>
>
Received on Monday, 4 February 2013 13:00:23 UTC