Re: Last Ultimate Final Call :)

Hi Stian,

All this is leading us into deep ontological thinking...
The baseline is that Content in RDF is for "Content", i.e. just the encoding of stuff, the content of a file. When somebody with no knowledge of biology types "GATTTTTTTTTTACA", it's not a nucleotide sequence, it's a string. The T there has as much semantics as the t in "Stian".

Even if a nucleotide sequence may not need to refer to molecules to be operational, bioinformaticians still assume something more than a string of literals. You're expected to do something with it that has certain semantics, even if they are low-level: i.e., the main splitting level is the one of individual symbols (letters), you can't have an X in it, etc.

As you say, the string represents the sequence, and that still hints at a quite important difference in level. The value of cnt:chars does not represent content, it is the content.
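
A rough Turtle sketch of that difference in level, nothing more than an illustration. The ex: namespace, ex:NucleotideSequence and ex:representedBy are hypothetical names made up for this example, and the content URI is a placeholder; only cnt:ContentAsText and cnt:chars come from Content in RDF.

```turtle
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix ex:  <http://example.com/ns#> .   # hypothetical namespace

# Content: the value of cnt:chars *is* the content of this resource.
<http://example.com/content/1> a cnt:ContentAsText ;
    cnt:chars "GATTTTTTTTTTACA" .

# Nucleotide sequence: the string only *represents* the sequence.
# ex:NucleotideSequence and ex:representedBy are assumed terms.
<http://example.com/gene/1337> a ex:NucleotideSequence ;
    ex:representedBy <http://example.com/content/1> .
```

Here the gene resource never gets a cnt:chars of its own, so it is not itself asserted to be a piece of text.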

Antoine


> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>
>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>> there's no confusion as to whether to dereference it or not.
>
> No, here we are lucky in that dbpedia.org is playing by the rules.
>
>> Using documents as *semantic* tags is simply bad modeling.  Do you
>> mean the document or the semantic concept (eg my home page or me).
>> Surely this has been discussed long enough in other contexts that we
>> don't have to rehash it here?
>
> Of course. I am not saying that it is not bad modelling. I am just
> trying to say you would find this in the wild, and it would not be
> against the current specifications for HTTP, HTML, RDF, etc.
>
> In particular you would find hash-URIs like
> <http://example.com/aDocument.rdf#concept> - now is that covered by
> the not-recommended "the URI of a document"? That is unclear in the
> current wording.
>
> Also you would find examples like <http://omim.org/entry/104760> by
> Paolo; of course here the omim.org site is 'innocent' in that they
> never intended to mint a semantic concept. That should not preclude
> users of OA from using it as such.
>
>> But to assert that a non information resource, the city of Paris, has
>> content is clearly wrong.
>
> I agree that would be silly for Paris. But we don't know what other
> users of other concepts have done using Content-in-RDF, which is
> another specification. There is nothing in the Content-in-RDF spec
> that would not allow it to be used as such. cnt:Content does not mandate
> that the resource is an information resource.
>
>> The cnt:Content class is an overarching class for any content that could be found on the Web, in an Intranet or in local storage media, for example. It is recommended always to use one of its subclasses. There is no restriction within the vocabulary scope on what can be represented with this class: textual content, XML files, binary files (e.g., images or movies), etc.
>
>
>
>>> For instance,
>>> semantic tags identifying genome sequences might very well be
>>> including the actual genome sequence (like "GATTATTATATATATAGATTACA")
>>> as cnt:chars.
>> And that too would be wrong.  The biological genome in the real world
>> does not contain a string of characters in UTF-8 like that.
>
> No, but they are commonly represented as such.  Just like a person's
> name is not a string of characters in UTF-8. A nucleotide sequence is
> the primary representation that they are recognized as. I asked two
> bioinformaticians separately:
>
>
> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
> GATTTTTTTTTTTTTTTACCCACACACACA
> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
> [10:35:55] Kristina Hettne: a DNA sequence
>
>
> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
> GATTTTTTTTTTTTTTTACCCACACACACA
> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>
>
> So just like you would call "Paris" a city (or the name of a city),
> they would identify it as a sequence, and that's the abstraction level
> they work on, not on particular molecules inside a cell found inside a
> particular organism in this lab.
>
>
>
>
>> From Content-in-RDF:
>
>> cnt:chars
>> The character sequence of the given content.
>
>
> So I think there is nothing stopping anyone from doing:
>
>
> <http://example.com/gene/1337>  a :NucleotideSequence ;
>      :sequence "GATTTTTTTTTTACA" .
>
> :sequence a owl:DatatypeProperty ;
>      rdfs:subPropertyOf cnt:chars ;
>      rdfs:domain :NucleotideSequence .
>
> Their reason for using cnt:chars here could be that a GATC letter
> transcription of a genome sequence is the primary representation of
> the abstract concept of a nucleotide sequence in the field.
>
>
>
> But now I (who we can pretend did not write the above) can't use
> <http://example.com/gene/1337> as an OA semantic tag, because it
> happens to have an (implied) cnt:chars property, and I would be
> seeming to say that the user has tagged "GATTTTTTTTTTACA" as a text.
> The example.com guys should not be required to read the OA specs to
> prevent this, they just follow Content-in-RDF.
>
>
>> Yes, but that particular plague makes everything practically unusable.
>>   Does this specific resource have a state? I don't know! How many
>> targets are there for the Annotation? I don't know, there could be
>> others that I don't know about! Does this Annotation have a body? I
>> don't know, please just let me get on with my job! etc. :)
>
> I know, we don't want to go there. However it is one thing to go from
> "unspecific to specific" (as in adding state), another to totally
> change the semantic "if unspecified, it's X, otherwise it's Y (which
> is not Y!)".
>
>
>> <anno1> a oa:Annotation ;
>>    oa:hasSemanticTag <composite1> ;
>>    oa:hasTarget <target1> .
>>
>> <composite1> isn't intended as a semantic tag. But if we allow any URI
>> to be used as a tag, nothing prevents someone from saying it is. So
>> already we have trouble.
>
> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
> very misleading. So we would have to disallow both Composite and
> Specific Resource indirections in my proposal, which would make it
> very special case.
>
>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>> from.  The semantics of Composite are that all of the items are
>> required, which is what the publisher wants to convey.
>> Except textualbody isn't a tag. Nor is composite1.  This is the same
>> argument as against a new predicate for literals as bodies.
>
> If you want to annotate that, I would propose that as an independent
> provenance statement (<composite1>/<anno1> pav:importedFrom
> <textualbody1>), and not conflate it into the very same annotation.
>
> If you are trying to say that the user typed in the <textualbody1> as
> an annotation on <target1>, and the system has subsequently found
> some semantic tag in the <textualbody1>, then I would try to do the
> second step as a second annotation <anno2> with targets both
> <textualbody1> and <target1> (with an optional provenance trace of
> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1>)
>
>
>> If there's a solution that allows a mix of body types, I would be
>> overjoyed!  But I can't see how to do that without introducing any of:
>> 1. a node in between (as current spec for documents); 2. a class or
>> other property (as current spec for non documents); or 3. a new
>> predicate (that gets us in trouble)
>
> I like the suggestion in your next email, which is to subclass/type a
> SpecificResource for this purpose. This solves nicely the problems
> above, and also avoids introducing a new, independent concept.  It
> does structurally mean that we have to split or move the Tagging
> section.
>
> Perhaps, counter to my previous reply, the best solution would be a
> split. Let the Tagging section stay where it is - textual tagging is a
> quite primary type of annotation we should support at "level 1".
> Semantic tagging is a more advanced feature, and can be presented with
> the specifiers as a new section 3.6 - a specialization of the level 1
> tagging.  The first section will then just say "For semantic tagging,
> see section X.X."
>
>

Received on Monday, 4 February 2013 11:17:24 UTC