Re: Last Ultimate Final Call :)

I know this is pushing it a bit to the edge. I am primarily just
worried about having implied semantics based on the presence or absence of
a property which is not even ours. Although such usage would mostly sound
silly in the examples we make up, it is not disallowed by other
specifications, and I don't think we can mandate how other
vocabularies should be used on non-OA resources.
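
To make the worry concrete, here is a minimal Python sketch (plain tuples, no RDF library assumed) of the RDFS entailment behind the gene example quoted below. It applies rules rdfs2 (rdfs:domain) and rdfs7 (rdfs:subPropertyOf) until nothing new appears. The function names, and the `looks_like_textual_body` check standing in for a rule of the form "a resource with cnt:chars is textual content", are hypothetical illustrations, not anything defined by the OA or Content-in-RDF specs.

```python
# Minimal sketch: RDFS rules rdfs2 (rdfs:domain) and rdfs7 (rdfs:subPropertyOf)
# over (subject, predicate, object) tuples. Prefixed names are plain strings,
# not resolved IRIs; ":sequence" and ":NucleotideSequence" are the example.com
# terms from the thread.
RDF_TYPE = "rdf:type"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
CNT_CHARS = "cnt:chars"


def rdfs_closure(triples):
    """Return the input plus everything derivable by rdfs2 and rdfs7."""
    closed = set(triples)
    while True:
        subprops = {(s, o) for (s, p, o) in closed if p == SUBPROP}
        domains = {(s, o) for (s, p, o) in closed if p == DOMAIN}
        new = set()
        for (s, p, o) in closed:
            for (sub, sup) in subprops:
                if p == sub:  # rdfs7: (s sub o), (sub subPropertyOf sup) => (s sup o)
                    new.add((s, sup, o))
            for (prop, cls) in domains:
                if p == prop:  # rdfs2: (s prop o), (prop domain cls) => (s type cls)
                    new.add((s, RDF_TYPE, cls))
        if new <= closed:  # fixpoint: nothing new was derived
            return closed
        closed |= new


def looks_like_textual_body(resource, triples):
    """Hypothetical consumer rule: treat a resource as textual content
    whenever it has a cnt:chars value."""
    return any(s == resource and p == CNT_CHARS for (s, p, _) in triples)


GENE = "<http://example.com/gene/1337>"
data = {
    (GENE, RDF_TYPE, ":NucleotideSequence"),
    (GENE, ":sequence", '"GATTTTTTTTTTACA"'),
    (":sequence", RDF_TYPE, "owl:DatatypeProperty"),
    (":sequence", SUBPROP, CNT_CHARS),
    (":sequence", DOMAIN, ":NucleotideSequence"),
}

print(looks_like_textual_body(GENE, data))                # False: nobody asserted cnt:chars
print(looks_like_textual_body(GENE, rdfs_closure(data)))  # True: entailed via rdfs7
```

Nothing here requires the example.com publisher to know about OA: the cnt:chars triple appears only because a consumer ran standard RDFS inference over the schema they published.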


On Mon, Feb 4, 2013 at 11:16 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
> Hi Stian,
>
> All this is leading us into deep ontological thinking...
> The baseline is that Content-in-RDF is for "Content", i.e. just the encoding of
> stuff, the content of a file. When somebody with no knowledge of biology
> types "GATTTTTTTTTTACA", it's not a nucleotide sequence, it's a string. The T
> there has as much semantics as the t in "Stian".
>
> Even if a nucleotide sequence may not need to refer to molecules to be
> operational, bioinformaticians still assume something more than a string of
> characters. You're expected to do something with it that has certain
> semantics, even if they are low-level: i.e., the basic unit is the
> individual symbol (letter), you can't have an X in it, etc.
>
> As you say, the string represents the sequence, and that already hints at a
> quite important difference in level: the value of cnt:chars does not
> represent content, it is the content.
>
> Antoine
>
>
>
>> On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com>
>> wrote:
>>
>>> http://dbpedia.org/resource/Paris doesn't identify a document, so
>>> there's no confusion as to whether to dereference it or not.
>>
>>
>> No, here we are lucky in that dbpedia.org is playing by the rules.
>>
>>> Using documents as *semantic* tags is simply bad modeling.  Do you
>>> mean the document or the semantic concept (e.g. my home page, or me)?
>>> Surely this has been discussed long enough in other contexts that we
>>> don't have to rehash it here?
>>
>>
>> Of course. I am not saying that it is not bad modelling. I am just
>> trying to say you would find this in the wild, and it would not be
>> against the current specifications for HTTP, HTML, RDF, etc.
>>
>> In particular, you would find hash URIs like
>> <http://example.com/aDocument.rdf#concept> - now, is that covered by
>> the recommendation against using "the URI of a document"? The current
>> wording leaves that unclear.
>>
>> Also, you would find examples like <http://omim.org/entry/104760>, from
>> Paolo; of course, here the omim.org site is 'innocent' in that it
>> never intended to mint a semantic concept. That should not preclude
>> users of OA from using it as such.
>>
>>> But to assert that a non information resource, the city of Paris, has
>>> content is clearly wrong.
>>
>>
>> I agree that would be silly for Paris. But we don't know what other
>> users of other concepts have done using Content-in-RDF, which is
>> another specification. There is nothing in the Content-in-RDF spec
>> that prevents it from being used that way. cnt:Content does not mandate
>> that the resource is an information resource.
>>
>>> The cnt:Content class is an overarching class for any content that could
>>> be found on the Web, in an Intranet or in local storage media, for example.
>>> It is recommended always to use one of its subclasses. There is no
>>> restriction within the vocabulary scope on what can be represented with this
>>> class: textual content, XML files, binary files (e.g., images or movies),
>>> etc.
>>
>>
>>
>>
>>>> For instance,
>>>> semantic tags identifying genome sequences might very well
>>>> include the actual genome sequence (like "GATTATTATATATATAGATTACA")
>>>> as cnt:chars.
>>>
>>> And that too would be wrong.  The biological genome in the real world
>>> does not contain a string of characters in UTF-8 like that.
>>
>>
>> No, but they are commonly represented as such, just as a person's
>> name is not a string of characters in UTF-8. The nucleotide sequence is
>> the primary representation by which they are recognized. I asked two
>> bioinformaticians separately:
>>
>>
>> [10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
>> GATTTTTTTTTTTTTTTACCCACACACACA
>> [10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
>> [10:35:55] Kristina Hettne: a DNA sequence
>>
>>
>> [10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
>> GATTTTTTTTTTTTTTTACCCACACACACA
>> [10:19:19] Katy Wolstencroft: a nucleotide sequence
>>
>>
>> So just as you would call "Paris" a city (or the name of a city),
>> they would identify it as a sequence, and that's the abstraction level
>> they work at, not particular molecules inside a cell of a
>> particular organism in this lab.
>>
>>
>>
>>
>>> From Content-in-RDF:
>>
>>
>>> cnt:chars
>>> The character sequence of the given content.
>>
>>
>>
>> So I think there is nothing stopping anyone from doing:
>>
>>
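>> @prefix :     <http://example.com/ns#> .  # hypothetical namespace
>> @prefix cnt:  <http://www.w3.org/2011/content#> .
>> @prefix owl:  <http://www.w3.org/2002/07/owl#> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>
>> <http://example.com/gene/1337>  a :NucleotideSequence ;
>>      :sequence "GATTTTTTTTTTACA" .
>>
>> :sequence a owl:DatatypeProperty ;
>>      rdfs:subPropertyOf cnt:chars ;
>>      rdfs:domain :NucleotideSequence .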
>>
>> Their reason for using cnt:chars here could be that a GATC letter
>> transcription of a genome sequence is the primary representation of
>> the abstract concept of a nucleotide sequence in the field.
>>
>>
>>
>> But now I (who, we can pretend, did not write the above) can't use
>> <http://example.com/gene/1337> as an OA semantic tag, because it
>> happens to have an (implied) cnt:chars property, and I would seem
>> to be saying that the user has tagged "GATTTTTTTTTTACA" as a text.
>> The example.com guys should not be required to read the OA specs to
>> prevent this; they just follow Content-in-RDF.
>>
>>
>>> Yes, but that particular plague makes everything practically unusable.
>>>   Does this specific resource have a state? I don't know! How many
>>> targets are there for the Annotation? I don't know, there could be
>>> others that I don't know about! Does this Annotation have a body? I
>>> don't know, please just let me get on with my job! etc. :)
>>
>>
>> I know, we don't want to go there. However, it is one thing to go from
>> "unspecific to specific" (as in adding state), and another to totally
>> change the semantics: "if unspecified, it's X, otherwise it's Y (which
>> is not X!)".
>>
>>
>>> <anno1>  a oa:Annotation ;
>>>    oa:hasSemanticTag <composite1> ;
>>>    oa:hasTarget <target1> .
>>>
>>> <composite1>  isn't intended as a semantic tag. But if we allow any URI
>>> to be used as a tag, nothing prevents someone from saying it is. So
>>> already we have trouble.
>>
>>
>> Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
>> very misleading. So we would have to disallow both Composite and
>> SpecificResource indirections in my proposal, which would make it a
>> very special case.
>>
>>> Here, <textualbody1> is the resource that <semantictag1> was extracted
>>> from.  The semantics of Composite are that all of the items are
>>> required, which is what the publisher wants to convey.
>>> Except textualbody isn't a tag. Nor is composite1.  This is the same
>>> argument as against a new predicate for literals as bodies.
>>
>>
>> If you want to record that, I would propose doing it as an independent
>> provenance statement (<composite1>/<anno1> pav:importedFrom
>> <textualbody1>), rather than conflating it into the very same annotation.
>>
>> If you are trying to say that the user typed in <textualbody1> as
>> an annotation on <target1>, and the system has subsequently found
>> some semantic tag in <textualbody1>, then I would do the
>> second step as a second annotation <anno2> with both
>> <textualbody1> and <target1> as targets (with an optional provenance trace of
>> <anno2> pav:importedFrom <textualbody1> ; pav:derivedFrom <anno1>).
>>
>>
>>> If there's a solution that allows a mix of body types, I would be
>>> overjoyed!  But I can't see how to do that without introducing any of:
>>> 1. a node in between (as current spec for documents); 2. a class or
>>> other property (as current spec for non documents); or 3. a new
>>> predicate (that gets us in trouble)
>>
>>
>> I like the suggestion in your next email, which is to subclass/type a
>> SpecificResource for this purpose. This solves nicely the problems
>> above, and also avoids introducing a new, independent concept.  It
>> does structurally mean that we have to split or move the Tagging
>> section.
>>
>> Perhaps, counter to my previous reply, the best solution would be a
>> split. Let the Tagging section stay where it is; textual tagging is a
>> quite primary type of annotation we should support at "level 1".
>> Semantic tagging is a more advanced feature, and can be presented with
>> the specifiers as a new section 3.6 - a specialization of the level 1
>> tagging.  The first section will then just say "For semantic tagging;
>> see section X.X."
>>
>>
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Monday, 4 February 2013 12:13:51 UTC