Re: Last Ultimate Final Call :) from Stian Soiland-Reyes on 2013-02-04 (public-openannotation@w3.org from February 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 4 Feb 2013 10:59:48 +0000
To: Robert Sanderson <azaroth42@gmail.com>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CAPRnXtngK-BmvPer8TaZf9g8dpji=d2Sz72hAuB4OWrCOW87oA@mail.gmail.com>
On Fri, Feb 1, 2013 at 5:18 PM, Robert Sanderson <azaroth42@gmail.com> wrote:

> http://dbpedia.org/resource/Paris doesn't identify a document, so
> there's no confusion as to whether to dereference it or not.

No, here we are lucky in that dbpedia.org is playing by the rules.

> Using documents as *semantic* tags is simply bad modeling.  Do you
> mean the document or the semantic concept (eg my home page or me).
> Surely this has been discussed long enough in other contexts that we
> don't have to rehash it here?

Of course. I am not saying that it is not bad modelling. I am just
trying to say you would find this in the wild, and it would not be
against the current specifications for HTTP, HTML, RDF, etc.

In particular you would find hash-URIs like
<http://example.com/aDocument.rdf#concept> - now is that covered by
the not-recommended "URI of a document"? That is unclear from the
current wording.

Also you would find examples like <http://omim.org/entry/104760> by
Paolo, of course here the omim.org site is 'innocent' in that they
never intended to mint a semantic concept. That should not preclude
users of OA from using it as such.

> But to assert that a non information resource, the city of Paris, has
> content is clearly wrong.

I agree that would be silly for Paris. But we don't know what other
users of other concepts have done using Content-in-RDF, which is
another specification. There is nothing in the Content-in-RDF spec
that would prevent it from being used that way. cnt:Content does not
mandate that the resource is an information resource.

> The cnt:Content class is an overarching class for any content that could be found on the Web, in an Intranet or in local storage media, for example. It is recommended always to use one of its subclasses. There is no restriction within the vocabulary scope on what can be represented with this class: textual content, XML files, binary files (e.g., images or movies), etc.



>> For instance,
>> semantic tags identifying genome sequences might very well be
>> including the actual genome sequence (like "GATTATTATATATATAGATTACA"
>> as cnt:chars.
> And that too would be wrong.  The biological genome in the real world
> does not contain a string of characters in UTF-8 like that.

No, but they are commonly represented as such.  Just like a person's
name is not a string of characters in UTF-8. A nucleotide sequence is
the primary representation by which they are recognized. I asked two
bioinformaticians separately:


[10:18:59] Stian Soiland-Reyes: What would you call this (type of) thing?
GATTTTTTTTTTTTTTTACCCACACACACA
[10:35:51] Stian Soiland-Reyes: ignoring finer details such as introns etc
[10:35:55] Kristina Hettne: a DNA sequence


[10:18:56] Stian Soiland-Reyes: What would you call this (type of) thing?
GATTTTTTTTTTTTTTTACCCACACACACA
[10:19:19] Katy Wolstencroft: a nucleotide sequence


So just like you would call "Paris" a city (or the name of a city),
they would identify it as a sequence, and that's the abstraction level
they work on, not on particular molecules inside a cell found inside a
particular organism in this lab.




From Content-in-RDF:

> cnt:chars
> The character sequence of the given content.


So I think there is nothing stopping anyone from doing:


<http://example.com/gene/1337> a :NucleotideSequence ;
    :sequence "GATTTTTTTTTTACA" .

:sequence a owl:DatatypeProperty ;
    rdfs:subPropertyOf cnt:chars ;
    rdfs:domain :NucleotideSequence .

Their reason for using cnt:chars here could be that a GATC letter
transcription of a genome sequence is the primary representation of
the abstract concept of a nucleotide sequence in the field.



But now I (who we can pretend did not write the above) can't use
<http://example.com/gene/1337> as an OA semantic tag, because it
happens to have an (implied) cnt:chars property, and I would seem
to be saying that the user has tagged "GATTTTTTTTTTACA" as text.
The example.com guys should not be required to read the OA specs to
prevent this, they just follow Content-in-RDF.
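
To spell out the "(implied)" part, here is a minimal sketch. It assumes
a consumer that applies RDFS sub-property entailment, and that cnt: is
bound to the Content-in-RDF namespace. From the example.com statements
above, such a consumer would also derive:

```turtle
@prefix cnt: <http://www.w3.org/2011/content#> .

# Entailed because :sequence is declared rdfs:subPropertyOf cnt:chars;
# nobody at example.com wrote this triple explicitly.
<http://example.com/gene/1337> cnt:chars "GATTTTTTTTTTACA" .
```

A consumer that does no inference would not see this triple at all, so
whether the resource "has content" would depend on the reasoner in use.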


> Yes, but that particular plague makes everything practically unusable.
>  Does this specific resource have a state? I don't know! How many
> targets are there for the Annotation? I don't know, there could be
> others that I don't know about! Does this Annotation have a body? I
> don't know, please just let me get on with my job! etc. :)

I know, we don't want to go there. However it is one thing to go from
"unspecific to specific" (as in adding state), another to totally
change the semantics: "if unspecified, it's X, otherwise it's Y (which
is not X!)".


> <anno1> a oa:Annotation ;
>   oa:hasSemanticTag <composite1> ;
>   oa:hasTarget <target1> .
>
> <composite1> isn't intended as a semantic tag. But if we allow any URI
> to be used as a tag, nothing prevents someone from saying it is. So
> already we have trouble.

Ah, I had not thought about this case. Yes, now oa:hasSemanticTag is
very misleading. So we would have to disallow both Composite and
Specific Resource indirections in my proposal, which would make it
a very special case.

> Here, <textualbody1> is the resource that <semantictag1> was extracted
> from.  The semantics of Composite are that all of the items are
> required, which is what the publisher wants to convey.
> Except textualbody isn't a tag. Nor is composite1.  This is the same
> argument as against a new predicate for literals as bodies.

If you want to record that, I would propose doing so as an independent
provenance statement (<composite1>/<anno1>  pav:importedFrom
<textualbody1>), and not conflate it into the very same annotation.

If you are trying to say that the user typed in the <textualbody1> as
an annotation on <target1>, and the system has subsequently found
some semantic tag in the <textualbody1>, then I would try to do the
second step as a second annotation <anno2> with targets both
<textualbody1> and <target1>  (with an optional  provenance trace of
<anno2> pav:importedFrom <textualbody1> ;  pav:derivedFrom <anno1> )
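
As a rough sketch only, the two-step version could look something like
the following. The @base, the <semantictag1> name and the use of
oa:hasBody for the typed text are my assumptions for illustration;
oa:hasSemanticTag is used as in the quoted example above.

```turtle
@base <http://example.org/> .   # assumed base for the relative names
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .

# Step 1: what the user typed in as an annotation on the target
<anno1> a oa:Annotation ;
    oa:hasBody <textualbody1> ;
    oa:hasTarget <target1> .

# Step 2: the semantic tag the system found afterwards, kept as a
# separate annotation with an optional provenance trace
<anno2> a oa:Annotation ;
    oa:hasSemanticTag <semantictag1> ;
    oa:hasTarget <textualbody1> , <target1> ;
    pav:importedFrom <textualbody1> ;
    pav:derivedFrom <anno1> .
```

That keeps <textualbody1> out of the tag position entirely, so no
Composite is needed just to carry the source text along.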


> If there's a solution that allows a mix of body types, I would be
> overjoyed!  But I can't see how to do that without introducing any of:
> 1. a node in between (as current spec for documents); 2. a class or
> other property (as current spec for non documents); or 3. a new
> predicate (that gets us in trouble)

I like the suggestion in your next email, which is to subclass/type a
SpecificResource for this purpose. This solves nicely the problems
above, and also avoids introducing a new, independent concept.  It
does structurally mean that we have to split or move the Tagging
section.

Perhaps, counter to my previous reply, the best solution would be a
split. Let the Tagging section stay where it is - textual tagging is
quite a primary type of annotation we should support at "level 1".
Semantic tagging is a more advanced feature, and can be presented with
the specifiers as a new section 3.6 - a specialization of the level 1
tagging.  The first section will then just say "For semantic tagging,
see section X.X."


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Monday, 4 February 2013 11:00:43 UTC