Re: Serialization spec in the OA core from Robert Sanderson on 2012-08-09 (public-openannotation@w3.org from August 2012)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Thu, 9 Aug 2012 11:20:28 -0600
To: Bob Morris <morris.bob@gmail.com>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CABevsUGCyvFyOiP30yUSWxZk9f5xNW=Zmz7X=r67YShe4u2i5g@mail.gmail.com>
Hi Bob,

From my point of view, the restriction in 2.4 is to prevent situations
such as the URI for a web page being used as the identifier for
multiple annotations which are all expressed within it using RDFa, or
getting a huge dump of random stuff with one or more annotations
somewhere inside it.

Basically, if the identifier for an annotation is an HTTP URI, then
doing an HTTP GET on that URI MUST give that annotation's
serialization and no other annotations or content. This is simply
following the Linked Data guidelines.  The text should probably clarify
that the use of fragments is allowed: for example,
http://www.example.org/index.html#anno1 and #anno2 are different, but
both are returned when dereferencing /index.html.
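
To illustrate the fragment case: an HTTP client strips the fragment
before making the request, so both identifiers lead to the same
retrieval.  A minimal Python sketch, using only the standard library
and the example.org URIs above:

```python
from urllib.parse import urldefrag

anno1 = "http://www.example.org/index.html#anno1"
anno2 = "http://www.example.org/index.html#anno2"

# urldefrag splits a URI into the part used for the HTTP request
# and the fragment, which is never sent to the server.
doc1, frag1 = urldefrag(anno1)
doc2, frag2 = urldefrag(anno2)

print(doc1 == doc2)   # True: one document is fetched for both
print(frag1, frag2)   # anno1 anno2: the fragments still tell them apart
```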

I agree that it's always possible to provide URI resolution via a
SPARQL endpoint, but we can't mandate the existence or use of SPARQL.
I agree that according to the wording of RFC 3986, I am conflating
resolution and dereferencing in the current specification text.  If
you'd like to suggest a clearer way to express it, I'm happy to adjust
the wording :)
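
As a rough illustration of resolving via SPARQL, the sketch below maps
an annotation URI to a SPARQL Protocol GET request.  The endpoint and
annotation URIs are hypothetical, and the choice of a DESCRIBE query is
my assumption, not something the spec or Bob's setup prescribes.  What
DESCRIBE returns is up to the implementation, so this alone would not
guarantee "only the graph".

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint: str, annotation_uri: str) -> str:
    """Build a SPARQL Protocol GET URL asking for a description of one resource."""
    # Assumption: DESCRIBE is used; its result is implementation-defined.
    query = f"DESCRIBE <{annotation_uri}>"
    # The SPARQL Protocol passes the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})

# Hypothetical endpoint and annotation URI, for illustration only.
url = sparql_get_url("http://example.org/sparql", "http://example.org/anno/1")
print(url)
```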

I also agree that a triplestore based implementation will almost
certainly not return exactly the same graph as was originally
ingested.  That's why we try to use the term "graph" rather than
making reference to the exact set of triples.  We don't want to be
overly prescriptive, but need to say something in order to have the
specification be useful.
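
A small sketch of the distinction between "the same graph" and "the
same serialization", treating a graph as a set of triples.  The
prefixed names are illustrative only, and the sketch assumes there are
no blank nodes; with blank nodes a proper graph isomorphism check
would be needed instead.

```python
# Triples as (subject, predicate, object) tuples; names are illustrative.
ingested = [
    ("ex:anno1", "rdf:type", "oa:Annotation"),
    ("ex:anno1", "oa:hasTarget", "ex:page"),
    ("ex:anno1", "oa:hasBody", "ex:comment"),
]

# A store may hand the triples back in another order, or with repeats.
returned = [
    ("ex:anno1", "oa:hasBody", "ex:comment"),
    ("ex:anno1", "rdf:type", "oa:Annotation"),
    ("ex:anno1", "oa:hasTarget", "ex:page"),
    ("ex:anno1", "rdf:type", "oa:Annotation"),
]

same_serialization = ingested == returned    # compares the lists as written
same_graph = set(ingested) == set(returned)  # compares the sets of triples

print(same_serialization, same_graph)        # False True
```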

Versioning and the dynamic nature of web resources is not something
that we've attempted to solve so far.  I think a separate thread about
this issue would be useful.

Hope that helps,

Rob


On Thu, Aug 9, 2012 at 9:10 AM, Bob Morris <morris.bob@gmail.com> wrote:
> A claim and a question about the core spec section 2.4
> http://www.openannotation.org/spec/core/#Serialization
>
> Executive summary: How is it determined if an implementation of OA
> complies with Sec 2.4?
>
>
> 1. My claim: If an Annotation can be serialized as some form of RDF,
> and the Annotation has an HTTP URI, then it is \always/ possible to
> provide a resolution and dereference (in the sense of IETF STD 66,
> http://tools.ietf.org/html/rfc3986) by use of a SPARQL SELECT query
> submitted to a SPARQL 1.1 endpoint for delivery of a serialization of
> a graph which somehow "is" the Annotation graph.  This resolution and
> dereference is in no way different from the typical HTTP URI
> resolution and dereference in which the URI is first used as a key to
> a local cache and then used as a parameter to a DNS service request
> which returns an IP address that is then used as a parameter for an
> HTTP protocol request to that IP address.
>
> 2. The question: In the core oa spec, what is the meaning of the
> sentence Sec 2.4 (Serialization
> http://www.openannotation.org/spec/core/#Serialization) "If the
> Annotation has an HTTP URI, then when that URI is dereferenced, the
> Annotation's serialized graph, and only the graph, MUST be returned in
> an appropriate graph serialization format. " ?
>
> As to 1, by resolution and hence dereference, STD 66 does not mean
> that an HTTP URI must be the value of the argument to an HTTP GET request.
> Indeed, as above, it rarely is.  It only means that some provision is
> made to determine and apply an access mechanism:
>     'URI "resolution" is the process of determining an access
>     mechanism and the appropriate parameters necessary to dereference a
>     URI; this resolution may require several iterations.  To use that
>     access mechanism to perform an action on the URI's resource is to
>     "dereference" the URI.'  (From rfc3986, Sec 1.2.2).
> Probably my claim is independent of whether the URI is an HTTP URI or
> not, and probably so is the scheme of using a SPARQL endpoint to
> provide dereferencing.
>
>
> As to 2., the sentence in the spec is silent on the nature of the
> resolution (in the sense of rfc 3986) of the URI before dereferencing.
> Hence, for the above SPARQL-based resolution to be compliant, the only
> requirement is that the SPARQL query, when launched, return the
> Annotation's serialized graph, and only that.  My problem is: by what
> criteria do I measure whether this dereference returns "the Annotation's
> serialized graph and only the graph"? In other words, how do I test
> compliance? Perhaps this is really a question for an implementation more
> than the spec. But doesn't it still raise the question of how to decide
> that the implementation meets the spec as to the "MUST"s in Sec 2.4?
> As Stian has remarked elsewhere, and as is true even for vanilla HTTP
> URI resolution, there is no expectation that the same resolution and
> dereferencing will today produce the same graph as it did yesterday.
>
> One possibility is that what's meant by returning "the Annotation's
> serialized graph and only the graph" is that the dereferencing returns
> a rooted labelled digraph whose root has an edge labelled rdf:type and
> target node labelled oa:Annotation, and that root label is a URI that
> is the same as the "original". Alas, just about anything could be such
> a graph. Do we instead mean that at the time of creation of the
> Annotation, the graph is in some way immutably persisted somewhere (a
> triple store? I think not...) and this "original" is
> graph-theoretically compared to the dereference? What comparison would
> be useful if the SPARQL endpoint supports adding assertions about
> resources referenced in the Annotation--especially resources in the oa
> namespace, such as the addition of another Target?
>
> The above SPARQL mechanism is important to us because we need to
> support a semantic pub/sub Annotation system.  Using a SPARQL query
> (usually behind the scenes), users explicitly express a semantic
> interest in the kind of annotation whose creation and publication
> they wish to be notified about.  We use SPARQLPuSH
> (code.google.com/p/sparqlpush) to manage the subscriptions and
> notifications. We use Jena Fuseki to provide the SPARQL endpoints.
> Fuseki even supports invoking SPARQL queries via HTTP GET calls, so
> the actual resolution can be to something which is even an HTTP URI,
> though rfc 3986 does not require that resolution be a mapping of one
> HTTP URI to another.
>
> --
> Robert A. Morris
>
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
>
> IT Staff
> Filtered Push Project
> Harvard University Herbaria
> Harvard University
>
> email: morris.bob@gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> ===
> The content of this communication is made entirely on my
> own behalf and in no way should be deemed to express
> official positions of The University of Massachusetts at Boston or
> Harvard University.
>
Received on Thursday, 9 August 2012 17:20:57 UTC