Re: URI fragments from Robert Sanderson on 2012-10-04 (public-openannotation@w3.org from October 2012)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Thu, 4 Oct 2012 09:23:33 -0600
To: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Cc: Nick White <nick.white@durham.ac.uk>, public-openannotation@w3.org
Message-ID: <CABevsUFSYCSY_yb4NCL+nk1Vdp0YiwUOnWYhzTjrMPE-qxuXAA@mail.gmail.com>

Hi Nick, and all,

There are two possibilities, listed below, for annotating parts of
resources.  We decided on the more expressive but more verbose second
option for a variety of reasons, which I'll try to unpack a little
from the specification.

Option (1) Fragment URI as target
_:anno a oa:Annotation ;
  oa:hasTarget <http://www.example.com/example.ogv#t=10,20> ;
  oa:hasBody <http://www.example.org/comment1> .


Option (2) Identity as target, description as a Selector
_:anno a oa:Annotation ;
  oa:hasTarget <SpTarget1> ;
  oa:hasBody <http://www.example.org/comment1> .

<SpTarget1> a oa:SpecificResource ;
  oa:hasSelector <FragSel1> ;
  oa:hasSource <http://www.example.com/example.ogv> .

<FragSel1> a oa:FragmentSelector ;
  rdf:value "t=10,20" .

-------

To try and decompress some of the reasoning in the specification:
( http://www.openannotation.org/spec/core/#SelectorFragment )

* You can't search for http://www.example.com/example.ogv directly in
the first model.  Remember that URIs, for the purposes of SPARQL etc.,
are opaque, non-decomposable strings.  Regardless of whether that
string includes human-readable semantics, a query that matches on the
video's URI will not discover annotations of form (1), but it will
discover those of form (2) via the object of oa:hasSource.
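
A rough sketch of that difference, in Python rather than SPARQL.  The
triples are plain string tuples standing in for a triple store, and the
helper names (subjects, annotations_of) are made up for illustration;
the only point is that matching is done on whole URI strings.

```python
# Triples as (subject, predicate, object) string tuples: a hypothetical
# stand-in for a triple store, not a real SPARQL engine.
SOURCE = "http://www.example.com/example.ogv"

# Form (1): the fragment URI is the target.
form1 = [
    ("_:anno1", "oa:hasTarget", SOURCE + "#t=10,20"),
    ("_:anno1", "oa:hasBody", "http://www.example.org/comment1"),
]

# Form (2): a Specific Resource with a source and a selector.
form2 = [
    ("_:anno2", "oa:hasTarget", "SpTarget1"),
    ("_:anno2", "oa:hasBody", "http://www.example.org/comment1"),
    ("SpTarget1", "oa:hasSelector", "FragSel1"),
    ("SpTarget1", "oa:hasSource", SOURCE),
    ("FragSel1", "rdf:value", "t=10,20"),
]


def subjects(triples, predicate, obj):
    """Exact-match lookup: URIs are compared as whole, opaque strings."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def annotations_of(triples, source):
    """Annotations whose target is a Specific Resource with this source."""
    found = []
    for target in subjects(triples, "oa:hasSource", source):
        found.extend(subjects(triples, "oa:hasTarget", target))
    return found


# Asking form (1) for annotations of the video itself finds nothing,
# because SOURCE and SOURCE + "#t=10,20" are simply different strings.
direct_hits = subjects(form1, "oa:hasTarget", SOURCE)

# Form (2) exposes the unfragmented URI as the object of oa:hasSource.
via_source = annotations_of(form2, SOURCE)

print(direct_hits)
print(via_source)
```

You could of course recover form (1) matches with string-prefix
filtering, but that means treating the URI as a decomposable string,
which is exactly what the graph model does not do for you.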

* While Style specifiers are going away, form (1) is not compatible
with States.  If you need to refer to example.ogv at a particular
point in time then you need a State which cannot be attached to the
Fragment URI, as that would break the global scope of statements in
RDF.  In other words, you would be saying that for all uses of that
time range within the video, it was always based on the video resource
as it was at a particular point in time (say at 2011-05-20), which
would prevent other annotations from having a different point in time
for that same time segment within the video.
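
To make the scoping problem concrete, here is a small sketch.  It is a
simplification: the State is reduced to a bare date string hanging off
a hypothetical oa:hasState property, and the node names (SpTargetA,
SpTargetB) and the helper states_for are invented for the example.

```python
SOURCE = "http://www.example.com/example.ogv"
FRAGMENT = SOURCE + "#t=10,20"

# Form (1): two annotators each attach their State to the fragment URI.
# Both statements land on the same node, since the URI is global.
form1 = {
    ("_:annoA", "oa:hasTarget", FRAGMENT),
    ("_:annoB", "oa:hasTarget", FRAGMENT),
    (FRAGMENT, "oa:hasState", "2011-05-20"),
    (FRAGMENT, "oa:hasState", "2012-10-04"),
}

# Form (2): each annotation has its own Specific Resource node, so each
# State is scoped to one annotation's view of the source.
form2 = {
    ("_:annoA", "oa:hasTarget", "SpTargetA"),
    ("SpTargetA", "oa:hasSource", SOURCE),
    ("SpTargetA", "oa:hasState", "2011-05-20"),
    ("_:annoB", "oa:hasTarget", "SpTargetB"),
    ("SpTargetB", "oa:hasSource", SOURCE),
    ("SpTargetB", "oa:hasState", "2012-10-04"),
}


def states_for(triples, annotation):
    """All States reachable from an annotation via its target node(s)."""
    targets = {o for (s, p, o) in triples
               if s == annotation and p == "oa:hasTarget"}
    return sorted(o for (s, p, o) in triples
                  if s in targets and p == "oa:hasState")


print(states_for(form1, "_:annoA"))  # both dates: ambiguous
print(states_for(form2, "_:annoA"))  # only annotation A's own date
```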

* URIs provide identity. A URI with a fragment provides both identity
for the segment and a description of how to resolve that segment
given a (particular) representation of the resource. This has several
issues:
(a) There may be many ways to describe the same segment, used by
different communities.
(b) IETF Fragment specifications are tied to a specific media type.
Your example of plain text fragments works only for text/plain
resources, and no other. Thus the identity of the segment is tied to a
specific representation in a specific format.
(c) As Jeni Tennison points out, the same fragment can identify
different things within the same resource.  She gives the example of
an SVG document with embedded RDF in Appendix B. (
http://www.w3.org/TR/2012/WD-fragid-best-practices-20120726/ )
So we think it's safer to create a new node in the graph to provide
identity, and a separate node to provide the description.  This
safety, expressiveness and consistency come at the expense of some
extra bytes, but that's RDF for you.

* Fragment URIs are not expressive enough to cover the use cases that
drive the Open Annotation specification.  Non-rectangular sections of
images are very important to be able to identify and describe,
including simple circles as well as arbitrary paths.  The worst case
is annotating a diagonal road in a map from top left to bottom right
of the image, where a rectangular box would encompass the whole
image's content. Thus we need some other way to implement this, which
results in form (2).
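
The diagonal road case is easy to check with a little arithmetic.  The
image size and road coordinates below are made up; the xywh= syntax is
the rectangular region form from Media Fragment URIs.

```python
# Hypothetical 1000x800 pixel map; the road runs corner to corner.
WIDTH, HEIGHT = 1000, 800
road = [(0, 0), (250, 200), (500, 400), (750, 600), (1000, 800)]


def bounding_box(points):
    """Smallest axis-aligned rectangle (x, y, w, h) containing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


x, y, w, h = bounding_box(road)

# The best a rectangular fragment can do for this road.
media_fragment = f"xywh={x},{y},{w},{h}"

# Fraction of the image covered by that rectangle.
coverage = (w * h) / (WIDTH * HEIGHT)

print(media_fragment)
print(coverage)
```

The rectangle is the whole image, so the fragment says nothing more
than the unfragmented URI would, whereas a path-based selector could
follow the road itself.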

* Fragment URIs do not cover all media types, nor could they possibly
hope to.  If you wanted to annotate a selection of text within an MS
Word document, you would need Microsoft to register a fragment
description for .doc and .docx.  Given that this is a useful sort of
thing to do, and it can't be done with fragment URIs, we need a
selector concept as in form (2).

* Fragment URIs are not extensible.  This is especially true of Media
Fragment URIs: we lobbied hard for extensibility before that
specification was finalized, but to no avail.  As soon as anyone needed
something slightly richer or more expressive ... like a circular area
rather than rectangular ... then we would be back in the situation
where we needed a selector again.  If they were extensible, this would
have helped.

* Thus, there is a need for a Selector that describes the segment of a
resource separately from its identity.  Given that this is required,
we felt it most consistent to always use a Selector, but to import the
fragment description semantics into it.  This solves all of the issues
above at the expense of being somewhat more verbose.
 - You can always query oa:hasSource to find the URI of the target
resource, without any segment information
 - You, or a third party, can always attach a State to give the time
for the representation.  The failure to consider the dynamic nature of
web resources has been the downfall of many annotation systems in the
past, and we fully intend to learn from their mistakes.
 - Multiple descriptions are possible, and can work across MIME types.
 There is no confusion about what the Specific Resource identifies as
it does not also try to describe it.
 - We can be as expressive as we like using a Selector, and remain
consistent with a single model
 - We can have selectors for new and old media types, without the
blessing of the IANA/IETF registries
 - Selectors are infinitely extensible
 - There is a single model, rather than two alternatives that everyone
would need to implement in full or risk splitting their user base
 - We import the semantics of the fragment definitions, so are not
re-inventing those.  We simply split the fragment away from the URI of
the full resource to gain the benefits of the above.
 - And finally we tried both ways and the consensus of the group was
that the single selector model was the better approach
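
Since the selector just carries the fragment string, going between the
two forms is mechanical.  A minimal sketch, using plain dicts in place
of RDF nodes; the function names are invented for the example, and the
reconstruction follows the concatenation rule Paolo describes below
(source URI, plus '#', plus the selector value).

```python
from urllib.parse import urldefrag


def to_specific_resource(fragment_uri):
    """Split a fragment URI into a source and a fragment selector value."""
    source, fragment = urldefrag(fragment_uri)
    return {
        "rdf:type": "oa:SpecificResource",
        "oa:hasSource": source,
        "oa:hasSelector": {
            "rdf:type": "oa:FragmentSelector",
            "rdf:value": fragment,
        },
    }


def to_fragment_uri(specific_resource):
    """Rebuild the fragment URI: source + '#' + selector value."""
    selector = specific_resource["oa:hasSelector"]
    return specific_resource["oa:hasSource"] + "#" + selector["rdf:value"]


uri = "http://www.example.com/example.ogv#t=10,20"
target = to_specific_resource(uri)

print(target["oa:hasSource"])
print(target["oa:hasSelector"]["rdf:value"])
print(to_fragment_uri(target))
```

So a consumer that really wants the fragment URI loses nothing by
receiving form (2).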

After all of that, if you're still not convinced, then as Paolo says
it's only a recommendation to use this approach.  If you feel that
some additional bytes in an already extremely verbose format are too
high a cost for interoperability, expressiveness, consistency and the
understandability of your annotations, then it is not forbidden to
annotate a fragment URI directly. I hope, of course, that you're
convinced otherwise by the arguments above :)

Rob

On Thu, Oct 4, 2012 at 7:12 AM, Paolo Ciccarese
<paolo.ciccarese@gmail.com> wrote:
> Hi Nick,
> the idea is that you can use fragment URIs, but not directly as they
> are.
>
> Given the current structure of the OA model we *recommend* to split source
> and fragment for the reasons that are listed in the specs. In other words,
> if you use a fragment URI directly that might work for you but we wanted to
> make clear that, within this model, that would create problems in using
> other features, querying, sharing and recording additional provenance info.
>
> As per http://www.ietf.org/rfc/rfc2119.txt, the adjective "RECOMMENDED" means
> that there may exist valid reasons in particular circumstances to ignore a
> particular item, but the full implications must be understood and carefully
> weighed before choosing a different course.
>
> Therefore if you have
> http://www.example.com/example.ogv#t=10,20
> we recommend breaking it down:
>
> <SpTarget1> a oa:SpecificResource ;
>     oa:hasSelector <Selector1> ;
>     oa:hasSource <http://www.example.com/example.ogv> .
>
>   <Selector1> a oa:FragmentSelector ;
>     rdf:value "t=10,20" .
>
> And the Fragment URI may be reconstructed by concatenating the oa:hasSource
> resource's URI, plus a '#', plus the value of the Fragment Selector. As the OA
> model is a format for exchange, the application consuming the annotation data
> is supposed to perform the operation when necessary.
>
> Of course, I agree the above set of triples does not look as compact as
> http://www.example.com/example.ogv#t=10,20 is. But in general terms, we know
> that approach causes side effects.
>
> Hope this helps,
> Paolo
>
>
>
>
> On Thu, Oct 4, 2012 at 7:00 AM, Nick White <nick.white@durham.ac.uk> wrote:
>>
>> Hi,
>>
>> I am very interested in the work OpenAnnotation is doing. It looks
>> like it could be very useful indeed.
>>
>> In reading the spec, section 5.2.1 "Fragment Selector"
>> <http://www.openannotation.org/spec/core/#SelectorFragment>, it
>> recommends against using fragment URIs to identify segments.
>>
>> I don't really understand the rationale for this. The language
>> used in the spec is not easy for me to follow. Please could somebody
>> clarify the reasons for me?
>>
>> It seems to me (in my ignorance, no doubt) that standard URI
>> fragment selectors are an obvious and good choice. I was planning to
>> use RFC5147 to refer to sections of text, which is a nice, simple
>> way of doing so. It's basic, but fine for my needs, and being human-
>> readable and easily usable in other contexts has its advantages.
>>
>> Thanks for any guidance, and I look forward to exploring
>> OpenAnnotation more.
>>
>> Nick White
>>
>>
>
>
>
> --
> Dr. Paolo Ciccarese
> http://www.paolociccarese.info/
> Biomedical Informatics Research & Development
> Instructor of Neurology at Harvard Medical School
> Assistant in Neuroscience at Mass General Hospital
> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>
>
>
Received on Thursday, 4 October 2012 15:24:02 UTC