Re: Selectors as URIs? from Ivan Herman on 2015-04-19 (public-annotation@w3.org from April 2015)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 19 Apr 2015 09:58:24 +0200
To: Liam Quin <liam@w3.org>
Cc: Randall Leeds <randall@bleeds.info>, Robert Sanderson <azaroth42@gmail.com>, Paolo Ciccarese <paolo.ciccarese@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>, Tzviya Siegman <tsiegman@wiley.com>, Markus Gylling <markus.gylling@gmail.com>
Message-Id: <368BB21B-A303-493C-8F00-D05162A4C5E3@w3.org>
> On 18 Apr 2015, at 20:34 , Liam R. E. Quin <liam@w3.org> wrote:
> 
> On Tue, 2015-04-14 at 11:38 +0200, Ivan Herman wrote:
>>> On 13 Apr 2015, at 23:05 , Randall Leeds <randall@bleeds.info>
>>> wrote:
>>> 
>>> Rob's answer is much better than mine, but points to the same
>>> solution, I think.
>>> 
>>> If you control the media type and the meaning of fragments in that
>>> context, then you don't need our permission to put an OA selector
>>> in the fragment.
>> 
>> I am not looking for permission, I am looking for coordination :-)
> 
> 
> URI fragments are defined by media type registrations based on media
> type. For example, the fragment identifier syntax for XML documents is
> defined to be XPointer.
> 
> This is arguably bad architecture, because it fails on content
> negotiation. So when someone asked on the dpub call I probably should
> have said "no, you can't use a URI for this".
> 
> You can, however, invert it:
> annotations://annotation-server.example.org/?uri=yyy;xpath=/book/chapter[3]//figure[@src='me.jpg']/ancestor::para/wordspan(5,17);mode-highlight
> 
> The annotation server could issue a redirect -- but a client-side
> engine could simply rewrite this to an annotated URI (yyy).
> 
> So I think there's scope for creativity.

The discussion, in fact, originated outside the annotation work; it goes back to the web packaging[1] work, which, if adopted, may become the new packaging format for future electronic documents like ebooks. The fragment identification in that document[2] is a bit similar to what you describe; it would have something like

http://example.org/downloads/editor.pack#url=/root.html;fragment=something_complex_here

where the 'url' part of the fragment identifies a 'part' within a package (there are other parameters to help in filtering among various parts) and the value of 'fragment' is, well, a fragment identifier *within* the identified part, interpreted according to the media type given in that part's header. So Rob's remark on this thread is justified: if we turn a selector into a fragment, it has to be accepted (or at least acceptable) for a specific media type, which I believe is not an insurmountable obstacle.
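
To make the two-level structure concrete, here is a rough Python sketch of how a client might split such a URI. It is only an illustration: the function name is made up, and it assumes that the parameters are separated by ';' and that any ';' inside a value is percent-encoded. The web packaging draft, not this sketch, is the authority on the actual syntax.

```python
from urllib.parse import urldefrag, unquote


def split_package_fragment(uri):
    """Split a package URI into (package_url, params).

    Hypothetical helper: assumes 'name=value' pairs separated by ';'
    in the fragment, with reserved characters percent-encoded.
    """
    package_url, fragment = urldefrag(uri)
    params = {}
    for item in fragment.split(";"):
        if not item:
            continue
        # Split on the first '=' only, so that an inner fragment such as
        # 'char=5' keeps its own '=' once it has been decoded.
        name, _, value = item.partition("=")
        params[name] = unquote(value)
    return package_url, params


package_url, params = split_package_fragment(
    "http://example.org/downloads/editor.pack"
    "#url=/root.html;fragment=something_complex_here"
)
print(package_url)         # the package itself
print(params["url"])       # the part within the package
print(params["fragment"])  # to be interpreted per the part's media type
```

The value of 'fragment' is deliberately left as an opaque string here; interpreting it is the job of whatever handles the media type of the selected part.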

Ivan


[1] http://www.w3.org/TR/web-packaging/
[2] http://www.w3.org/TR/web-packaging/#fragment-identifiers


> 
> 
>> Sure, we can define those URI fragments. Is it o.k. if a group just
>> does that without coordinating with those who are behind the OA
>> Selector model? I do not think so…
>> 
>> Actually, I also wonder whether the serialization of the selectors
>> in terms of URI-s would not come in handy for our own deliverables, too
>> (again, with possible restrictions as for the media types). Eg,
>> handling URI-s that way with existing URI libraries in the various
>> programming languages around us might be handy… (But, I admit, I am
>> just handwaving here…)
>> 
>> Ivan
>> 
>> 
>>> 
>>> 
>>> On Mon, Apr 13, 2015, 11:04 Robert Sanderson <azaroth42@gmail.com>
>>> wrote:
>>> On Mon, Apr 13, 2015 at 10:07 AM, Ivan Herman <ivan@w3.org> wrote:
>>> Hm. The problem is that there is a use case here that we may have
>>> to accommodate somehow.
>>> At the moment, if you take an Ebook, and you want to have a URI
>>> identifying a specific position within a specific chapter of a
>>> book, you can use something like:
>>> 
>>> http://www.example.org/book#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
>>> 
>>> 
>>> Sure, because the Epub media type registration defines the meaning
>>> of fragment section of URIs where Epub is a representation that
>>> can be retrieved.
>>> We can't legitimately just add #oa:Selector(...) to the end of the
>>> URI instead.
>>> 
>>> 
>>> epubcfi works, and is used, but it has its drawbacks (let me not
>>> go into all the details). One drawback is that what it offers as
>>> anchoring possibility (though powerful) is way less flexible than
>>> the selector model, primarily the range selectors. The conceptual
>>> model behind those would become useful, as an alternative to
>>> something like epubcfi, if those structures could be used as
>>> fragments.
>>> 
>>> Agreed, and the same applies for every other media type as well.
>>> 
>>> 
>>> Maybe we have to restrict its usage and define it only for
>>> specific media types (text, etc) to avoid the issues in your
>>> example on genetic sequences or full blown graphics. But I believe
>>> something like that would be very useful and, for some
>>> communities, necessary.
>>> 
>>> The point from 3986 (and related) is that we _cannot_ define it
>>> for specific media types unless we control them.  It's summarized
>>> in the first bullet in the annotation spec I linked to.
>>> For example, the meaning of a fragment on a plain text document is
>>> defined by 5147: https://tools.ietf.org/html/rfc5147
>>> 
>>> So we can't just say that people should use #oa:Selector(...) with
>>> a plain text document (or any other format) :(
>>> 
>>> Rob
>>> 
>>> 
>>> 
>>> (b.t.w., I am not sure I understand your comment on RFC3986)
>>> [1] http://www.idpf.org/epub/linking/cfi/epub-cfi.html
>>> 
>>>> On 13 Apr 2015, at 18:28 , Robert Sanderson <azaroth42@gmail.com> wrote:
>>>> 
>>>> 
>>>> We discussed fragments in the community group at length.
>>>> 
>>>> The concerns about the approach are documented here:
>>>>    http://www.w3.org/TR/annotation-model/#fragment-uris
>>>> 
>>>> These boil down to the fact that as you get more sophisticated
>>>> selections the URI becomes unbearably long.
>>>> Consider serializing an entire SVG document into the URI to
>>>> specify a non rectangular area. Or selecting the previous and
>>>> following 1024 Gs Cs As and Ts to select a range of text in a
>>>> genetic sequence.
>>>> 
>>>> My personal position is that selectors should not be turned into
>>>> fragments, because (especially) that would break the rules of
>>>> fragment identifiers as laid out in RFC 3986:
>>>> 
>>>> The semantics of a fragment identifier are defined by the set of
>>>> representations that might result from a retrieval action on the
>>>> primary resource.
>>>> As further discussed by JeniT here:
>>>>    http://www.w3.org/TR/fragid-best-practices/
>>>> 
>>>> Basically, unless there's a new text/HTML RFC that allows us to
>>>> do it, we can't arbitrarily shove the description of the segment
>>>> into its identity.
>>>> 
>>>> Rob
>>>> 
>>>> 
>>>> On Mon, Apr 13, 2015 at 9:10 AM, Ivan Herman <ivan@w3.org>
>>>> wrote: (Although this may not be immediately relevant to the
>>>> Working Group right now, I think the question *may* become
>>>> relevant, hence my copy to it…)
>>>> 
>>>> Rob, Paolo,
>>>> 
>>>> a question came up at the Digital Publishing IG today. The IG is
>>>> looking at general fragment identifiers for the purpose of
>>>> identifying portions within a digital document (typically EPUB,
>>>> but also some future versions of it). The Selector structure of
>>>> the OA obviously gives a great model for various types of
>>>> anchors, mainly when combined with other, existing fragment id
>>>> definitions.
>>>> 
>>>> However, at present, the selectors are defined in terms of RDF
>>>> resources; to take an example from the spec:
>>>> 
>>>> "selector": {
>>>>      "@id": "http://example.org/selector1",
>>>>      "@type": "oa:DataPositionSelector",
>>>>      "start": 4096,
>>>>      "end": 4104
>>>> }
>>>> 
>>>> To be usable for a fragment identification, this structure
>>>> should be turned into some sort of a, well, URI fragment. I
>>>> mean, it is probably relatively easy to do this, something like
>>>> 
>>>> http://www.example.org/#selector(type=DataPositionSelector,start=4096,end=4104)
>>>> 
>>>> 
>>>> would do it but, of course, the ideal would be if that type of
>>>> fragment format would be defined at one place.
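
As a rough illustration of the mapping hinted at above, the following Python sketch turns a selector structure into a 'selector(...)' fragment and back. Everything in it is an assumption made for the sake of the example: the function names are invented, the 'selector(name=value,...)' syntax is just the ad hoc one from this mail and is not defined anywhere, the '@id' is simply dropped, and the 'oa:' prefix is put back blindly on parsing.

```python
import re
from urllib.parse import quote, unquote


def selector_to_fragment(selector):
    """Serialize a selector dict into the ad hoc 'selector(...)' form."""
    # 'oa:DataPositionSelector' -> 'DataPositionSelector'
    type_name = selector["@type"].split(":", 1)[-1]
    parts = ["type=" + type_name]
    for key, value in selector.items():
        if key.startswith("@"):
            continue  # '@id' and '@type' are not repeated as plain fields
        # Percent-encode so that ',', '=', '(' and ')' cannot clash
        # with the delimiters of the fragment syntax.
        parts.append(f"{quote(key, safe='')}={quote(str(value), safe='')}")
    return "selector(" + ",".join(parts) + ")"


def fragment_to_selector(fragment):
    """Parse the ad hoc 'selector(...)' form back into a dict."""
    match = re.fullmatch(r"selector\((.*)\)", fragment)
    if match is None:
        raise ValueError("not a selector fragment: " + fragment)
    fields = dict(item.split("=", 1) for item in match.group(1).split(","))
    selector = {"@type": "oa:" + unquote(fields.pop("type"))}
    for key, value in fields.items():
        value = unquote(value)
        # Purely numeric values are turned back into integers.
        selector[unquote(key)] = int(value) if value.isdigit() else value
    return selector


fragment = selector_to_fragment({
    "@id": "http://example.org/selector1",
    "@type": "oa:DataPositionSelector",
    "start": 4096,
    "end": 4104,
})
print(fragment)
print(fragment_to_selector(fragment))
```

Whether such a fragment would be legitimate for a given media type is a separate question that the code does not address.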
>>>> 
>>>> The question is: has this ever been discussed previously on the
>>>> OA model? If it hasn't been done, should it be done? If it
>>>> should be done, should it be done by this WG, or some other
>>>> group?
>>>> 
>>>> Thanks
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>> 
>>>> --
>>>> Rob Sanderson
>>>> Information Standards Advocate
>>>> Digital Library Systems and Services
>>>> Stanford, CA 94305


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Sunday, 19 April 2015 07:58:42 UTC