Re: Selectors as URIs? from Ivan Herman on 2015-04-14 (public-annotation@w3.org from April 2015)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 14 Apr 2015 11:35:19 +0200
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>, Tzviya Siegman <tsiegman@wiley.com>, Markus Gylling <markus.gylling@gmail.com>
Message-Id: <CF4B63C7-A905-49BE-9671-401555766D6D@w3.org>
Ok, I understand where you are coming from regarding RFC 3986. And I am o.k. with saying that a #selector approach is valid for, say, HTML5 documents as a first step, possibly extended later to other formats for which we can define it. And if we define a new version of EPUB, we can also make it part of that.
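
To make the #selector idea concrete, here is a minimal sketch in Python of how the DataPositionSelector example quoted further down in this thread could be mapped to and from a fragment string. Everything about the syntax is an assumption taken from the informal example URI in the thread: the "selector(...)" wrapper, the comma-separated key=value pairs, dropping the "oa:" prefix, and the function names are hypothetical and not defined by any spec or media type registration.

```python
from urllib.parse import quote, unquote


def selector_to_fragment(selector: dict) -> str:
    """Serialize a flat selector dict into a hypothetical 'selector(...)' fragment."""
    # Drop the 'oa:' prefix from the type, as in the example URI in the thread.
    sel_type = selector["@type"].split(":", 1)[-1]
    parts = ["type=" + quote(sel_type, safe="")]
    for key, value in selector.items():
        if key.startswith("@"):
            continue  # '@id' and '@type' are JSON-LD keywords, not selector data
        # Percent-encode so that ',', '=', '(' and ')' cannot break the syntax.
        parts.append(quote(key, safe="") + "=" + quote(str(value), safe=""))
    return "selector(" + ",".join(parts) + ")"


def fragment_to_selector(fragment: str) -> dict:
    """Parse the hypothetical fragment back into a selector dict (without '@id')."""
    if not (fragment.startswith("selector(") and fragment.endswith(")")):
        raise ValueError("not a selector fragment")
    body = fragment[len("selector("):-1]
    fields = dict(pair.split("=", 1) for pair in body.split(","))
    result = {"@type": "oa:" + unquote(fields.pop("type"))}
    for key, value in fields.items():
        value = unquote(value)
        # Assumption: purely numeric values are integers (e.g. byte offsets).
        result[unquote(key)] = int(value) if value.isdigit() else value
    return result


selector = {
    "@id": "http://example.org/selector1",
    "@type": "oa:DataPositionSelector",
    "start": 4096,
    "end": 4104,
}
fragment = selector_to_fragment(selector)
print("http://www.example.org/#" + fragment)
```

Such a mapping only covers small, flat selectors. It does nothing about the length problem raised for SVG or long text ranges, and it does not settle whether a given media type allows this fragment syntax at all.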

It is interesting to look at the Web Packaging document

http://w3ctag.github.io/packaging-on-the-web/

which defines a fragment approach: as a first step, you identify a part within the package, and then you apply a fragment on that part; the last step depends on the media type of the part. The new packaging format may well be the basis for a future release of EPUB; what I am looking for is a way to combine the selector model with the overall package fragment mechanism at least for a selected set of media types. I do not really think there should be a problem with that…

Ivan


> On 13 Apr 2015, at 20:03 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
> 
> On Mon, Apr 13, 2015 at 10:07 AM, Ivan Herman <ivan@w3.org> wrote:
> Hm. The problem is that there is a use case here that we may have to accommodate somehow.
> At the moment, if you take an Ebook and you want to have a URI identifying a specific position within a specific chapter of a book, you can use something like:
> 
> http://www.example.org/book#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
> 
> Sure, because the Epub media type registration defines the meaning of the fragment section of URIs where Epub is a representation that can be retrieved.
> We can't legitimately just add #oa:Selector(...) to the end of the URI instead.
> 
> 
> epubcfi works, and is used, but it has its drawbacks (let me not go into all the details). One drawback is that what it offers as anchoring possibility (though powerful) is way less flexible than the selector model, primarily the range selectors. The conceptual model behind those would become useful, as an alternative to something like epubcfi, if those structures could be used as fragments.
> 
> Agreed, and the same applies for every other media type as well.
> 
> 
> Maybe we have to restrict its usage and define it only for specific media types (text, etc.) to avoid the issues in your example on genetic sequences or full-blown graphics. But I believe something like that would be very useful and, for some communities, necessary.
> 
> The point from RFC 3986 (and related) is that we _cannot_ define it for specific media types unless we control them. It's summarized in the first bullet in the annotation spec I linked to.
> For example, the meaning of a fragment on a plain text document is defined by 5147: https://tools.ietf.org/html/rfc5147
> 
> So we can't just say that people should use #oa:Selector(...) with a plain text document (or any other format) :(
> 
> Rob
> 
> 
> 
> (b.t.w., I am not sure I understand your comment on RFC3986)
> [1] http://www.idpf.org/epub/linking/cfi/epub-cfi.html
> 
> > On 13 Apr 2015, at 18:28 , Robert Sanderson <azaroth42@gmail.com> wrote:
> >
> >
> > We discussed fragments in the community group at length.
> >
> > The concerns about the approach are documented here:
> >     http://www.w3.org/TR/annotation-model/#fragment-uris
> >
> > These boil down to the fact that as you get more sophisticated selections the URI becomes unbearably long.
> > Consider serializing an entire SVG document into the URI to specify a non rectangular area. Or selecting the previous and following 1024 Gs Cs As and Ts to select a range of text in a genetic sequence.
> >
> > My personal position is that selectors should not be turned into fragments, because (especially) that would break the rules of fragment identifiers as laid out in RFC 3986:
> >
> > The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.
> > As further discussed by JeniT here:
> >     http://www.w3.org/TR/fragid-best-practices/
> >
> > Basically, unless there's a new text/HTML RFC that allows us to do it, we can't arbitrarily shove the description of the segment into its identity.
> >
> > Rob
> >
> >
> > On Mon, Apr 13, 2015 at 9:10 AM, Ivan Herman <ivan@w3.org> wrote:
> > (Although this may not be immediately relevant to the Working Group right now, I think the question *may* become relevant, hence my copy to it…)
> >
> > Rob, Paolo,
> >
> > a question came up at the Digital Publishing IG today. The IG is looking at general fragment identifiers for the purpose of identifying portions within a digital document (typically EPUB, but also some future versions of it). The Selector structure of the OA obviously gives a great model for various types of anchors, mainly when combined with other, existing fragment id definitions.
> >
> > However, at present, the selectors are defined in terms of RDF resources; to take an example from the spec:
> >
> > "selector": {
> >       "@id": "http://example.org/selector1",
> >       "@type": "oa:DataPositionSelector",
> >       "start": 4096,
> >       "end": 4104
> > }
> >
> > To be usable for a fragment identification, this structure should be turned into some sort of a, well, URI fragment. I mean, it is probably relatively easy to do this, something like
> >
> > http://www.example.org/#selector(type=DataPositionSelector,start=4096,end=4104)
> >
> > would do it but, of course, the ideal would be if that type of fragment format would be defined at one place.
> >
> > The question is: has this ever been discussed previously on the OA model? If it hasn't been done, should it be done? If it should be done, should it be done by this WG, or some other group?
> >
> > Thanks
> >
> > Ivan
> >
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > ORCID ID: http://orcid.org/0000-0003-0782-2704
> >
> >
> >
> >
> >
> >
> >
> > --
> > Rob Sanderson
> > Information Standards Advocate
> > Digital Library Systems and Services
> > Stanford, CA 94305
> 


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 14 April 2015 09:35:52 UTC