Re: Comments on "Locators for Web Publications" from Baldur Bjarnason on 2017-12-12 (public-publ-wg@w3.org from December 2017)

From: Baldur Bjarnason <baldur@rebus.foundation>
Date: Tue, 12 Dec 2017 10:58:56 -0500
To: Ivan Herman <ivan@w3.org>
Cc: Tzviya Siegman <tsiegman@wiley.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
Message-Id: <4B2FF985-9F19-40DB-BD51-2A722E063131@rebus.foundation>

Ivan,

This is part of the discussion that follows, not an objection to the decision. I don’t think we could have had this discussion properly without the document that has been written, so not publishing it would indeed be counterproductive. It’s an important part of exploring the problem area. Consider this an early look at the feedback I’m pretty sure you’re going to get from more quarters once the document is properly published. Publishing doesn’t undermine the discussion about how to go forward on this issue, as we can always change course before continuing.

- best
- Baldur Bjarnason
  baldur@rebus.foundation



> On 12 Dec 2017, at 03:15, Ivan Herman <ivan@w3.org> wrote:
> 
> Baldur,
> 
> I have a purely administrative question at this point (that is my role…)
> 
> We have voted for the FPWD publication of the current document (modulo the title change, and in its current, i.e., crisper, format). Does this mean that you object to this decision, or is this part of the discussion that we will have to follow around the document? I have to send in the transition/publication request next week and go on with the process, so this should be decided.
> 
> (Note that, once the milestone of FPWD is passed, it will become way easier to publish new 'official' versions, we will not have to go through some extra administration as with the FPWD. This means that, if we want, we can 'publish' every week or every other week, essentially any time we feel there is a somewhat stable version out there.)
> 
> Thanks
> 
> Ivan
> 
> P.S. B.t.w., the locator document, in its current or previous versions, has been on the repo and has been the subject of WG deliberations for several months now. It is the process of publishing that has sparked some discussions, which shows, in my view, that having a somewhat more stable publication has its inherent value...
> 
> 
> 
>> On 12 Dec 2017, at 02:34, Baldur Bjarnason <baldur@rebus.foundation> wrote:
>> 
>> 
>> Hi all,
>> 
>> I haven’t had a chance to give the locator spec a proper critical reading (as in, with line-by-line comments). I’m not sure I will ever have the time to do so, given that I’m only supposed to spare a couple of hours a week on W3C matters.
>> 
>> And, as I’ve mentioned before, I’ve been avoiding reading the locator document because I found it likely that I wouldn’t be a huge fan and my prior experience with criticising EPUBCFIs was pretty darn unpleasant.
>> 
>> (I also have anxiety issues which can get triggered by a lot of seemingly innocuous things. This has turned out to be one of them.)
>> 
>> While the work in the document has been important insofar as we needed this draft to become properly aware of the issues, it is my opinion that we should not continue further work on this specific document but instead build on what we have learned from publishing it.
>> 
>> My suggestions:
>> 
>>  • We should postpone further work on PWP fragment identifiers until we know its packaging format and structure.
>> 
>>  • We should build a compelling case, including use cases, case studies, testimonials from companies, etc. for the WHATWG to support the proposal that fragment identifiers for arbitrary ranges in an HTML resource are a vital addition to the web if it is to accommodate publications.
>> 
>>  • We should concentrate on providing an extremely small set of general purpose JSON-LD properties so that our work on locators isn’t limited to just Web Annotations but can be reused in other formats such as schema.org or Activity Streams.
>> 
>> I’ve outlined the specifics of my issues below.
>> 
>> ## My issues
>> 
>> The core of the locator document is that it is an extension of the Web Annotations Data Model providing early versions of solutions to two problems:
>> 
>>  1. Fragment identifiers for WPs and PWPs.
>> 
>>  2. The actual extension for the web annotations data model to handle multiple resources.
>> 
>> I disagree with the approach to both issues in the locator document. While the document has been useful in helping us highlight these issues I don’t think the approaches it describes properly reflect how our other work (WPs and PWPs) has been shaping up and where the rest of the web stack is heading.
>> 
>> 
>> ### 1. Fragment identifiers
>> 
>>  • Defining fragment identifiers to solve a problem that doesn’t exist yet for a format that hasn’t been defined yet seems premature. (As in, the only documented use case for this problem I could find is CFI as an internal locator format for EPUB UAs, and it really isn’t up to us to standardise internal UA formats.)
>> 
>>  • Defining a fragment identifier as a serialisation of a rather verbose JSON format not only seems inefficient but will be error-prone in practice.
>> 
>>  • As WP resources all, by definition, have to exist on the web, the main purpose for a WP fragment identifier is to bypass the WHATWG’s unwillingness to extend HTML with fragment identifiers for arbitrary locations/ranges (beyond the multiple resource issue, which I address below and don’t find compelling). It’s a tactic that will almost certainly be fragile in practice and is unlikely to be widely adopted outside of specialised apps. My feeling is that we should be stating and restating the case for publishing-specific needs to other standards groups, and not actively bypassing them by overloading our own formats.
>> 
>>  • While a PWP fragment identifier would be useful, the extent of its usefulness and what an optimal design would look like are really unclear at this point. A PWP file that is web-addressable should properly be linked to using the canonical URLs of the WP and resources it contains, especially if we end up using the Web Packaging format, as that format is specifically designed for distributing resources that already exist on the web.
>> 
>>  • Fragment ID linking for a PWP that isn’t web addressable (e.g. on a filesystem or in a UA’s internal storage system) by definition has to be UA-specific, as we have no reliable method of knowing the protocol or methods to access that PWP or how the rest of the identifier (the non-fragment part) is supposed to be dereferenced. This isn’t to say a fragment identifier wouldn’t be useful, but that it’s a very hairy issue that can’t be properly solved until we have more experience in the wild with actual PWP User Agents.
>> 
>> So, on the fragment identifier issue, my suggestion is that once we decide on a name for this public working draft, we should postpone further work and instead revisit the issue from scratch once we actually know what the PWP formats look like. WP fragment identifiers are a much trickier issue as those, in my personal opinion, absolutely cross into territory that belongs to other working groups and even other standards organisations (WHATWG).
>> 
>> 
>> ### 2. Multiple resources
>> 
>> The document does not provide a compelling argument as to why the Web Annotation Data Model is insufficient for our purposes or why such a heavy-handed extension is required.
>> 
>>  • Web Annotations collections are ordered by default so annotations spanning multiple resources can, without loss, be stored as a series of annotations rather than one multi-resource one.
>> 
>>  • There’s also an argument to be made that an annotation or link spanning two resources breaks substantially from web norms and that storing them as separate, sequential annotations aligns more closely with other parts of the web stack. (E.g. there is no simple, commonly accepted way to map the concept of a single locator spanning multiple resources onto HTTP.)
>> 
>>  • It would be much better for interoperability if web annotations reused the mechanisms for publication membership we are planning to provide in any case: rel=publication from the resource to the publication URL. There is precedent in the Web Annotation Data Model for reusing rel values as properties[1], so a much less complex and more interoperable solution would be for us to re-use rel=publication as {publication: ‘url for publication’} on an individual annotation level to associate the annotation with the publication and then rely on the sequence for resource spanning.
>> 
>>  • By defining the ‘publication’ rel value as a property in the JSON-LD space as a mechanism for stating ‘this resource is a part of that publication’ we also open up interoperability with a much wider set of JSON-LD formats such as Activity Streams whereas the current draft is extremely specific to the Web Annotation Data Model. This is also useful in a broader sense as it would give JSON-LD resources an official way to indicate publication membership that didn’t rely on an HTTP Link header.
>> 
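To illustrate the suggestion above, here is a minimal sketch in Python of a highlight that crosses a chapter boundary, stored as two ordered single-resource annotations rather than one multi-resource locator. It rests on assumptions: the `publication` key is the property proposed in this message and is not defined by the Web Annotation Data Model, the example.org URLs are made up, and `make_annotation` and `sources_in_order` are illustrative helpers only.

```python
import json

# Hypothetical publication URL, used only for illustration.
PUBLICATION = "https://example.org/book/"


def make_annotation(anno_id, source, exact, publication):
    """Build one single-resource annotation with the proposed 'publication' property."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        # Assumption: 'publication' mirrors rel=publication. It is not part of
        # the Web Annotation context and would need its own JSON-LD definition.
        "publication": publication,
        "target": {
            "source": source,
            "selector": {"type": "TextQuoteSelector", "exact": exact},
        },
    }


# One highlight running from the end of chapter 1 into chapter 2,
# kept as two annotations whose list order carries the sequence.
annotations = [
    make_annotation(
        PUBLICATION + "anno/1",
        PUBLICATION + "chapter1.html",
        "the closing words of chapter one",
        PUBLICATION,
    ),
    make_annotation(
        PUBLICATION + "anno/2",
        PUBLICATION + "chapter2.html",
        "the opening words of chapter two",
        PUBLICATION,
    ),
]


def sources_in_order(publication, items):
    """Return the target resources of one publication's annotations, in list order."""
    return [a["target"]["source"] for a in items if a.get("publication") == publication]


if __name__ == "__main__":
    print(json.dumps(annotations, indent=2))
    print(sources_in_order(PUBLICATION, annotations))
```

A consumer that wants the whole span filters on `publication` and reads the targets in order; no single locator ever has to address two resources at once.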
>> 
>> ### The name
>> 
>> I know we are still discussing this on GitHub, but based on how many resources EPUBCFI sucked up from content producers who, fruitlessly, kept trying to use it in content production, I find the use of ‘locator’ in the document’s title highly problematic.
>> 
>> (I’ve already made a comment about this on the relevant issue in GitHub and I’m not going to rehash the argument from there.)
>> 
>> 
>> ## Finally
>> 
>> Like EPUBCFIs, the mechanism proposed in this document does not adequately solve the problem of content linking from one resource to a specific location in another publication. And locators spanning multiple resources are a problem I don’t think we should solve, as it doesn’t map easily to any other part of the web stack. Using a fragment identifier on the web publication URL to de facto bypass the WHATWG all but guarantees that this will be a solution that’s specific to specialised publication User Agents, possibly even exclusive to EPUB4 in the long term.
>> 
>> There is no way around the fact that to adequately solve this problem, we need to convince browser vendors that linking to an arbitrary location in an HTML resource is an important feature of the web. The web design community went through hell to get CSS Grid and <picture> implemented. If they had just pushed those two concepts through other venues of their own, or if they had given up advocating for these ideas, we never would have had these issues solved for the web at large.
>> 
>> I know most people in this WG will disagree with me but this is a clear case of a mechanism that should be specified by the WHATWG and outside of our remit. Our job should be an organised campaign to convince them.
>> 
>> We need to build up the habit, resources, and skill set to evangelise publication needs among browser vendors as this is not the only use case that is only adequately solvable by them.
>> 
>> So to reiterate my suggestions.
>> 
>> The best way, in my opinion, to build upon what we’ve learned from this document is to:
>> 
>>  • Postpone further work on PWP fragment identifiers until we know its packaging format and structure.
>>  • Build a compelling case, including use cases, case studies, testimonials from companies, etc. for the WHATWG that demonstrates that fragment identifiers to arbitrary ranges in an HTML resource are a vital addition to the web if it is to accommodate publication use cases.
>>  • Concentrate on providing an extremely small set of general purpose JSON-LD properties so that our work on locators isn’t limited to just Web Annotations but can be reused in other formats such as schema.org or Activity Streams.
>> 
>> [1]: https://www.w3.org/TR/2017/REC-annotation-model-20170223/#other-identities
>> 
>> - best
>> - Baldur Bjarnason
>> baldur@rebus.foundation
>> 
>> 
>> 
>>> On 11 Dec 2017, at 12:53, Hadrien Gardeur <hadrien.gardeur@feedbooks.com> wrote:
>>> 
>>>> If you keep on that path, the WG will *necessarily*, at some point in the future, need a fragment identifier dedicated to EPUB or WP in URLs, through a new RFC. My take is that we should start from there, through an analysis of requirements, prioritization of these requirements and focus on the two or three most important things
>>> 
>>> I agree with that statement and there are a few things we could start with:
>>>  • how do we reference text fragments in general and HTML in particular? (this is not WP/PWP/EPUB specific)
>>>  • how do we reference an HTML document and the WP that it belongs to in a single URL?
>>>  • what about PWPs that are strictly published offline?
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C
> Publishing@W3C Technical Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
Received on Tuesday, 12 December 2017 15:59:29 UTC