Re: Comments on "Locators for Web Publications" from Baldur Bjarnason on 2017-12-12 (public-publ-wg@w3.org from December 2017)

From: Baldur Bjarnason <baldur@rebus.foundation>
Date: Tue, 12 Dec 2017 16:49:22 -0500
To: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
Cc: Benjamin Young <byoung@bigbluehat.com>, Ivan Herman <ivan@w3.org>, Hadrien Gardeur <hadrien.gardeur@feedbooks.com>, Daniel Glazman <daniel.glazman@disruptive-innovations.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
Message-Id: <0904DEE4-85F9-4F62-BBA8-67D1B7D99880@rebus.foundation>
> On 12 Dec 2017, at 14:48, Siegman, Tzviya - Hoboken <tsiegman@wiley.com> wrote:
> 
> Hi Baldur,
> 
> I think there may be some misunderstanding, and I apologize if I contributed to the misunderstanding.
> 
> We have a very specific charter [1]. The W3C is somewhat strict about sticking to the scope of a charter. If we want to publish additional documents, we need to revise the charter. Editing a spec that was published by a different working group is out of scope. That makes a lot of sense to me. That does not mean that finished specs are sealed and never to be touched again. It does mean that the right group needs to work on it. That's why Benjamin raised [2], to get the discussion going about revisiting some of the work on Web Annotations. 

I’m not asking for new specs. I’m suggesting that this particular draft spec should not go forward beyond a first public working draft. I strongly agree with the idea that Web Annotations work should be done in another WG, one that has more annotation service vendors involved, and you yourself suggested in that GitHub thread that this is something that should be discussed in email, not in that issue.

> I believe that the publ-loc draft does attempt to address multiple resources with a single annotation, and annotations spanning multiple resources (span selector and multi-resource selector) [2.5].

And my feedback is that I think that’s a bad idea that shouldn’t be pursued as it breaks substantially from how things are done in other parts of the web stack. Addressing or targeting multiple resources should be done with multiple links or annotations. I can post an issue to that effect if you want.
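To make the alternative concrete, here is a minimal Python sketch. It assumes the JSON serialisation of the W3C Web Annotation Data Model (`@context`, `type`, `body`, `target`, `TextQuoteSelector`). It turns one annotation whose `target` lists several resources into several annotations, one per resource. The helper name `split_by_target` and the example.org URLs are hypothetical and only for illustration.

```python
import copy
import json


def split_by_target(annotation: dict) -> list:
    """Return one single-target annotation per target of `annotation`.

    Hypothetical helper: everything except `target` is copied as-is, and
    any `id` is dropped because each new annotation needs its own identifier.
    """
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]  # a single target may be given without a list
    result = []
    for target in targets:
        single = copy.deepcopy(annotation)
        single.pop("id", None)
        single["target"] = copy.deepcopy(target)
        result.append(single)
    return result


# One annotation pointing into two chapters of a (hypothetical) publication.
multi = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "A note on this passage."},
    "target": [
        {"source": "https://example.org/pub/chapter1.html",
         "selector": {"type": "TextQuoteSelector", "exact": "first passage"}},
        {"source": "https://example.org/pub/chapter2.html",
         "selector": {"type": "TextQuoteSelector", "exact": "second passage"}},
    ],
}

# Each resulting annotation addresses exactly one resource.
for anno in split_by_target(multi):
    print(json.dumps(anno["target"]))
```

Each resulting annotation addresses a single resource, which is how links already work elsewhere on the web.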

> The discussions we had about fragment ids on WPs revolved around using the existing fragids for the media types in use in the publications, such as HTML [3]. If you disagree with this, please comment on the issue, but we did discuss it at length. 

That issue (#27) only mentions fragments _once_, and I’m not even sure whether it meant fragment ids in that context. A fragment identifier is a separate topic from the identifier, locator, or address for a web publication. Identifying a resource is an entirely different issue from using a fragment identifier to identify a part of that resource. 

My argument is that we only have a remit to define a fragment identifier that identifies a part of the document returned (not other linked resources) when you fetch the web publication (or manifest) URL, provided that the file returned is a new media type. 

If fetching the web publication URL returns an HTML file, as has been discussed[1], then defining a fragment identifier for the web publication URL in any way, shape, or form is outside of our remit. Defining fragment IDs for HTML files is, realistically, only going to be done by the WHATWG. Defining a fragment identifier for web publication URLs is not compatible with the idea of having those URLs return an HTML file.

If fetching the web publication URL returns a manifest file, then defining a fragment identifier for specific parts of that file, _not other linked resources_, would match the behaviour that is the norm for most of the web stack. Making that fragment identifier point at parts of other (possibly multiple) resources breaks with the normal behaviour of fragment identifiers on the web in a fairly major way. You’re overloading URLs and HTTP to behave in ways they definitely weren’t designed for. I’m not even sure where to start squaring that behaviour with how HTTP normally works in most clients and servers.
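A minimal Python sketch of the mechanics behind this, assuming a hypothetical manifest URL on example.org. A client separates the fragment from the URL before dereferencing it (RFC 3986, section 3.5), so the server only ever sees the resource URL. The fragment is then interpreted by the user agent against the single representation that comes back, according to that representation's media type. The helper name `split_fetch_url` is hypothetical.

```python
from urllib.parse import urldefrag


def split_fetch_url(url: str) -> tuple:
    """Split a URL into the part used for the HTTP request and the
    fragment that stays with the client (hypothetical helper)."""
    request_url, fragment = urldefrag(url)
    return request_url, fragment


url = "https://example.org/publication/manifest.json#some-fragment"
request_url, fragment = split_fetch_url(url)
print(request_url)  # https://example.org/publication/manifest.json
print(fragment)     # some-fragment
```

Nothing in this exchange gives the fragment a path into other resources linked from the manifest. Any such behaviour would have to be layered on by a specialised client.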

And by making a fragment id on a resource we define actually point at a fragment of resources others have defined we’re not only going beyond our remit but potentially disrupting the work of others. You’re essentially monkey-patching the various file formats and media types that belong to other working groups or even standards organisations without any regard for their processes, roadmap, or possible future development.

That's irrespective of the issue of having a fragment id on one resource actually point at a fragment of another resource, which is well out of the ordinary for the web even when the two resources are of the same type.

The only issue I could find that’s open and sort of tackles this topic is https://github.com/w3c/publ-loc/issues/6, but it doesn’t seem to make a distinction between fragment IDs for WPs and PWPs, which is _crucial_ as PWP fragment IDs are a separate issue. A PWP is likely to be a single file that may or may not (we don’t know yet, as we don’t have a packaging format) need a fragment ID to point at the items it contains. That issue shouldn’t be subsumed into the WP fragment ID issue. And it’s properly an issue in the pwpub repository, not the locator one.

I can post my concerns in publ-loc#6 if that’s what people want. But these are properly two separate issues and the currently open issue (“Do we need fragment ids?") is too general to be answered productively. 

> We have not yet addressed the question of whether a PWP requires a media type [4], and therefore it is preliminary to discuss fragment ids. This is reflected in issue 27 of publ-loc [5]

It’s not entirely clear to me what the relevance of issue 27 is to the points I’ve been making. In any case, I think there’s a strong argument already made for not pursuing PWP fragment identifiers until we know more about the package format and media type.

> We have a deadline to publish a first public working draft. This is in no way meant to be a final version. On the contrary, it is meant to put our ideas out for review and feedback. We voted in the group to get consensus on whether we were ready for this and achieved overwhelming consensus to move forward. If you see issues that you recommend fixing before we publish, we will certainly consider them, but it is important to be very specific. As we discussed at the meeting on Monday, waiting until everything is perfect or until all contingencies are resolved means we will never solve anything. We very much need the feedback and we want to hear from everyone, so please do offer specific, targeted comments. If there are items that must be resolved before FPWD, please indicate this. If there are items that can be addressed after we publish, feel free to open issues, and we will deal with them in January.  
> 
> Thank you,
> Tzviya

Like I said earlier, I’m discussing the ideas put forward in the first public working draft (or what’s likely to be substantively similar to the first public working draft), not demanding that the draft not be published. Expecting people to discuss these issues without a published draft is unrealistic, especially since publishing the draft doesn’t commit us to pursuing ideas that later turn out to be impractical, and it can be an important mechanism for the group to discover which of them are impractical.

I tried to clarify this in an earlier email to Ivan: 

> I don’t think we could have this discussion properly without the document that has been written so not publishing it would indeed be counterproductive. It’s an important part of exploring the problem area. Consider this an early look at the feedback I’m pretty sure you’re going to get from more quarters once the document is properly published. Publishing doesn’t undermine the discussion about how to go forward on this issue as we can always change course before continuing.


AFAICT the point of putting these ideas out for review and feedback is to get reviews and feedback. My feedback is that I don’t think we should continue work on this without fundamental shifts in approach and strategy (like a complete redo from scratch, preferably shifting most parts of it to other WGs). The current proposal breaks with web norms in a number of fairly major ways, which is likely to cause conflict, disruption, and implementation difficulties in the future. 

Like I said, I’m not demanding that it be retracted. I am making suggestions for what (not) to do in the future. If you feel that my comments haven’t been specific or targeted enough, I can try to make them more specific.

Also, as you may or may not have noticed, I disagreed with the idea that “waiting until everything is perfect or until all contingencies are resolved means we will never solve anything”. Sometimes waiting is the only course of action that will increase the odds of succeeding at your goals, and moving too fast is what increases the odds of failure. Moving too fast based on arbitrary deadlines is quite often very counterproductive, and I do think this is the case with PWP. I personally don’t see how we can do any real work on PWPs until the manifest format for WPs is done and until we have more concrete proposals on the packaging format. I know that just about everybody else in the WG disagrees with me, but I reserve the right to say that I think it’s a bad idea to move too quickly on PWPs. It’s too dependent on the outcomes of other work.

In any case, it sounds like the time and venue for this particular discussion is in January and in GitHub issues, not on this mailing list, which is definitely something I can accommodate. My only question is what I should do when the pre-existing issue (if it exists) is much too general to be useful. My instinct would be to open new, more specific issues, but that’s a good way to get them closed immediately as duplicates if others don’t agree. And there’s a big possibility that my points will be lost and ignored if they are tacked on at the end of a lengthy, meandering discussion.

- best
- Baldur Bjarnason
  baldur@rebus.foundation

[1]: https://github.com/w3c/wpub/issues/103

> 
> [1] https://www.w3.org/2017/04/publ-wg-charter/
> [2] https://github.com/w3c/publ-loc/issues/41
> [2.5] https://w3c.github.io/publ-loc/
> [3] https://github.com/w3c/wpub/issues/27
> [4] https://github.com/w3c/pwpub/issues/17
> [5] https://github.com/w3c/publ-loc/issues/27
> 
> Tzviya Siegman
> Information Standards Lead
> Wiley
> 201-748-6884
> tsiegman@wiley.com 
> 
> -----Original Message-----
> From: Baldur Bjarnason [mailto:baldur@rebus.foundation] 
> Sent: Tuesday, December 12, 2017 2:26 PM
> To: Benjamin Young <byoung@bigbluehat.com>
> Cc: Ivan Herman <ivan@w3.org>; Hadrien Gardeur <hadrien.gardeur@feedbooks.com>; Daniel Glazman <daniel.glazman@disruptive-innovations.com>; Siegman, Tzviya - Hoboken <tsiegman@wiley.com>; W3C Publishing Working Group <public-publ-wg@w3.org>
> Subject: Re: Comments on "Locators for Web Publications"
> 
> Hi Benjamin,
> 
> I’ve been hesitant to do that since most of my points are that none of this should be done:
> 
> * Don’t extend web annotations.
> * Don’t try to support multiple resources in a single annotation. 
> * Don’t try to make annotations span multiple resources. 
> * Don’t make a fragment identifier for web publications. 
> * Don’t make a fragment identifier for portable web publications, _yet_. 
> * Don’t continue work on this document.
> 
> I’m not sure those can be hashed out as issues as they relate to this working group’s overarching strategy when it comes to locators and identifiers.
> 
> But if you really do think multiple ‘do not do this’ issues are appropriate, then of course I will post them.
> 
> - best
> - Baldur Bjarnason
>  baldur@rebus.foundation
> 
> 
> 
>> On 12 Dec 2017, at 14:19, Benjamin Young <byoung@bigbluehat.com> wrote:
>> 
>> Baldur there's some interesting stuff hiding in this email. :) Could you post these as issues on publ-loc's GitHub repo?
>> https://github.com/w3c/publ-loc/issues
>> 
>> The actionable points in here do need some action. :)
>> 
>> Thanks, Baldur!
>> Benjamin
>> --
>> http://bigbluehat.com/
>> http://linkedin.com/in/benjaminyoung
>> From: Baldur Bjarnason <baldur@rebus.foundation>
>> Sent: Tuesday, December 12, 2017 1:15:20 PM
>> To: Ivan Herman
>> Cc: Hadrien Gardeur; Daniel Glazman; Tzviya Siegman; W3C Publishing Working Group
>> Subject: Re: Comments on "Locators for Web Publications"
>> 
>> 
>> 
>> Hi Ivan,
>> 
>>> On 12 Dec 2017, at 07:22, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> The definition of a fragment identifier is only a (very) small part of the current document[1]. The definition itself does _not_ depend on the packaging format, and concentrates only on the case when there is a URI for a collection of documents (which, as we agreed, is the essence of a WP, regardless of its packaging details). There is actually an open issue[2] on whether we need a fragment identifier in the first place. We can obviously add, if we think it is necessary, additional fragment id-s to a specific packaging at some point later, but I completely agree that this should be done only when we have more knowledge about the packaging format.
>> 
>> We don’t know yet what the packaging format offers in terms of URLs, locators, identifiers, or interoperability with how those are handled on the web itself. That question will only be answered once we know what it/they support in terms of storing HTTP requests and responses and how those are exposed to package consumers. We don’t know whether the packaging format or JSON manifest format give us affordances to simplify the fragment identifier structure in order to increase ease of use in content production (which is vital; the current proposal is much too verbose to be usable by content producers). And I also disagree with basing the fragment identifier on the annotation data model.
>> 
>> So IMO it’s much too early to continue serious work on a PWP fragment identifier. Minting a half-baked one first and then a proper one later that’s better aligned with the format is just going to waste resources and cause confusion. Maybe a general purpose fragment identifier, independent of the packaging format, _is_ the best solution, but we won’t actually know that until we have the packaging formats (or more concrete proposals) in front of us. We can’t even properly answer the question yet as to whether it’s necessary to create a fragment identifier for PWPs until we have more concrete details on PWPs.
>> 
>>> 
>>>> 
>>>>      • We should build a compelling case, including use cases, case studies, testimonials from companies, etc. for the WHATWG to support the proposal that fragment identifiers for arbitrary ranges in an HTML resource are a vital addition to the web if it is to accommodate publications.
>>> 
>>> Yes, if we concentrate on one specific HTML resource, it would be good to have something like that. And there is no disagreement on the fact that this work should not be done in this WG; it is not even part of the discussion. Whether this should be done in the WHATWG or the Web Platform WG is an issue that is also not for us to decide.
>>> 
>>> What we _do_ have, but only in the sense of having it inherited from the Web Annotation spec, are Selectors defined by that spec that do define ranges in an HTML resource in terms of JSON(-LD) properties.
>>> 
>>> (Personally, I have my doubts whether this Working Group is in position to do that type of lobbying and initial implementations to get it through the WHATWG/WPWG. But that is another story.)
>> 
>> If we can’t get through to them, then it can’t be done. They are the platform. Without browser buy-in, at best what we can do is add ancillary features for specialised UAs. Building up resources, tactics, and strategies for organised lobbying of the WHATWG is essential, long term, for the various publishing industries if they want their needs to be accommodated in the browser space. Informal discussions are unlikely to work. (I know this is an opinion many in the WG disagree with, and many who might even agree in principle consider it impossible. It’s still the right thing to do.)
>> 
>> Building a fragment identifier for Web Publications whose primary goal is to enhance linking to HTML resources is IMO absolutely entering the territory of the WHATWG. An annotations data model is one thing; once you turn that into a fragment identifier, you are relying on technicalities to protect you from accusations of overreach and conflict.
>> 
>>> 
>>>> 
>>>>      • We should concentrate on providing an extremely small set of general purpose JSON-LD properties so that our work on locators isn’t limited to just Web Annotations but can be reused in other formats such as schema.org or Activity Streams.
>>> 
>>> This is exactly what the Locator document does (modulo possible name change). It defines three different Selectors, that are JSON objects with some properties. And it also defines the Positions that ensure a different structure, also in terms of JSON objects with some properties. They may not be the right ones; that is of course a matter of discussion. There is also an open issue on whether the 'Position' structure is necessary or not[3]; we may decide to drop that part and concentrate on a few (namely three) extra selectors only.
>>> 
>>> (Our document does not refer to JSON-LD but only JSON. The annotation spec is, in fact, JSON-LD, and the JSON we added follows the same structures, i.e., it could also be considered JSON-LD. However, if we want to take that aspect seriously, we would also have to extend the formal RDFS Vocabulary for annotations[4]. This can be done, maybe should be done, but only later.)
>>> 
>>> Bottom line: I am not sure whether, and if yes where, we really disagree…
>> 
>> We have a fundamental disagreement in that I disagree with the notion that spanning or including multiple resources in a single link structure is a good idea. Multiple resource targets requiring multiple links is how the web is structured currently. It creates too much of a conceptual mismatch with existing parts of the web stack. _Especially, in a fragment identifier._
>> 
>> I also disagree with us extending or iterating on web annotations. If those specifications require more work then that should be done by those who are actually running annotation services in production and based on their needs. My suggestion was to create a small set (like, one or two) of properties that are general purpose JSON-LD properties for indicating membership in a web publication, not to extend the annotations data model. That might even be as simple as giving the rel values we mint alternate full URLs to make it easier to integrate them into JSON-LD contexts. And that’s mostly to make adding publication-specific support to other formats more convenient—not a hard requirement by any means. Activity Streams, for example, support link objects with rel values. It’s a bit verbose but definitely doable without any extensions. Adding affordances for JSON-LD contexts would be a nice-to-have that we could work on at a later date once the foundation is ready (or it could be a side effect of the manifest work if that ends up in JSON-LD).
>> 
>> I disagree with the notion that we are in any way ready to even know whether a fragment identifier for PWPs is needed, let alone what it should look like. Further work on that should be postponed.
>> 
>> And finally, I think fragment identifiers for (non-portable) Web Publications exist only to bypass other standards organisations and working groups. That is a pretty substantial disagreement that I don’t see us resolving any time soon.
>> 
>> By all means, people should use the web annotations data model. I just don’t think we’re the right venue to extend it, and if we do, we are likely to get it wrong unless we get annotation service vendors involved in a serious way. And we definitely shouldn’t turn it into a fragment identifier.
>> 
>> So, we actually disagree quite substantively, from what I can see.
>> 
>> - best
>> - Baldur Bjarnason
>>  baldur@rebus.foundation
>> 
>>> 
>>> Ivan
>>> 
>>> 
>>> [1] https://w3c.github.io/publ-loc/#ErsFragmentId_def
>>> [2] https://github.com/w3c/publ-loc/issues/6
>>> [3] https://github.com/w3c/publ-loc/issues/29
>>> [4] https://www.w3.org/TR/annotation-vocab/
>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Publishing@W3C Technical Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>
Received on Tuesday, 12 December 2017 21:49:53 UTC