Re: [DPUB][Locators]Cancellation and Next Steps from Ivan Herman on 2016-01-26 (public-digipub-ig@w3.org from January 2016)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 26 Jan 2016 13:31:14 +0100
To: Romain <rdeltour@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <2E917AC8-2E12-46FE-8B42-F3EDDCB3817C@w3.org>
> On 26 Jan 2016, at 13:21, Romain <rdeltour@gmail.com> wrote:
> 
> 
>> We can, and probably should, contact the TAG at some point. But only if and when we have some more solid document in hand describing the use cases and the alternatives. I am not sure we are at that point yet.
> 
> 100% agreed.
> 
> Another source of inspiration is TAG's Packaging on the Web readme [1] which describes several approaches that we've been talking about (they notably ruled out conneg, as well as custom-syntax URLs, but that shouldn't prevent us from considering these).

Hm. It may indeed be a source of inspiration for the problem area itself. I guess [1] is interesting (and not specific to the particular package format the document describes). Pity this document is pretty much in limbo these days...

[1] https://w3ctag.github.io/packaging-on-the-web/#h-example-scenario

> 
> For the rest of the discussions, can we assume that there are –at some point or another–, two distinct URLs: one for the packaged form and one for the unpackaged?
> 

"At some point or another":-) yes. But there is also a format independent URL. It is the interaction among these three that we have to specify…

Ivan


> Romain.
> 
> [1] https://github.com/w3ctag/packaging-on-the-web
>> On 26 Jan 2016, at 11:58, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> 
>>> On 26 Jan 2016, at 10:44, Romain <rdeltour@gmail.com> wrote:
>>> 
>>> 
>>>> On 26 Jan 2016, at 09:50, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> On 25 Jan 2016, at 16:48, Romain <rdeltour@gmail.com> wrote:
>>>>> 
>>>>> The problem with conneg is that there are many forms and it's sometimes difficult to understand which one we're talking about :)
>>>>> Here I was assuming you were considering server-driven conneg (aka proactive conneg) where the client sends Accept headers and the server returns the best-match answer. We shouldn't do this.
>>>>> It appears instead that you're considering an agent-driven conneg pattern (aka reactive conneg); but then, this can take many forms.
>>>>> 
>>>> 
>>>> Actually… no. In your description what I was considering is what you call server-driven conneg. Note that the client does not have to send an Accept header; the server can set up a default priority.
>>>> 
>>>> I am not sure why you say 'we shouldn't do this'? What is your argument?
>>> 
>>> There are many reasons, most of them described in RFC 2616 for HTTP/1.1 [1].
>>> 
>>> Essentially:
>>> - proactive conneg requires clients to set HTTP headers. It may work well for an API with dedicated clients, but browsers won't add PWP-aware headers to all requests they make, so in the general case of browsing the web we have to cater for a reactive conneg pattern anyway.
>>> - proactive conneg is problematic for caches.
>>> - more conceptually, I would argue that the unpacked content served at https://example.org/ThePublication isn't semantically equivalent (in the context of an HTTP request) to the packaged version at https://example.org/ThePublication?packed. In the former case, the GET returns an HTML doc with links to subsequent resources; in the latter case, it returns *all* the resources. It's just not the same beast.
>> 
>> 
>> I read those references below, and I do not think it is such a black-and-white issue (well, Sivonen's is the only one that may be considered really black-and-white). Yes, there are possible complications and contraindications, but not always. E.g., the TAG document that you quote contains a section on 'desktop vs. mobile' [1], which is the closest to our use case; it clearly refers to conneg on the server as a possible way forward (third bullet).
>> 
>> I think the reference to TimBL's document[2] is actually a good one for our further discussion. I quote:
>> 
>>  "It [ie, content negotiation, I.H.] must only used to negotiate between things which, while being of different content types, carry the same information. More or less, in that there may be quality degradation. A jpeg version of photo may have less quality than the PNG alternative. An RDF/XML file may not be able to express all the information in an N3 file, but it might be an important subset."
>> 
>> So indeed, the question really boils down to your last item above, and I am not sure I agree… If I use the URL of the Portable Web Document then, whether it returns the manifest file or the full PWP document, it is the same beast: it "carries the same information", to use TimBL's terminology. (Actually, if we put on the implementation goggles via Service Workers, then the invocation of a URL is caught by the Service Worker, and whether it unpacks a package or not is hidden from the 'higher' layers.)
>> 
>> There are practical difficulties. But I do not think we are at the point where we should rule this approach out. Actually, we may propose several different deployment strategies; maybe they can work in parallel (not sure, though). And we should really look at real, deployment-oriented use cases.
>> 
>> [1] https://www.w3.org/2001/tag/doc/alternatives-discovery.html#id2261672
>> [2] https://www.w3.org/DesignIssues/Conneg
>> 
>> 
>>> 
>>> See also the literature in the following refs:
>>> 
>>> [1] server-driven negotiation in HTTP/1.1 https://tools.ietf.org/html/rfc2616#section-12.1
>>> [2] TBL's design issue on conneg https://www.w3.org/DesignIssues/Conneg The "When not to use conneg" section applies to my 3rd bullet above.
>>> [3] WHATWG's Why Not Conneg (by Henri Sivonen) https://wiki.whatwg.org/wiki/Why_not_conneg
>>> [4] 2006 TAG's finding "On Linking Alternative Representations To Enable Discovery And Publishing" https://www.w3.org/2001/tag/doc/alternatives-discovery.html
>>> [5] Roy Fielding's comments on ietf-http-wg https://lists.w3.org/Archives/Public/ietf-http-wg/2013JanMar/0811.html
>>> 
>>> 
>>> All that considered, I think that proactive conneg is not the best approach in our case. But I'm not an expert. Maybe we can reach out to TAG or HTTP fellows?
>>> 
>> 
>> We can, and probably should, contact the TAG at some point. But only if and when we have some more solid document in hand describing the use cases and the alternatives. I am not sure we are at that point yet.
>> 
>> Cheers
>> 
>> Ivan
>> 
>> 
>> 
>>> Romain.
>>> 
>>>> 
>>>> Note that this is pretty much what we do when serving namespace documents for RDF vocabularies, for example, in http://www.w3.org/ns/ (or elsewhere). The same vocabulary is defined in various formats (Turtle, RDF/XML, JSON-LD, possibly HTML+RDFa); we set up a conneg structure on the server side (using 'var' files), and we even set a priority among the various formats (e.g., pushing the priority for the RDF/XML serialization fairly low compared to Turtle). It takes a bit of administration on the server side, but works well for clients.
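
A minimal sketch of the server-driven selection described above, assuming a simplified Accept parser and made-up priority values: the `VARIANTS` table plays the role of the 'var' file, and its numbers are illustrative, not the ones actually configured on w3.org. The media types are the registered ones for Turtle, JSON-LD and RDF/XML. When the client sends no Accept header, the server's own priority decides.

```python
from typing import Dict, Optional

# Server-side priorities per variant (hypothetical values, for illustration only).
VARIANTS: Dict[str, float] = {
    "text/turtle": 1.0,
    "application/ld+json": 0.8,
    "application/rdf+xml": 0.4,
}


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media range: q}; simplified, ignores other params."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_range] = q
    return prefs


def client_q(prefs: Dict[str, float], media_type: str) -> float:
    """Client preference for a media type: exact match, then type/*, then */*."""
    if media_type in prefs:
        return prefs[media_type]
    wildcard = media_type.split("/")[0] + "/*"
    if wildcard in prefs:
        return prefs[wildcard]
    return prefs.get("*/*", 0.0)


def negotiate(accept: Optional[str], variants: Dict[str, float] = VARIANTS) -> Optional[str]:
    """Pick the variant with the highest (client q * server priority)."""
    if not accept or not accept.strip():
        # No Accept header: fall back to the server's default priority.
        return max(variants, key=lambda mt: variants[mt])
    prefs = parse_accept(accept)
    scored = {mt: client_q(prefs, mt) * qs for mt, qs in variants.items()}
    best = max(scored, key=lambda mt: scored[mt])
    return best if scored[best] > 0 else None
```

With these assumed priorities, a request without an Accept header gets Turtle, and a client must state a clear preference for RDF/XML to receive it.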
>>>> 
>>>>>> 
>>>>>> We could adopt something like that. The important point is that http://ex.org/ThePublication is the 'canonical' URL for the publication; it is not the same as the identifier, because it binds to a specific place on a specific server (and may change if I make a copy for myself), but it is, sort of, canonical nevertheless. More importantly, this is the URL that is considered to be the 'base' when considering locators within the document.
>>>>> 
>>>>> Agreed.
>>>>> 
>>>>> However: why couldn't we just say that the URL to the unpacked state is the canonical URL?
>>>>> 
>>>>> The user goes to https://example.org/ThePublication with a browser; she can read the publication (unpacked) like any other web content. It's basically a static web site.
>>>>> The GET on this URL returns HTML content, which in turn has links to a manifest and an alternative packaged version.
>>>>> 
>>>>> This is reactive conneg, although totally user-driven (machine readable with link rel attributes).
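
A rough sketch of how a client could discover those links in the returned HTML, using only the Python standard library. The `rel` values, the `manifest.json` file name and the `application/zip` media type in the sample page are assumptions for illustration; nothing in this thread has fixed them. Only the `?packed` URL form comes from the discussion.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical unpacked landing page; rel values and types are illustrative.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <head>
    <title>The Publication</title>
    <link rel="manifest" href="manifest.json" type="application/json">
    <link rel="alternate" href="ThePublication?packed" type="application/zip">
  </head>
  <body><p>Unpacked content, readable like any other web page.</p></body>
</html>"""

Link = Tuple[str, Optional[str], Optional[str]]


class LinkCollector(HTMLParser):
    """Collects (rel, href, type) triples from <link> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Link] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list; emit one entry per relation.
        for rel in (attributes.get("rel") or "").lower().split():
            self.links.append((rel, attributes.get("href"), attributes.get("type")))


def collect_links(html: str) -> List[Link]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

A reading system could then follow the manifest link, or offer the packaged version to the user, without any special request headers.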
>>>> 
>>>> You are right, that could work, too (although the two conneg-s do not necessarily exclude one another). The only downside is: what happens if the unpacked version is not available, only the packed one?
>>>> 
>>>>> 
>>>>> I think the core issue is about what you said earlier:
>>>>>> I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).
>>>>> 
>>>>> I think we need to better describe these use cases.
>>>> 
>>>> That is clearly the case. We will have to talk to the editor of the PWP Use Case document:-)
>>>> 
>>>>> Is there another use case than annotations? Isn't annotation more affected by the uniqueness of identifiers than by URLs?
>>>>> In any case, I think that the pattern where the packed version is linked from the canonical unpacked version would still work.
>>>>> 
>>>> 
>>>> yes, it definitely would.
>>>> 
>>>>> One aspect that I still don't picture clearly is that we say that (1) having the same URL for both versions is important and at the same time that (2) the URL changes as soon as the publication is effectively ported.
>>>> 
>>>> We have to differentiate, in my view, between the identifier (that does not change if the document is ported) and the locator that identifies a specific instance in two different states. So yes, we do have, maybe, a three-tier situation instead of two. But we agreed that the identifier part is out of our control and charter.
>>>> 
>>>> Cheers
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> Romain.
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> I think we agree that whatever is returned, it should give *an access* (in the conceptual sense) to a manifest (and we would not have to go into the syntax of the manifest here).
>>>>>>> 
>>>>>>> Yes, possibly with some level of indirection.
>>>>>> 
>>>>>> Correct
>>>>>> 
>>>>>>> 
>>>>>>>> The return may be
>>>>>>>> 
>>>>>>>> - The full actual data if there is a packaged form;
>>>>>>> 
>>>>>>> OK.
>>>>>>> 
>>>>>>>> the HTTP return header MAY (SHOULD?) also return a link to a manifest
>>>>>>> 
>>>>>>> Are you thinking of using a standard HTTP header or defining a custom one?
>>>>>>> 
>>>>>> 
>>>>>> Standard HTTP. Formally, we can have something like
>>>>>> 
>>>>>> LINK: <http://url.to.the.manifest>; rel=<http://identifies.the.manifest.format>
>>>>>> 
>>>>>> The details must be clarified, but I believe that is a correct approach standard-wise.
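
As a sketch of what that could look like on the wire: in the Web Linking serialization (RFC 8288, which obsoletes RFC 5988), an extension relation type is a URI given as a quoted string in the `rel` parameter. The helpers below are hypothetical and only handle the simple case (no commas or semicolons inside quoted values); the two URLs are the placeholders from the example above.

```python
import re
from typing import Dict, List, Tuple

# Matches one link-value: <target> followed by zero or more ;name=value params.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def format_link_header(target: str, rel: str) -> str:
    """Serialize a single link as a Link header field value."""
    return f'<{target}>; rel="{rel}"'


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse a Link header value into (target, params) pairs (simplified)."""
    links = []
    for match in LINK_RE.finditer(value):
        params = {
            name.lower(): quoted or bare
            for name, quoted, bare in PARAM_RE.findall(match.group(2))
        }
        links.append((match.group(1), params))
    return links


header_value = format_link_header(
    "http://url.to.the.manifest",
    "http://identifies.the.manifest.format",
)
```

The server would send `Link: ` followed by `header_value`, alongside the package body, and a client could read the manifest location without opening the package.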
>>>>>> 
>>>>>> 
>>>>>>>> but the package MUST contain a manifest in any case (we have to decide which of the manifests has priority).
>>>>>>> 
>>>>>>> If the package format allows a client to easily retrieve the manifest, this is probably enough. Why use an HTTP header in addition?
>>>>>>> I'd suggest that making the manifest easily and deterministically retrievable from the package becomes a requirement.
>>>>>> 
>>>>>> I could imagine that I make a local copy of a publication in a package, which includes the author-provided manifest, but I would like to add some additional information valid for my site only (eg, other redirections) that I 'attach' to the publication through the GET without modifying the package. Ie, both can be useful.
>>>>>> 
>>>>>>> 
>>>>>>>> - The manifest itself, e.g., in JSON format if that is indeed the syntax we adopt, or an HTML file that includes the manifest, or an HTML file that links to the manifest
>>>>>>> 
>>>>>>> yes, for the unpackaged state.
>>>>>> 
>>>>>> Correct.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>>> 
>>>>>>> Romain.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Ivan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On the manifest question, I think that the discussion taking place for EPUB about a JSON-based manifest may be useful here as there is definitely overlap in the organization and structure of that material that we would also want here.  And if we could potentially align these two efforts to a single manifest format, then it would make it trivial for implementations to author and provide it (no transcoding required).   But yes, there would need to be more stuff from PWP’s perspective (such as the optional mapping for external resources)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Leonard
>>>>>>>>> 
>>>>>>>>> From: Bill Kasdorf <bkasdorf@apexcovantage.com>
>>>>>>>>> Date: Wednesday, January 20, 2016 at 9:19 AM
>>>>>>>>> To: "public-digipub-ig@w3.org" <public-digipub-ig@w3.org>
>>>>>>>>> Subject: [DPUB][Locators]Cancellation and Next Steps
>>>>>>>>> Resent-From: <public-digipub-ig@w3.org>
>>>>>>>>> Resent-Date: Wednesday, January 20, 2016 at 9:20 AM
>>>>>>>>> 
>>>>>>>>> Hi, folks—
>>>>>>>>> 
>>>>>>>>> Today's Locators Task Force meeting is cancelled, but our Task is not. ;-)
>>>>>>>>> 
>>>>>>>>> It has been suggested by several people that focusing on the actual structure of the locator, and getting a strawman proposal written down, is what we need to do now.
>>>>>>>>> 
>>>>>>>>> There has been some interesting discussion on the list:
>>>>>>>>> 
>>>>>>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2015Dec/0163.html (from Daniel Weck)
>>>>>>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0095.html (from Ángel González)
>>>>>>>>> 
>>>>>>>>> Ivan suggests that we need to write down:
>>>>>>>>> 
>>>>>>>>> - what should a GET return for a locator (something which either is or refers to a manifest in the abstract sense)
>>>>>>>>> - what should a manifest, conceptually, include. At this moment, I see
>>>>>>>>>                 - an *identifier*
>>>>>>>>>                 - a mapping from absolute URL-s to relative URL-s (where relative means relative to the PWP instance URL)
>>>>>>>>>                 - a mapping from relative URL-s to absolute URL-s
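
To make the three conceptual items concrete, here is a minimal sketch, assuming the two mappings are explicit lookup tables and that one can be derived as the inverse of the other. The class name, field names and sample values are hypothetical; the actual manifest structure and syntax are still open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Manifest:
    """Conceptual manifest: an identifier plus URL mappings (hypothetical shape)."""

    identifier: str
    # relative URL (relative to the PWP instance URL) -> absolute URL
    relative_to_absolute: Dict[str, str] = field(default_factory=dict)

    def to_absolute(self, relative: str) -> Optional[str]:
        """Map a relative URL to its absolute counterpart, if listed."""
        return self.relative_to_absolute.get(relative)

    def to_relative(self, absolute: str) -> Optional[str]:
        """Inverse lookup: map an absolute URL back to a relative one, if listed."""
        for relative, candidate in self.relative_to_absolute.items():
            if candidate == absolute:
                return relative
        return None


# Illustrative values only.
manifest = Manifest(
    identifier="urn:example:the-publication",
    relative_to_absolute={
        "fonts/body.woff": "https://fonts.example.org/body.woff",
        "css/style.css": "https://cdn.example.org/pub/style.css",
    },
)
```

Deriving one table from the other assumes the mapping is one-to-one; if several relative URLs can point at the same absolute URL, the two mappings would need to be stored separately.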
>>>>>>>>> 
>>>>>>>>> Could somebody volunteer to draft a strawman proposal that we can use for the basis of discussion going forward?
>>>>>>>>> 
>>>>>>>>> --Bill
>>>>>>>>> 
>>>>>>>>> Bill Kasdorf
>>>>>>>>> Vice President, Apex Content Solutions
>>>>>>>>> Apex CoVantage
>>>>>>>>> W: +1 734-904-6252
>>>>>>>>> M: +1 734-904-6252
>>>>>>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf>
>>>>>>>>> bkasdorf@apexcovantage.com
>>>>>>>>> http://isni.org/isni/0000000116490786
>>>>>>>>> https://orcid.org/0000-0001-7002-4786
>>>>>>>>> www.apexcovantage.com
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> Ivan Herman, W3C
>>>>>>>> Digital Publishing Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 26 January 2016 12:31:30 UTC