Re: [DPUB][Locators]Cancellation and Next Steps from Ivan Herman on 2016-01-26 (public-digipub-ig@w3.org from January 2016)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 26 Jan 2016 09:50:33 +0100
To: Romain <rdeltour@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <D5CF7E68-7D6E-460D-A33E-955C8F3E7A0A@w3.org>
Hi Romain,

> On 25 Jan 2016, at 16:48, Romain <rdeltour@gmail.com> wrote:
> 
>> 
>> On 25 Jan 2016, at 11:32, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> 
>>> On 21 Jan 2016, at 18:10, Romain <rdeltour@gmail.com> wrote:
>>> 
>>> 
>>>> On 21 Jan 2016, at 07:28, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> I think that is where HTTP content negotiation should come into the picture.
>>>> (snip)
>>>> The only thing this mechanism requires is to have a distinct media type assigned to a PWP (akin to the media type for EPUB) and, probably, to the manifest. If that is there, a client may express the media types it accepts, and even the relative priority of the formats it prefers (if the client accepts several).
>>>> 
>>>> This allows for a setup where there is *no* packaged form around. And content negotiation is a mechanism implemented by all servers and clients these days, so we should just use it…
>>> 
>>> I'm not convinced by the content negotiation approach. The format to get (expanded vs. packaged) may depend on the user's intent (e.g. whether she wants to start reading the book or download it to share on a USB key); the decision is not necessarily under the responsibility of client software.
>>> I believe it would be easier if the two formats were represented by two different URLs, which has the benefit of working in run-of-the-mill browsers.
>> 
>> I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).
>> 
>> The 'usual' approach taken by content negotiations is something like:
>> 
>> - http://ex.org/ThePublication - is the URL of the resource
>> - http://ex.org/ThePublication.pack - is the URL of the packaged version (if any)
>> - http://ex.org/ThePublication.unpack - is the URL of the unpacked version (if any)
>> 
>> The client uses the first URL, with some preferences, to get either the packaged file or, directly, the document on the Web. Explicit addressing of, say, the package is there if one wants to copy the file onto a USB stick.
> 
> The problem with conneg is that there are many forms and it's sometimes difficult to understand which one we're talking about :)
> Here I was assuming you were considering server-driven conneg (aka proactive conneg) where the client sends Accept headers and the server returns the best-match answer. We shouldn't do this.
> It appears instead that you're considering an agent-driven conneg pattern (aka reactive conneg); but then, this can take many forms.
> 

Actually… no. In your description what I was considering is what you call server-driven conneg. Note that the client does not have to send an Accept header; the server can set up a default priority.

I am not sure why you say 'we shouldn't do this'? What is your argument?

Note that this is pretty much what we do when serving namespace documents for RDF vocabularies, for example, in http://www.w3.org/ns/ (or elsewhere). The same vocabulary is defined in various formats (Turtle, RDF/XML, JSON-LD, possibly HTML+RDFa), we set up a conneg structure on the server side (using 'var' files), and we even set a priority among the various formats (e.g., pushing the priority for the RDF/XML serialization fairly low compared to Turtle). It takes a bit of administration on the server side, but works well for clients.
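
A minimal sketch, in Python, of the server-driven negotiation described above: the client may send an Accept header with q-values, and the server falls back to its own default priority when there is none. The media type name for the packaged state and the priority values are placeholders (no PWP media type exists), and the function names are made up for illustration; this is not how the w3.org setup is implemented.

```python
# Placeholder media types: the thread only says a PWP would need a distinct
# media type; these names are assumptions for illustration.
PACKED = "application/x-pwp-package"
UNPACKED = "text/html"

# Server-side default priorities, similar in spirit to the per-format
# priorities set in a 'var' file. The values are arbitrary.
SERVER_DEFAULTS = {UNPACKED: 1.0, PACKED: 0.5}


def parse_accept(header):
    """Parse an Accept header into {media_range: q}; other parameters are ignored."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[parts[0].lower()] = q
    return prefs


def client_q(prefs, media_type):
    """The most specific matching range wins: type/subtype, then type/*, then */*."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def negotiate(accept_header, available=SERVER_DEFAULTS):
    """Return the media type to serve, or None if nothing is acceptable."""
    if not accept_header or not accept_header.strip():
        # No Accept header: the server's default priority decides.
        return max(available, key=available.get)
    prefs = parse_accept(accept_header)
    scored = {t: client_q(prefs, t) * qs for t, qs in available.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None
```

With these placeholder values, a request without an Accept header gets the unpacked (HTML) state, while a client that asks only for the package media type gets the packaged one; both are served from the same URL.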

>> 
>> We could adopt something like that. The important point is that http://ex.org/ThePublication is the 'canonical' URL for the publication; it is not the same as the identifier, because it binds to a specific place on a specific server (and may change if I make a copy for myself), but it is, sort of, canonical nevertheless. More importantly, this is the URL that is considered to be the 'base' when considering locators within the document.
> 
> Agreed.
> 
> However: why couldn't we just say that the URL to the unpacked state is the canonical URL?
> 
> The user goes to https://example.org/ThePublication with a browser, she can read the publication (unpacked) as any other web content. It's basically a static web site.
> The GET answer to this URL returns HTML content, which in turn has links to a manifest and an alternative packaged version.
> 
> This is reactive conneg, although totally user-driven (machine readable with link rel attributes).

You are right, that could work, too (although the two conneg-s do not necessarily exclude one another). The only downside is: what happens if the unpacked version is not available, only the packed one?
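
A rough sketch, in Python, of how a client could follow the link-based pattern: fetch the HTML at the canonical URL and look for link elements pointing to the manifest and to the packaged version. The rel value for the manifest reuses the placeholder URI from earlier in this thread; the package media type, the use of rel="alternate" and the sample page are all assumptions for illustration, since none of this has been specified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholders only: neither a manifest relation nor a PWP media type is defined.
MANIFEST_REL = "http://identifies.the.manifest.format"
PACKAGE_TYPE = "application/x-pwp-package"


class LinkCollector(HTMLParser):
    """Collect <link> elements as (rel token, href, type) tuples."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens.
        for rel in (a.get("rel") or "").lower().split():
            self.links.append((rel, href, a.get("type")))


def discover(html, base_url):
    """Return absolute URLs of the manifest and the packaged alternative, if linked."""
    parser = LinkCollector()
    parser.feed(html)
    found = {"manifest": None, "package": None}
    for rel, href, media_type in parser.links:
        if rel == MANIFEST_REL and found["manifest"] is None:
            found["manifest"] = urljoin(base_url, href)
        elif rel == "alternate" and media_type == PACKAGE_TYPE and found["package"] is None:
            found["package"] = urljoin(base_url, href)
    return found


# Hypothetical page served at the canonical URL.
PAGE = """<!DOCTYPE html>
<html><head>
<link rel="http://identifies.the.manifest.format" href="/ThePublication/manifest.json">
<link rel="alternate" type="application/x-pwp-package" href="ThePublication.pack">
</head><body>...</body></html>"""

found = discover(PAGE, "https://example.org/ThePublication")
```

This only works when an unpacked HTML entry point exists, which is exactly the open question above: if only the packed version is available, there is no page to carry the links.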

> 
> I think the core issue is about what you said earlier:
>> I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).
> 
> I think we need to better describe these use cases.

That is clearly the case. We will have to talk to the editor of the PWP Use Case document:-)

> Is there another use case than annotations? Isn't annotation more affected by the uniqueness of identifiers than by URLs?
> In any case, I think that the pattern where the packed version is linked from the canonical unpacked version would still work.
> 

Yes, it definitely would.

> One aspect that I still don't picture clearly is that we say that (1) having the same URL for both versions is important and at the same time that (2) the URL changes as soon as the publication is effectively ported.

We have to differentiate, in my view, between the identifier (that does not change if the document is ported) and the locator that identifies a specific instance in two different states. So yes, we do have, maybe, a three-tier situation instead of two. But we agreed that the identifier part is out of our control and charter.
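
To make the three tiers concrete, here is a small Python sketch under stated assumptions: the class and method names are invented, the identifier value is a made-up example, and the .pack/.unpack suffixes simply follow the URL pattern used earlier in this thread. It only illustrates that porting changes the locators while the identifier stays the same.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PublicationInstance:
    identifier: str     # tier 1: stable across copies (outside our charter)
    canonical_url: str  # tier 2: locator of this instance, base for in-document locators

    def state_url(self, state):
        """Tier 3: URL of one state, using the .pack/.unpack suffix convention."""
        if state not in ("pack", "unpack"):
            raise ValueError(f"unknown state: {state}")
        return f"{self.canonical_url}.{state}"

    def ported_to(self, new_url):
        """A copy on another server: new locators, same identifier."""
        return replace(self, canonical_url=new_url)


original = PublicationInstance("urn:example:the-publication",
                               "http://ex.org/ThePublication")
copy = original.ported_to("http://mirror.example/ThePublication")
```

Here original.state_url("pack") and copy.state_url("pack") differ, but both instances share one identifier, so anything keyed on the identifier would survive the move.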

Cheers

Ivan


> 
> 
> Romain.
> 
>> 
>>> 
>>>> I think we agree that whatever is returned, it should give *an access* (in the conceptual sense) to a manifest (and we would not have to go into the syntax of the manifest here).
>>> 
>>> Yes, possibly with some level of indirection.
>> 
>> Correct
>> 
>>> 
>>>> The return may be
>>>> 
>>>> - The full actual data if there is a packaged form;
>>> 
>>> OK.
>>> 
>>>> the HTTP return header MAY (SHOULD?) also return a link to a manifest
>>> 
>>> Are you thinking of using a standard HTTP header or defining a custom one?
>>> 
>> 
>> Standard HTTP. Formally, we can have something like
>> 
>> LINK: <http://url.to.the.manifest>; rel="http://identifies.the.manifest.format"
>> 
>> The details must be clarified, but I believe that is a correct approach standard-wise.
>> 
>> 
>>>> but the package MUST contain a manifest in any case (we have to decide which of the manifest have priority).
>>> 
>>> If the package format allows a client to easily retrieve the manifest, this is probably enough. Why use an HTTP header in addition?
>>> I'd suggest that making the manifest easily and deterministically retrievable from the package becomes a requirement.
>> 
>> I could imagine that I make a local copy of a publication in a package, which includes the author-provided manifest, but I would like to add some additional information valid for my site only (eg, other redirections) that I 'attach' to the publication through the GET without modifying the package. Ie, both can be useful.
>> 
>>> 
>>>> - The manifest itself, e.g., in JSON format if that is indeed the syntax we adopt, or an HTML file that includes the manifest, or an HTML file that links to the manifest
>>> 
>>> yes, for the unpackaged state.
>> 
>> Correct.
>> 
>> Thanks
>> 
>> Ivan
>> 
>>> 
>>> Romain.
>>> 
>>> 
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> On the manifest question, I think that the discussion taking place for EPUB about a JSON-based manifest may be useful here as there is definitely overlap in the organization and structure of that material that we would also want here. And if we could potentially align these two efforts to a single manifest format, then it would make it trivial for implementations to author and provide it (no transcoding required). But yes, there would need to be more stuff from PWP’s perspective (such as the optional mapping for external resources).
>>>>> 
>>>>> 
>>>>> Leonard
>>>>> 
>>>>> From: Bill Kasdorf <bkasdorf@apexcovantage.com>
>>>>> Date: Wednesday, January 20, 2016 at 9:19 AM
>>>>> To: "public-digipub-ig@w3.org" <public-digipub-ig@w3.org>
>>>>> Subject: [DPUB][Locators]Cancellation and Next Steps
>>>>> Resent-From: <public-digipub-ig@w3.org>
>>>>> Resent-Date: Wednesday, January 20, 2016 at 9:20 AM
>>>>> 
>>>>> Hi, folks—
>>>>> 
>>>>> Today's Locators Task Force meeting is cancelled, but our Task is not. ;-)
>>>>> 
>>>>> It has been suggested by several people that focusing on the actual structure of the locator, and getting a strawman proposal written down, is what we need to do now.
>>>>> 
>>>>> There has been some interesting discussion on the list:
>>>>> 
>>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2015Dec/0163.html (from Daniel Weck)
>>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0095.html (from Ángel González)
>>>>> 
>>>>> Ivan suggests that we need to write down:
>>>>> 
>>>>> - what should a GET return for a locator (something which either is or refers to a manifest in the abstract sense)
>>>>> - what should a manifest, conceptually, include. At this moment, I see
>>>>>                 - an *identifier*
>>>>>                 - a mapping from absolute URL-s to relative URL-s (where relative means relative to the PWP instance URL)
>>>>>                 - a mapping from relative URL-s to absolute URL-s
>>>>> 
>>>>> Could somebody volunteer to draft a strawman proposal that we can use for the basis of discussion going forward?
>>>>> 
>>>>> --Bill
>>>>> 
>>>>> Bill Kasdorf
>>>>> Vice President, Apex Content Solutions
>>>>> Apex CoVantage
>>>>> W: +1 734-904-6252
>>>>> M: +1 734-904-6252
>>>>> @BillKasdorf
>>>>> bkasdorf@apexcovantage.com
>>>>> http://isni.org/isni/0000000116490786
>>>>> https://orcid.org/0000-0001-7002-4786
>>>>> www.apexcovantage.com
>>>>> 
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 

----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 26 January 2016 08:50:52 UTC