Re: [DPUB][Locators]Cancellation and Next Steps from Romain on 2016-01-25 (public-digipub-ig@w3.org from January 2016)

From: Romain <rdeltour@gmail.com>
Date: Mon, 25 Jan 2016 16:48:08 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <C846FDF9-2FB3-4483-86CF-C7119B8F07DA@gmail.com>
> On 25 Jan 2016, at 11:32, Ivan Herman <ivan@w3.org> wrote:
> 
>> 
>> On 21 Jan 2016, at 18:10, Romain <rdeltour@gmail.com> wrote:
>> 
>> 
>>> On 21 Jan 2016, at 07:28, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> I think that is where HTTP content negotiation should come into the picture.
>>> (snip)
>>> The only thing this mechanism requires is a distinct media type assigned to a PWP (akin to the media type for EPUB) and, probably, to the manifest. If that is in place, a client may express the media types it accepts, and even the relative priority of the formats it prefers (if the client supports several).
>>> 
>>> This allows for a setup where there is *no* packaged form around. And content negotiation is a mechanism implemented by all servers and clients these days, so we should just use it…
>> 
>> I'm not convinced by the content negotiation approach. The format to get (expanded vs. packaged) may depend on the user's intent (e.g. whether she wants to start reading the book or download it to share on a USB key); the decision does not necessarily rest with the client software.
>> I believe it would be easier if the two formats were represented by two different URLs, which has the benefit of working in run-of-the-mill browsers.
> 
> I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).
> 
> The 'usual' approach taken with content negotiation is something like:
> 
> - http://ex.org/ThePublication - is the URL of the resource
> - http://ex.org/ThePublication.pack - is the URL of the packaged version (if any)
> - http://ex.org/ThePublication.unpack - is the URL of the unpacked version (if any)
> 
> The client uses the first URL, with some preferences, to get either the packaged file or directly the document on the Web. Explicit addressing of, say, the package is there if one wants to copy the file to a USB stick.

The problem with conneg is that there are many forms and it's sometimes difficult to understand which one we're talking about :)
Here I was assuming you were considering server-driven conneg (aka proactive conneg) where the client sends Accept headers and the server returns the best-match answer. We shouldn't do this.
It appears instead that you're considering an agent-driven conneg pattern (aka reactive conneg); but then, this can take many forms.
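To make the distinction concrete, here is a minimal Python sketch of the server-driven (proactive) pattern: the server alone picks a representation from the client's Accept header. The media types `application/x-pwp+zip` (packaged) and `text/html` (unpacked) are hypothetical placeholders, not registered types, and the parser ignores media-type parameters other than `q`.

```python
from typing import List, Optional, Tuple

# Hypothetical media types standing in for "a distinct media type assigned
# to a PWP"; neither is a registered IANA type.
PACKAGED = "application/x-pwp+zip"
UNPACKED = "text/html"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs (q defaults to 1)."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """-1 if the range does not match; otherwise 0 (*/*), 1 (type/*), 2 (exact)."""
    if media_range == "*/*":
        return 0
    rtype, _, rsub = media_range.partition("/")
    ttype, _, tsub = media_type.partition("/")
    if rtype != ttype:
        return -1
    if rsub == "*":
        return 1
    return 2 if rsub == tsub else -1


def choose(accept: str, available: List[str]) -> Optional[str]:
    """Return the available type with the highest q, or None if nothing fits.

    Each type takes the q of its most specific matching range; ties go to
    the earlier entry in `available`, i.e. the server's own preference.
    """
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r, media_type), q) for r, q in ranges]
        matching = [m for m in matching if m[0] >= 0]
        if not matching:
            continue
        _, q = max(matching, key=lambda m: m[0])
        if q > best_q:
            best, best_q = media_type, q
    return best


print(choose("application/x-pwp+zip;q=0.9, text/html;q=0.5", [UNPACKED, PACKAGED]))
```

Nothing in such a request says whether the user wants to read the publication or copy it to a USB key; the server decides from headers alone, which is exactly the objection to proactive conneg.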

> 
> We could adopt something like that. The important point is that http://ex.org/ThePublication is the 'canonical' URL for the publication; it is not the same as the identifier, because it binds to a specific place on a specific server (and may change if I make a copy to myself), but it is, sort of, canonical nevertheless. More importantly, this is the URL that is considered to be the 'base' when considering locators within the document.

Agreed.

However: why couldn't we just say that the URL to the unpacked state is the canonical URL?

The user goes to https://example.org/ThePublication with a browser, she can read the publication (unpacked) as any other web content. It's basically a static web site.
The GET response for this URL returns HTML content, which in turn links to a manifest and to an alternative packaged version.

This is reactive conneg, although entirely user-driven (and machine-readable via link rel attributes).
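The user-driven pattern can be sketched from the client side: fetch the canonical (unpacked) URL as ordinary HTML, then look for machine-readable `<link>` elements. This is a minimal Python sketch using only the standard library; the `rel="manifest"` value (borrowed from the Web App Manifest convention), the `rel="alternate"` link, the `.pack` URL and the media type `application/x-pwp+zip` are all assumptions for illustration, not anything the group has agreed on.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical media type for the packaged state (not registered).
PACKAGED_TYPE = "application/x-pwp+zip"


class LinkCollector(HTMLParser):
    """Collect <link> elements as (rel tokens, type, absolute href)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[Tuple[List[str], Optional[str], str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        href = a.get("href")
        if href is None:
            return
        rels = (a.get("rel") or "").lower().split()  # rel is space-separated
        self.links.append((rels, a.get("type"), urljoin(self.base_url, href)))


def discover(html_text: str, base_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (manifest URL, packaged-version URL) found in the page, if any."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    manifest = next((h for r, t, h in collector.links if "manifest" in r), None)
    packaged = next(
        (h for r, t, h in collector.links if "alternate" in r and t == PACKAGED_TYPE),
        None,
    )
    return manifest, packaged


# Hypothetical landing page of the unpacked publication.
PAGE = """<!DOCTYPE html>
<html><head>
  <link rel="manifest" href="manifest.json">
  <link rel="alternate" type="application/x-pwp+zip" href="/ThePublication.pack">
</head><body><p>Chapter 1</p></body></html>"""

print(discover(PAGE, "https://example.org/ThePublication/"))
```

Because the packaged version is just another link, the user (or a "download" button) decides whether to follow it, while software can still find both the manifest and the package deterministically.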

I think the core issue is about what you said earlier:
> I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).

I think we need to better describe these use cases. Is there another use case than annotations? Isn't annotation more affected by the uniqueness of identifiers than by URLs?
In any case, I think that the pattern where the packed version is linked from the canonical unpacked version would still work.

One aspect that I still don't picture clearly is that we say both that (1) having the same URL for both versions is important and that (2) the URL changes as soon as the publication is actually ported (e.g. copied to another server).


Romain.

> 
>> 
>>> I think we agree that whatever is returned, it should give *an access* (in the conceptual sense) to a manifest (and we would not have to go into the syntax of the manifest here).
>> 
>> Yes, possibly with some level of indirection.
> 
> Correct
> 
>> 
>>> The return may be
>>> 
>>> - The full actual data if there is a packaged form;
>> 
>> OK.
>> 
>>> the HTTP return header MAY (SHOULD?) also return a link to a manifest
>> 
>> Are you thinking of using a standard HTTP header or defining a custom one?
>> 
> 
> Standard HTTP. Formally, we can have something like
> 
> Link: <http://url.to.the.manifest>; rel="http://identifies.the.manifest.format"
> 
> The details must be clarified, but I believe that is a correct approach standard-wise.
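A client could consume such an HTTP Link header along these lines, written here in the RFC 8288 form `Link: <target>; rel="..."`. This is a minimal Python sketch: the relation URI `http://identifies.the.manifest.format` is the thread's own placeholder, and the parser deliberately assumes that neither targets nor quoted parameter values contain commas or semicolons, which a production parser could not assume.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Placeholder relation type taken from the thread; no real value was agreed.
MANIFEST_REL = "http://identifies.the.manifest.format"


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a Link header into (target, params) pairs.

    Simplified: assumes no commas or semicolons inside <targets> or
    quoted parameter values.
    """
    links = []
    for part in value.split(","):
        part = part.strip()
        end = part.find(">")
        if not part.startswith("<") or end == -1:
            continue
        target = part[1:end]
        params = {}
        for param in part[end + 1:].split(";"):
            name, sep, val = param.strip().partition("=")
            if sep:
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def find_manifest(link_header: str, base_url: str) -> Optional[str]:
    """Return the manifest URL advertised in the Link header, if any."""
    for target, params in parse_link_header(link_header):
        # rel may carry several space-separated relation types.
        if MANIFEST_REL in params.get("rel", "").split():
            return urljoin(base_url, target)
    return None


header = ('<manifest.json>; rel="http://identifies.the.manifest.format", '
          '<style.css>; rel="stylesheet"')
print(find_manifest(header, "http://ex.org/ThePublication/"))
```

Carrying the manifest link in the response header, rather than only inside the package, is what would let a site attach its own manifest to a package without modifying it.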
> 
> 
>>> but the package MUST contain a manifest in any case (we have to decide which of the manifests takes priority).
>> 
>> If the package format allows a client to easily retrieve the manifest, this is probably enough. Why use an HTTP header in addition?
>> I'd suggest that making the manifest easily and deterministically retrievable from the package becomes a requirement.
> 
> I could imagine making a local copy of a publication in a package, which includes the author-provided manifest, and wanting to add some information valid for my site only (e.g., other redirections) that I 'attach' to the publication through the GET response without modifying the package. I.e., both can be useful.
> 
>> 
>>> - The manifest itself, e.g., in JSON format if that is indeed the syntax we adopt, or an HTML file that includes the manifest, or an HTML file that links to the manifest
>> 
>> yes, for the unpackaged state.
> 
> Correct.
> 
> Thanks
> 
> Ivan
> 
>> 
>> Romain.
>> 
>> 
>>> 
>>> Ivan
>>> 
>>> 
>>> 
>>>> 
>>>> On the manifest question, I think that the discussion taking place for EPUB about a JSON-based manifest may be useful here, as there is definitely overlap with the organization and structure of the material we would also want here. And if we could align these two efforts on a single manifest format, it would be trivial for implementations to author and provide it (no transcoding required). But yes, from PWP's perspective more would be needed (such as the optional mapping for external resources).
>>>> 
>>>> 
>>>> Leonard
>>>> 
>>>> From: Bill Kasdorf <bkasdorf@apexcovantage.com>
>>>> Date: Wednesday, January 20, 2016 at 9:19 AM
>>>> To: "public-digipub-ig@w3.org" <public-digipub-ig@w3.org>
>>>> Subject: [DPUB][Locators]Cancellation and Next Steps
>>>> Resent-From: <public-digipub-ig@w3.org>
>>>> Resent-Date: Wednesday, January 20, 2016 at 9:20 AM
>>>> 
>>>> Hi, folks—
>>>> 
>>>> Today's Locators Task Force meeting is cancelled, but our Task is not. ;-)
>>>> 
>>>> It has been suggested by several people that focusing on the actual structure of the locator, and getting a strawman proposal written down, is what we need to do now.
>>>> 
>>>> There has been some interesting discussion on the list:
>>>> 
>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2015Dec/0163.html (from Daniel Weck)
>>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0095.html (from Ángel González)
>>>> 
>>>> Ivan suggests that we need to write down:
>>>> 
>>>> - what should a GET return for a locator (something that either is, or refers to, a manifest in the abstract sense)
>>>> - what should a manifest, conceptually, include. At this moment, I see
>>>>                 - an *identifier*
>>>>                 - a mapping from absolute URLs to relative URLs (where relative means relative to the PWP instance URL)
>>>>                 - a mapping from relative URLs to absolute URLs
>>>> 
>>>> Could somebody volunteer to draft a strawman proposal that we can use for the basis of discussion going forward?
>>>> 
>>>> --Bill
>>>> 
>>>> Bill Kasdorf
>>>> Vice President, Apex Content Solutions
>>>> Apex CoVantage
>>>> W: +1 734-904-6252
>>>> M: +1 734-904-6252
>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf>
>>>> bkasdorf@apexcovantage.com
>>>> http://isni.org/isni/0000000116490786
>>>> https://orcid.org/0000-0001-7002-4786
>>>> www.apexcovantage.com
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
Received on Monday, 25 January 2016 15:48:43 UTC