Re: [DPUB][Locators]Cancellation and Next Steps

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 26 Jan 2016 12:03:22 +0000
To: Romain <rdeltour@gmail.com>, Ivan Herman <ivan@w3.org>
CC: Bill Kasdorf <bkasdorf@apexcovantage.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <7C8088FF-7B8D-4992-B166-EB00ED5291BC@adobe.com>
>- more conceptually, I would argue that the unpacked content served at https://example.org/ThePublication isn't semantically equivalent
>(in the context of an HTTP request) to the packaged version at https://example.org/ThePublication?packed.

Which is EXACTLY what I have been arguing for a while now as well! Thank you Romain!


From: Romain <rdeltour@gmail.com>
Date: Tuesday, January 26, 2016 at 4:44 AM
To: Ivan Herman <ivan@w3.org>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: [DPUB][Locators]Cancellation and Next Steps

On 26 Jan 2016, at 09:50, Ivan Herman <ivan@w3.org> wrote:

On 25 Jan 2016, at 16:48, Romain <rdeltour@gmail.com> wrote:

The problem with conneg is that there are many forms and it's sometimes difficult to understand which one we're talking about :)
Here I was assuming you were considering server-driven conneg (aka proactive conneg) where the client sends Accept headers and the server returns the best-match answer. We shouldn't do this.
It appears instead that you're considering an agent-driven conneg pattern (aka reactive conneg); but then, this can take many forms.

Actually… no. In your description what I was considering is what you call server-driven conneg. Note that the client does not have to send an Accept header; the server can set up a default priority.

I am not sure why you say 'we shouldn't do this'? What is your argument?

There are many reasons, most of them described in RFC 2616 for HTTP/1.1 [1].

- proactive conneg requires clients to set HTTP headers. It may work well for an API with dedicated clients, but browsers won't add PWP-aware headers to all requests they make, so in the general case of browsing the web we have to cater for a reactive conneg pattern anyway.
- proactive conneg is problematic for caches.
- more conceptually, I would argue that the unpacked content served at https://example.org/ThePublication isn't semantically equivalent (in the context of an HTTP request) to the packaged version at https://example.org/ThePublication?packed. In the former case, the GET returns an HTML doc with links to subsequent resources, in the latter case, it returns *all* the resources. It's just not the same beast.

See also the literature in the following refs:

[1] Server-driven negotiation in HTTP/1.1 https://tools.ietf.org/html/rfc2616#section-12.1

[2] TBL's design issue on conneg https://www.w3.org/DesignIssues/Conneg The "When not to use conneg" section applies to my 3rd bullet above.
[3] WHATWG's Why Not Conneg (by Henri Sivonen) https://wiki.whatwg.org/wiki/Why_not_conneg

[4] 2006 TAG's finding "On Linking Alternative Representations To Enable Discovery And Publishing" https://www.w3.org/2001/tag/doc/alternatives-discovery.html

[5] Roy Fielding's comments on ietf-http-wg https://lists.w3.org/Archives/Public/ietf-http-wg/2013JanMar/0811.html

All that considered, I think that proactive conneg is not the best approach in our case. But I'm not an expert. Maybe we can reach out to TAG or HTTP fellows?


Note that this is pretty much what we do when serving namespace documents for RDF vocabularies, for example, in http://www.w3.org/ns/ (or elsewhere). The same vocabulary is defined in various formats (Turtle, RDF/XML, JSON-LD, possibly HTML+RDFa), we set up a conneg structure on the server side (using 'var' files), and we even set a priority among the various formats (e.g., pushing the priority for the RDF/XML serialization fairly low compared to Turtle). It takes a bit of administration on the server side, but works well for clients.
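
A rough sketch of the server-driven selection described above, for illustration only. The priority values in SERVER_PRIORITY and the function names are assumptions made up for this example; they are not the actual w3.org configuration, and real servers handle more of the Accept syntax than this does.

```python
# Hypothetical server-side priorities (analogous to the "qs" values in a 'var' file).
SERVER_PRIORITY = {
    "text/turtle": 1.0,
    "application/ld+json": 0.9,
    "text/html": 0.8,
    "application/rdf+xml": 0.5,  # deliberately pushed low compared to Turtle
}


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    return prefs


def client_quality(media_type, prefs):
    """Return the q-value of the most specific media range matching media_type."""
    main = media_type.split("/")[0]
    for candidate in (media_type, main + "/*", "*/*"):
        for media_range, q in prefs:
            if media_range == candidate:
                return q
    return 0.0


def choose(accept_header, priority=SERVER_PRIORITY):
    """Pick the representation with the best (client q) x (server priority) score."""
    # A client that sends no Accept header accepts anything,
    # so the server's own priority decides.
    prefs = parse_accept(accept_header) if accept_header else [("*/*", 1.0)]
    scored = [(client_quality(mt, prefs) * qs, mt) for mt, qs in priority.items()]
    score, best = max(scored)
    return best if score > 0 else None
```

With these made-up priorities, a request without an Accept header gets Turtle, while a client that explicitly prefers RDF/XML still receives it despite its low server-side priority.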

We could adopt something like that. The important point is that http://ex.org/ThePublication is the 'canonical' URL for the publication; it is not the same as the identifier, because it binds to a specific place on a specific server (and may change if I make a copy for myself), but it is, sort of, canonical nevertheless. More importantly, this is the URL that is considered to be the 'base' when considering locators within the document.


However: why couldn't we just say that the URL to the unpacked state is the canonical URL?

The user goes to https://example.org/ThePublication with a browser, she can read the publication (unpacked) as any other web content. It's basically a static web site.
The GET answer to this URL returns HTML content, which in turn has links to a manifest and an alternative packaged version.

This is reactive conneg, although totally user-driven (machine readable with link rel attributes).
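
To make the link-based pattern above concrete, here is a minimal sketch of a client that reads the HTML returned for the canonical URL and picks out the manifest and the packaged version. The rel values ("manifest", "alternate"), the media type in PACKED_TYPE, and the sample markup are all assumptions for the sake of the example; nothing of the sort has been decided.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PACKED_TYPE = "application/zip"  # hypothetical media type for the packaged state

# Hypothetical markup served at the canonical (unpacked) URL.
SAMPLE = """<html><head>
<link rel="manifest" href="/ThePublication/manifest.json" type="application/json">
<link rel="alternate" href="?packed" type="application/zip">
</head><body><p>Chapter 1</p></body></html>"""


class LinkCollector(HTMLParser):
    """Collect (rel, absolute URL, type) for every <link> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of relation types.
        for rel in (attributes.get("rel") or "").lower().split():
            url = urljoin(self.base_url, href)
            self.links.append((rel, url, attributes.get("type")))


def discover(html, base_url):
    """Return the manifest URL and packaged-version URL advertised in the HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    found = {"manifest": None, "packed": None}
    for rel, url, media_type in parser.links:
        if rel == "manifest" and found["manifest"] is None:
            found["manifest"] = url
        elif (rel == "alternate" and media_type == PACKED_TYPE
              and found["packed"] is None):
            found["packed"] = url
    return found
```

Calling discover(SAMPLE, "https://example.org/ThePublication") resolves both links against the canonical URL, so the client never needs to send PWP-specific headers.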

You are right, that could work, too (although the two conneg-s do not necessarily exclude one another). The only downside is: what happens if the unpacked version is not available, only the packed one?

I think the core issue is about what you said earlier:
I think for a number of potential applications (like, for example, assigning annotations) it is important to have the same URL for the various formats ('states', as we referred to them).

I think we need to better describe these use cases.

That is clearly the case. We will have to talk to the editor of the PWP Use Case document:-)

Is there another use case than annotations? Isn't annotation more affected by the uniqueness of identifiers than by URLs?
In any case, I think that the pattern where the packed version is linked from the canonical unpacked version would still work.

yes, it definitely would.

One aspect that I still don't picture clearly is that we say that (1) having the same URL for both versions is important and at the same time that (2) the URL changes as soon as the publication is effectively ported.

We have to differentiate, in my view, between the identifier (that does not change if the document is ported) and the locator that identifies a specific instance in two different states. So yes, we do have, maybe, a three-tier situation instead of two. But we agreed that the identifier part is out of our control and charter.




I think we agree that whatever is returned, it should give *an access* (in the conceptual sense) to a manifest (and we would not have to go into the syntax of the manifest here).

Yes, possibly with some level of indirection.


The return may be

- The full actual data if there is a packaged form;


the HTTP return header MAY (SHOULD?) also return a link to a manifest

Are you thinking of using a standard HTTP header or defining a custom one?

Standard HTTP. Formally, we can have something like

LINK: <http://url.to.the.manifest>; rel=<http://identifies.the.manifest.format>

The details must be clarified, but I believe that is a correct approach standard-wise.
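
As a small illustration of the Link header idea, the sketch below builds and parses a single link-value. It assumes the relation type is written as a quoted URI, which is how I understand extension relation types in the Web Linking RFC; the function names are hypothetical, the URLs are the placeholders used above, and the parser deliberately ignores corner cases such as ";" inside quoted parameter values.

```python
def format_link(target, rel):
    """Build one link-value, e.g. <target>; rel="relation"."""
    return '<{}>; rel="{}"'.format(target, rel)


def parse_link(value):
    """Parse one link-value into (target, params). Simplified on purpose."""
    value = value.strip()
    if not value.startswith("<") or ">" not in value:
        raise ValueError("not a link-value: {!r}".format(value))
    target, _, rest = value[1:].partition(">")
    params = {}
    for piece in rest.split(";"):
        name, sep, raw = piece.strip().partition("=")
        if not sep:
            continue  # skip empty pieces and parameters without a value
        params[name.strip().lower()] = raw.strip().strip('"')
    return target, params


def manifest_from_link(value, manifest_rel):
    """Return the link target if its rel matches manifest_rel, else None."""
    target, params = parse_link(value)
    return target if manifest_rel in params.get("rel", "").split() else None
```

A client receiving the packaged form could then call manifest_from_link on the header value to find a site-specific manifest without opening the package.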

but the package MUST contain a manifest in any case (we have to decide which of the manifests has priority).

If the package format allows a client to easily retrieve the manifest, this is probably enough. Why use an HTTP header in addition?
I'd suggest that making the manifest easily and deterministically retrievable from the package becomes a requirement.

I could imagine that I make a local copy of a publication in a package, which includes the author-provided manifest, but I would like to add some additional information valid for my site only (e.g., other redirections) that I 'attach' to the publication through the GET without modifying the package. I.e., both can be useful.

- The manifest itself, e.g., in JSON format if that is indeed the syntax we adopt, or an HTML file that includes the manifest, or an HTML file that links to the manifest

yes, for the unpackaged state.






On the manifest question, I think that the discussion taking place for EPUB about a JSON-based manifest may be useful here as there is definitely overlap in the organization and structure of that material that we would also want here. And if we could potentially align these two efforts to a single manifest format, then it would make it trivial for implementations to author and provide it (no transcoding required). But yes, there would need to be more stuff from PWP’s perspective (such as the optional mapping for external resources).


From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Wednesday, January 20, 2016 at 9:19 AM
To: "public-digipub-ig@w3.org" <public-digipub-ig@w3.org>
Subject: [DPUB][Locators]Cancellation and Next Steps



Hi, folks—

Today's Locators Task Force meeting is cancelled, but our Task is not. ;-)

It has been suggested by several people that focusing on the actual structure of the locator, and getting a strawman proposal written down, is what we need to do now.

There has been some interesting discussion on the list:

https://lists.w3.org/Archives/Public/public-digipub-ig/2015Dec/0163.html (from Daniel Weck)
https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0095.html  (from Ángel González)

Ivan suggests that we need to write down:

- what should a GET return for a locator (something which either is or refers to a manifest in the abstract sense)
- what should a manifest, conceptually, include. At this moment, I see
                - an *identifier*
                - a mapping from absolute URL-s to relative URL-s (where relative means relative to the PWP instance URL)
                - a mapping from relative URL-s to absolute URL-s
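
One possible reading of the conceptual manifest listed above, sketched as a data structure. The class, field, and method names, the lookup order, and the example URLs are assumptions for discussion only; the strawman proposal may well define the mappings differently.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class Manifest:
    identifier: str  # stable identifier, independent of where the copy lives
    abs_to_rel: dict = field(default_factory=dict)  # absolute URL -> relative URL
    rel_to_abs: dict = field(default_factory=dict)  # relative URL -> absolute URL

    def locate(self, url, instance_url):
        """Resolve a URL used inside the publication for one PWP instance."""
        if url in self.abs_to_rel:
            # An external resource that this instance carries locally.
            return urljoin(instance_url, self.abs_to_rel[url])
        if url in self.rel_to_abs:
            # A relative reference that actually lives elsewhere on the web.
            return self.rel_to_abs[url]
        # Everything else is relative to the PWP instance URL.
        return urljoin(instance_url, url)


# Hypothetical example data.
manifest = Manifest(
    identifier="urn:example:the-publication",
    abs_to_rel={"https://fonts.example.com/serif.woff": "fonts/serif.woff"},
    rel_to_abs={"video/intro.mp4": "https://media.example.net/intro.mp4"},
)
```

Note that only the instance URL passed to locate changes when the publication is copied to another server; the identifier and both mappings stay the same.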

Could somebody volunteer to draft a strawman proposal that we can use for the basis of discussion going forward?


Bill Kasdorf
Vice President, Apex Content Solutions
Apex CoVantage
W: +1 734-904-6252
M: +1 734-904-6252




Received on Tuesday, 26 January 2016 12:03:59 UTC

