Re: [dpub-loc] 20160217 minutes from Leonard Rosenthol on 2016-02-18 (public-digipub-ig@w3.org from February 2016)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Thu, 18 Feb 2016 14:14:46 +0000
To: Daniel Weck <daniel.weck@gmail.com>, Ivan Herman <ivan@w3.org>
CC: Romain <rdeltour@gmail.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <6A3080C9-469D-4C6F-B442-9691CE28A49A@adobe.com>
Great example! GitHub is (in this context) a smart server that implements a rich REST API.  And in fact, their API uses the same content negotiation model that Ivan and I are proposing (as one option).   See <https://developer.github.com/v3/media/> for a discussion of how they use media types and Accept header.
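
A minimal sketch of what Accept-based negotiation could look like on a PWP-aware server. The function names are hypothetical, the media types are only the illustrative ones used in Daniel's mail below (neither is registered), and the parsing is simplified: it handles q-values and "*/*" but not partial wildcards such as "application/*".

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]  # server's own default representation
        if media_type in available:
            return media_type
    return None


# Hypothetical representations a server might offer for one canonical URL.
AVAILABLE = ["application/pwp-info+json", "application/pwp+zip"]
```

With this sketch, negotiate("application/pwp+zip, application/json;q=0.5", AVAILABLE) selects the packed representation, while a request that only accepts text/html gets None, which a server would typically turn into a 406 response or a default representation.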

So yes, if there is a PWP-aware server that is hosting the content, then it can do any/all of the things that you’ve suggested. I don’t think we have a preference (as yet) among the many choices available.

This is also true for trying to reach inside of the PWP (eg. https://domain.com/another/path/to/book1/manifest.json), unless it just so happens to be stored unpacked (on either a smart or dumb server). If it were stored packed, you’d want that same URL to work too, and that would also require the smart server.
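
To illustrate why the packed case needs a smart server, here is a minimal sketch of a handler that maps a request path onto an entry inside a ZIP-based package. It assumes the packed PWP is a plain ZIP archive; the function names, the book root and the file contents are placeholders for illustration, not anything the group has specified.

```python
import io
import zipfile


def build_packed_book(files):
    """Create an in-memory ZIP archive from a {path: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


def read_resource(url_path, book_root, packed_bytes):
    """Resolve a request path under book_root to an entry in the archive.

    Returns the entry's bytes, or None if the path lies outside the book
    or the archive has no such entry.
    """
    prefix = book_root.rstrip("/") + "/"
    if not url_path.startswith(prefix):
        return None
    entry_name = url_path[len(prefix):]
    with zipfile.ZipFile(io.BytesIO(packed_bytes)) as archive:
        try:
            return archive.read(entry_name)
        except KeyError:  # raised by zipfile when the entry does not exist
            return None


# Placeholder package contents, for illustration only.
PACKED = build_packed_book({
    "manifest.json": b'{"name": "book1"}',
    "images/logo.png": b"PNG",
})
```

A request for /another/path/to/book1/manifest.json can then be answered from the archive with read_resource(path, "/another/path/to/book1", PACKED). A dumb file host has no place to run this kind of lookup, which is the limitation described above.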

But if all you have is a dumb server - one that you can’t configure (eg. DropBox or Google Drive) - then we need to put all the smarts into the client to be able to handle all the possible permutations.   And maybe that’s OK if this is the expected case - but I hope not if we really want PWP to be a “native” part of the web (which includes both clients AND servers).
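
As a rough sketch of the client-side smarts this implies, the function below inspects a response and decides how to proceed. The function name and return shape are hypothetical; the media type application/pwp+zip and the "packed"/"unpacked" keys are taken from Daniel's illustration below and are not defined anywhere. A real dumb host may also serve a generic type such as application/octet-stream, which this sketch simply reports as unknown.

```python
import json


def classify_response(content_type, body):
    """Decide how a client might treat a response fetched from a book URL.

    Returns a (kind, detail) tuple:
      ("packed", None)   - the body is the packed archive itself
      ("info", {...})    - the body is a JSON document listing locators
      ("unknown", None)  - the client needs some other strategy
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/pwp+zip":
        return ("packed", None)
    if media_type == "application/json" or media_type.endswith("+json"):
        try:
            info = json.loads(body)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return ("unknown", None)
        if isinstance(info, dict) and ("packed" in info or "unpacked" in info):
            locators = {k: info[k] for k in ("packed", "unpacked") if k in info}
            return ("info", locators)
    return ("unknown", None)
```

Every extra server behaviour the client must tolerate adds another branch here, which is the cost of pushing all the handling into the client.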

Leonard



On 2/18/16, 8:04 AM, "Daniel Weck" <daniel.weck@gmail.com> wrote:

>Hello,
>
>here's a concrete example (unrelated to PWP) which I think illustrates
>the comments made during the concall, regarding content negotiation
>vs. dereferencing URL endpoints to "meta" data about the publication
>locators for unpacked / packed states.
>
>Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
>repository, and the README.md file located at the root of the
>gh-pages branch. There's a "canonical" URL for that (you can safely click on
>the links below):
>
>curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme
>==> Content-Type: application/json; charset=utf-8
>
>curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme
>==> "url": "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
>
>As a consumer of that JSON-based API, I can query the actual payload
>that I'm interested in:
>curl https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages
>==> "content": "BASE64"
>
>
>Now, back to PWP:
>
>State-agnostic "canonical" URL:
>https://domain.com/path/to/book1
>(note that this could also be a totally different syntax, e.g.
>https://domain.com/info/?get=book1 or
>https://domain.com/book1?get=info etc. for as long as a request
>returns a content-type that a PWP processor / reading-system can
>consume, e.g. application/json or application/pwp-info+json ... or XML
>/ whatever)
>A simple request to this URL could return (minimal JSON example, just
>for illustration purposes):
>{
>    "packed": "https://domain.com/path/to/book1.pwp",
>    "unpacked":
>"https://domain.com/another/path/to/book1/manifest.json"  /// (or
>container.xml, or package.opf ... :)
>}
>
>Once again, there is no naming convention / constraint on the "packed"
>URL https://domain.com/path/to/book1.pwp which could be
>https://domain.com/download/book1 or
>https://download.domain.com/?get=book1 , as long as a request returns
>a payload with content-type application/pwp+zip (for example). Note
>that the book1.pwp archive in my example would contain the "main entry
>point" manifest.json (thus why I made a parallel above with EPUB
>container.xml or package.opf)
>
>The "unpacked" URL path
>https://domain.com/another/path/to/book1/manifest.json does not have
>to represent the actual file structure on the server, but it's a
>useful syntactical convention because other resource files in the PWP
>would probably have similarly-rooted relative locator paths (against a
>given base href), e.g.:
>https://domain.com/another/path/to/book1/index.html
>https://domain.com/another/path/to/book1/images/logo.png
>In other words, if the packed book1.pwp contains index.html with <img
>src="./images/logo.png" />, it does make sense for the online unpacked
>state to use the same path references (as per the example URLs above).
>Publishers may have the option to route URLs any way they like, e.g.
><img src="?get_image=logo.png" />, but we know there is the issue of
>mapping document URLs in packed/unpacked states with some canonical
>locator, so that annotation targets can be referenced and resolved
>consistently. So it would greatly help if the file structure inside
>the packed book1.pwp was replicated exactly in the URL patterns used
>for deploying the unpacked state.
>
>To conclude, I am probably missing something (Ivan and Leonard, you
>guys are ahead of the curve compared to me), but I hope I managed to
>convey useful arguments. Personally, as a developer involved in
>reading-system implementations, and as someone who would like to
>continue deploying content with minimal server-side requirements, I am
>not yet convinced that content negotiation is needed here. As an
>optional feature, sure, but not as the lowest common denominator.
>
>Thanks for listening :)
>Regards, Dan
>
>
>
>On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
>> With the caveat that the minutes are always difficult to read (Romain, that
>> is not your fault, it is the case for most of the minutes; I know only a few
>> people who write perfect minutes, and I am certainly not among them) maybe
>> some comments on my side. More about this next time we can all talk
>> (although it seems that this will only be in two weeks, due to the Baltimore
>> EDUPUB meeting).
>>
>> First of all, this comment:
>>
>> [[[
>> rom: my issue is that the spec doesn't say "if Lu exists then L must be Lu",
>> I think we should consider it
>> ]]]
>>
>> I do not see why we should say anything like that. It is of course correct
>> that, in many cases, it makes a lot of sense to have Lu=L. But I do not see
>> why we should restrict it this way. In general, the approach I tried to
>> follow in my writeup is to be as permissive as possible and put the minimum
>> possible hard requirements on the locator setup. It is probably worth adding
>> a note in the text (or the more final text) that Lu may be equal to L (in
>> fact, this may very well be a widely used approach) but I would not want to
>> go beyond that.
>>
>> Then there is the whole issue about content negotiations… It seems that we
>> have a disagreement on the value and usage of content negotiations. I do not
>> agree with Daniel's statement that "in a RESTful API the URL would
>> consistently return the same content type". It is certainly not the
>> practice, nor should it be. Content negotiation is widely used when tools
>> want to retrieve, for example the best syntax that encodes a particular
>> information (typical example is in RDF land, where tools may or may not have
>> parsers for a particular RDF serialization), this is how dbpedia is set up
>> etc. (I did tell you about the way RDF namespace documents are set up on our
>> site, for example. It is pretty much general practice to do that.) I must
>> admit I also do not agree with Daniel's remark on "content negotiation based
>> on (sophisticated) HTTP headers sounds counter intuitive". Content
>> negotiations is certainly very intuitive to me...
>>
>> All that being said, and that is where maybe there is actually a minor
>> disagreement between Leonard and me: I do not say that content negotiation is
>> the only approach to set up a server storage. The text I wrote is
>> deliberately open ended insofar as it described what the client expectation
>> is when that GET request is issued in general terms, and the choice among
>> the various alternatives is all the server's. The list of possible server
>> behaviours in the text are possible alternatives, instead of hard
>> requirements. The client is responsible for following the various possible
>> paths and, maybe, we will have to describe those possibilities later in more
>> details (precise usage of the LINK header, the <link> element, media types,
>> etc), but that gives the liberty to set up the server the way the publisher
>> wants. If we accept this approach, ie, that the client has some complexity
>> to resolve in favour of a variety of possible server setups, then I do not
>> think there is a major disagreement among us.
>>
>> Talk to you guys later…
>>
>> Ivan
>>
>> B.t.w., a more general and slightly philosophical comment: we should not be
>> afraid of really using HTTP:-) The various header information in both the
>> request and response headers of an HTTP request/response are very rich and
>> sophisticated. There are many situations, on expiration dates, on security,
>> etc, and of course content negotiations that can be expressed via these HTTP
>> headers, and we should not shy away from using those whenever we can and it makes
>> sense. As I showed in one of my mails it is not that complex to set up
>> (actually, and to be fair, setting up content negotiations is probably the
>> more complex thing, I accept that).
>>
>> If you are interested in the various possibilities, this site may be of
>> interest:
>>
>> https://github.com/dret/sedola/blob/master/MD/headers.md
>>
>>
>>
>> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
>>
>>
>> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>
>> Actually, the big issue that I took away from the minutes is that Ivan and I
>> are in agreement that content negotiation (via standard web technique incl.
>> the Accept header) is the proper way for the client & server to decide what
>> to return on the GET from the canonical locator.   Daniel, however, appears
>> (from the minutes) to be promoting a completely different approach.
>>
>>
>> As stated before [1], I am absolutely not convinced that content negotiation
>> is a good approach.
>> I want to upload a PWP tomorrow to a static file hosting service; if conneg
>> is required I can't do that.
>>
>> More to the point: how do you GET the (manifest + Lu + Lp) info with the
>> conneg solution? Maybe I just miss something.
>>
>> Finally, may I turn the question the other way around: what are the benefits
>> of content negotiation for the canonical locator? (compared to an
>> alternative approach with explicit links in the GET answer (headers or
>> payload)).
>>
>> Thanks,
>> Romain.
>>
>> [1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html
>>
>>
>> Daniel, if you can explain why you want to do something different from the
>> standard web/REST model, I’d like to understand.
>>
>> Leonard
>>
>> From: Romain <rdeltour@gmail.com>
>> Date: Wednesday, February 17, 2016 at 6:26 PM
>> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
>> <lrosenth@adobe.com>
>> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
>> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
>> Subject: Re: [dpub-loc] 20160217 minutes
>>
>> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
>>
>> Hi Leonard, that's quite a bold statement, but I suspect the minutes could
>> do with a few corrections.
>>
>> My bad if the minutes are inaccurate, please feel free to amend. It was a
>> bit frustrating too: several times I wanted to talk or clarify a point but
>> was busy typing.
>>
>> At any rate, I look forward to the recap from you and Ivan at the next
>> opportunity. PS: it was a small quorum on this concall, but I was under the
>> impression that the participants agreed on the broad lines of your proposal,
>> with only details to clarify.
>>
>> My impression is that participants generally agreed with the presentation of
>> the issues and some principles. I believe that the main point that is still
>> controversial is really what should be the answer to a GET on the canonical
>> locator.
>>
>>> I think we need to go do this over again next week – which is extremely
>>> unfortunate.
>>
>>
>> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
>> (EDUPUB summit).
>>
>> Romain.
>>
>> Regards, Daniel
>>
>> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
>>>
>>> Sorry that I was unable to attend today, especially since the discussion
>>> (based on the minutes) seems to completely undo all the work that Ivan,
>>> myself and others did on the mailing list during the past week.   The
>>> position presented by Daniel is the exact opposite of what Ivan’s musings
>>> (adjusted based on mail conversations) presented.
>>>
>>> I think we need to go do this over again next week – which is extremely
>>> unfortunate.
>>>
>>> Leonard
>>>
>>> From: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
>>> Date: Wednesday, February 17, 2016 at 11:46 AM
>>> To: "DPUB mailing list (public-digipub-ig@w3.org)"
>>> <public-digipub-ig@w3.org>
>>> Subject: [dpub-loc] 20160217 minutes
>>> Resent-From: <public-digipub-ig@w3.org>
>>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
>>>
>>> Minutes from today’s meeting:
>>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html
>>>
>>> Tzviya Siegman
>>> Digital Book Standards & Capabilities Lead
>>> Wiley
>>> 201-748-6884
>>> tsiegman@wiley.com
>>>
>>
>>
>>
>>
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>
>>
>>
>>
Received on Thursday, 18 February 2016 14:15:20 UTC