Re: [dpub-loc] 20160217 minutes from Daniel Weck on 2016-02-18 (public-digipub-ig@w3.org from February 2016)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Thu, 18 Feb 2016 13:04:34 +0000
To: Ivan Herman <ivan@w3.org>
Cc: Romain <rdeltour@gmail.com>, Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CA+FkZ9G5sYg1c=pLLna5LqR2HSQNWZSpbC_HTayKPqwH1ffDTg@mail.gmail.com>
Hello,

here's a concrete example (unrelated to PWP) which I think illustrates
the comments made during the concall, regarding content negotiation
vs. dereferencing URL endpoints to "meta" data about the publication
locators for unpacked / packed states.

Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
repository, and the README.md file located at the root of the
gh-pages branch. There's a "canonical" URL for that (you can safely
click on the links below):

curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme
==> Content-Type: application/json; charset=utf-8

curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme
==> "url": "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"

As a consumer of that JSON-based API, I can query the actual payload
that I'm interested in:
curl "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
==> "content": "BASE64"
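
For illustration only, here is a minimal Python sketch of what a
consumer does with that second response. It assumes the "content"
field holds the Base64-encoded file, as in the output above; the
function name and the sample payload are made up, and the sample
stands in for the real HTTP response so that no network access is
needed:

```python
import base64
import json


def decode_contents_payload(raw_json: str) -> str:
    """Return the decoded text carried in a "contents" API response body."""
    payload = json.loads(raw_json)
    # The Base64 text may be wrapped across lines; b64decode skips the newlines.
    data = base64.b64decode(payload["content"])
    return data.decode("utf-8")


# Hypothetical stand-in for the JSON returned by the second curl request.
sample = json.dumps({
    "name": "README.md",
    "content": base64.encodebytes(b"# dpub-pwp-loc\n").decode("ascii"),
})

print(decode_contents_payload(sample))
```

The point is only that the client follows an explicit link found in the
first JSON payload, then decodes the second one; no Accept header
juggling is involved.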


Now, back to PWP:

State-agnostic "canonical" URL:
https://domain.com/path/to/book1
(note that this could also be a totally different syntax, e.g.
https://domain.com/info/?get=book1 or
https://domain.com/book1?get=info etc., as long as a request
returns a content-type that a PWP processor / reading-system can
consume, e.g. application/json or application/pwp-info+json ... or XML
/ whatever)
A simple request to this URL could return (minimal JSON example, just
for illustration purposes):
{
    "packed": "https://domain.com/path/to/book1.pwp",
    "unpacked": "https://domain.com/another/path/to/book1/manifest.json"
}
(the "unpacked" entry point could just as well be container.xml, or
package.opf ... :)

Once again, there is no naming convention / constraint on the "packed"
URL https://domain.com/path/to/book1.pwp which could be
https://domain.com/download/book1 or
https://download.domain.com/?get=book1 , as long as a request returns
a payload with content-type application/pwp+zip (for example). Note
that the book1.pwp archive in my example would contain the "main entry
point" manifest.json (hence the parallel I drew above with EPUB
container.xml or package.opf).
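
To make the client side of this concrete, here is a rough Python
sketch of how a reading system might consume the "info" response from
the canonical URL. Everything in it is an assumption for the sake of
illustration: the media types are the placeholder ones used in this
mail, the "packed" / "unpacked" keys come from my minimal JSON example,
and pick_locator is a made-up name:

```python
import json

# Placeholder media types from the examples in this mail (not registered types).
INFO_TYPES = {"application/json", "application/pwp-info+json"}


def pick_locator(content_type: str, body: str, prefer: str = "unpacked") -> str:
    """Choose the packed or unpacked URL advertised by the info document."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in INFO_TYPES:
        raise ValueError(f"unexpected content type: {media_type}")
    info = json.loads(body)
    # Fall back to the other state if the preferred one is not advertised.
    other = "packed" if prefer == "unpacked" else "unpacked"
    for state in (prefer, other):
        if state in info:
            return info[state]
    raise ValueError("info document lists neither state")


body = json.dumps({
    "packed": "https://domain.com/path/to/book1.pwp",
    "unpacked": "https://domain.com/another/path/to/book1/manifest.json",
})
print(pick_locator("application/json; charset=utf-8", body))
print(pick_locator("application/json", body, prefer="packed"))
```

The server can be a static file host in this scenario: it only has to
serve one small JSON document at the canonical URL, and the choice
between states is made by the client.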

The "unpacked" URL path
https://domain.com/another/path/to/book1/manifest.json does not have
to represent the actual file structure on the server, but it's a
useful syntactical convention because other resource files in the PWP
would probably have similarly-rooted relative locator paths (against a
given base href), e.g.:
https://domain.com/another/path/to/book1/index.html
https://domain.com/another/path/to/book1/images/logo.png
In other words, if the packed book1.pwp contains index.html with <img
src="./images/logo.png" />, it does make sense for the online unpacked
state to use the same path references (as per the example URLs above).
Publishers may have the option to route URLs any way they like, e.g.
<img src="?get_image=logo.png" />, but we know there is the issue of
mapping document URLs in packed/unpacked states with some canonical
locator, so that annotation targets can be referenced and resolved
consistently. So it would greatly help if the file structure inside
the packed book1.pwp was replicated exactly in the URL patterns used
for deploying the unpacked state.
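
As a small sketch of why the mirrored layout helps (Python standard
library only; the function names are hypothetical, and I assume that
paths inside the package are relative, with no leading slash), a
processor can map any path inside book1.pwp to its unpacked URL with
ordinary relative-URL resolution against the manifest URL:

```python
from urllib.parse import urljoin

# The "unpacked" URL from the example above, used as the base href.
UNPACKED_MANIFEST = "https://domain.com/another/path/to/book1/manifest.json"


def unpacked_url(path_in_package: str, manifest_url: str = UNPACKED_MANIFEST) -> str:
    """Map a path inside the packed archive to its URL in the unpacked state."""
    # The manifest sits at the package root, so siblings resolve against it.
    return urljoin(manifest_url, path_in_package)


def resolve_reference(doc_path: str, reference: str) -> str:
    """Resolve a relative reference found inside a packaged document."""
    return urljoin(unpacked_url(doc_path), reference)


print(unpacked_url("index.html"))
print(resolve_reference("index.html", "./images/logo.png"))
```

This prints the two example URLs listed above (index.html and
images/logo.png under https://domain.com/another/path/to/book1/). With
an arbitrary routing scheme such as ?get_image=logo.png, no such
generic mapping exists and each publisher would need its own lookup
table.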

To conclude, I am probably missing something (Ivan and Leonard, you
guys are ahead of the curve compared to me), but I hope I managed to
convey useful arguments. Personally, as a developer involved in
reading-system implementations, and as someone who would like to
continue deploying content with minimal server-side requirements, I am
not yet convinced that content negotiation is needed here. As an
optional feature, sure, but not as the lowest common denominator.

Thanks for listening :)
Regards, Dan



On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
> With the caveat that the minutes are always difficult to read (Romain, that
> is not your fault, it is the case for most of the minutes; I know only a few
> people who write perfect minutes, and I am certainly not among them) maybe
> some comments on my side. More about this next time we can all talk
> (although it seems that this will only be in two weeks, due to the Baltimore
> EDUPUB meeting).
>
> First of all, this comment:
>
> [[[
> rom: my issue is that the spec doesn't say "if Lu exists then L must be Lu",
> I think we should consider it
> ]]]
>
> I do not see why we should say anything like that. It is of course correct
> that, in many cases, it makes a lot of sense to have Lu=L. But I do not see
> why we should restrict it this way. In general, the approach I tried to
> follow in my writeup is to be as permissive as possible and put the minimum
> possible hard requirements on the locator setup. It is probably worth adding
> a note in the text (or the more final text) that Lu may be equal to L (in
> fact, this may very well be a widely used approach) but I would not want to
> go beyond that.
>
> Then there is the whole issue about content negotiations… It seems that we
> have a disagreement on the value and usage of content negotiations. I do not
> agree with Daniel's statement that "in a RESTful API the URL would
> consistently return the same content type". It is certainly not the
> practice, nor should it be. Content negotiation is widely used when tools
> want to retrieve, for example the best syntax that encodes a particular
> information (typical example is in RDF land, where tools may or may not have
> parsers for a particular RDF serialization), this is how dbpedia is set up
> etc. (I did tell you about the way RDF namespace documents are set up on our
> site, for example. It is pretty much general practice to do that.) I must
> admit I also do not agree with Daniel's remark on "content negotiation based
> on (sophisticated) HTTP headers sounds counter intuitive". Content
> negotiations is certainly very intuitive to me...
>
> All that being said, and that is where maybe there is actually a minor
> disagreement between Leonard and I: I do not say that content negotiation is
> the only approach to set up a server storage. The text I wrote is
> deliberately open ended insofar as it described what the client expectation
> is when that GET request is issued in general terms, and the choice among
> the various alternatives are all the server's. The list of possible server
> behaviours in the text are possible alternatives, instead of hard
> requirements. The client is responsible for following the various possible
> paths and, maybe, we will have to describe those possibilities later in more
> detail (precise usage of the LINK header, the <link> element, media types,
> etc), but that gives the liberty to set up the server the way the publisher
> wants. If we accept this approach, ie, that the client has some complexity
> to resolve in favour of a variety of possible server setups, then I do not
> think there is a major disagreement among us.
>
> Talk to you guys later…
>
> Ivan
>
> B.t.w., a more general and slightly philosophical comment: we should not be
> afraid of really using HTTP:-) The various header information in both the
> request and response headers of an HTTP request/response are very rich and
> sophisticated. There are many situations, on expiration dates, on security,
> etc, and of course content negotiations that can be expressed via these HTTP
> headers, and we should not shy away from using those whenever we can and it
> makes sense. As I showed in one of my mails it is not that complex to set up
> (actually, and to be fair, setting up content negotiations is probably the
> more complex thing, I accept that).
>
> If you are interested by the various possibilities, this site may be of
> interest:
>
> https://github.com/dret/sedola/blob/master/MD/headers.md
>
>
>
> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
>
>
> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> Actually, the big issue that I took away from the minutes is that ivan and I
> are in agreement that content negotiation (via standard web technique incl.
> the Accept header) is the proper way for the client & server to decide what
> to return on the GET from the canonical locator.   Daniel, however, appears
> (from the minutes) to be promoting a completely different approach.
>
>
> As stated before [1], I am absolutely not convinced that content negotiation
> is a good approach.
> I want to upload a PWP tomorrow to a static file hosting service; if conneg
> is required I can't do that.
>
> More to the point: how do you GET the (manifest + Lu + Lp) info with the
> conneg solution? Maybe I just miss something.
>
> Finally, may I turn the question the other way around: what are the benefits
> of content negotiation for the canonical locator? (compared to an
> alternative approach with explicit links in the GET answer (headers or
> payload)).
>
> Thanks,
> Romain.
>
> [1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html
>
>
> Daniel, if you can explain why you want to do something different from the
> standard web/REST model, I’d like to understand.
>
> Leonard
>
> From: Romain <rdeltour@gmail.com>
> Date: Wednesday, February 17, 2016 at 6:26 PM
> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
> <lrosenth@adobe.com>
> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
> Subject: Re: [dpub-loc] 20160217 minutes
>
> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
>
> Hi Leonard, that's quite a bold statement, but I suspect the minutes could
> do with a few corrections.
>
> My bad if the minutes are inaccurate, please feel free to amend. It was a
> bit frustrating too: several times I wanted to talk or clarify a point but
> was busy typing.
>
> At any rate, I look forward to the recap from you and Ivan at the next
> opportunity. PS: it was a small quorum on this concall, but I was under the
> impression that the participants agreed on the broad lines of your proposal,
> with only details to clarify.
>
> My impression is that participants generally agreed with the presentation of
> the issues and some principles. I believe that the main point that is still
> controversial is really what should be the answer to a GET on the canonical
> locator.
>
>> I think we need to go do this over again next week – which is extremely
>> unfortunate.
>
>
> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
> (EDUPUB summit).
>
> Romain.
>
> Regards, Daniel
>
> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
>>
>> Sorry that I was unable to attend today, especially since the discussion
>> (based on the minutes) seems to completely undo all the work that Ivan,
>> myself and others did on the mailing list during the past week.   The
>> position presented by Daniel is the exact opposite of what Ivan’s musings
>> (adjusted based on mail conversations) presented.
>>
>> I think we need to go do this over again next week – which is extremely
>> unfortunate.
>>
>> Leonard
>>
>> From: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
>> Date: Wednesday, February 17, 2016 at 11:46 AM
>> To: "DPUB mailing list (public-digipub-ig@w3.org)"
>> <public-digipub-ig@w3.org>
>> Subject: [dpub-loc] 20160217 minutes
>> Resent-From: <public-digipub-ig@w3.org>
>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
>>
>> Minutes from today’s meeting:
>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html
>>
>> Tzviya Siegman
>> Digital Book Standards & Capabilities Lead
>> Wiley
>> 201-748-6884
>> tsiegman@wiley.com
>>
>
>
>
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
Received on Thursday, 18 February 2016 13:05:27 UTC