Re: [dpub-loc] 20160217 minutes from Daniel Weck on 2016-02-18 (public-digipub-ig@w3.org from February 2016)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Thu, 18 Feb 2016 16:46:28 +0000
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: "DPUB mailing list (public-digipub-ig@w3.org)" <public-digipub-ig@w3.org>, Herman Ivan <ivan@w3.org>, Romain <rdeltour@gmail.com>
Message-ID: <CA+FkZ9G1TxaGXtsYNOzMjquDYmVOvLoqZJHEoJY2gNMP5VKBmw@mail.gmail.com>
On 18 Feb 2016 2:14 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
>
> Great example! GitHub is (in this context) a smart server that implements
> a rich REST API. And in fact, their API uses the same content negotiation
> model that Ivan and I are proposing (as one option). See
> <https://developer.github.com/v3/media/> for a discussion of how they use
> media types and Accept header.

DANW:
Sure, GitHub is clever. But my example deliberately illustrates a dumb
curl request :) (and a simple JSON body payload in the response).

> So yes, if there is a PWP-aware server that is hosting the content, then
> it can do any/all of the things that you’ve suggested. I don’t think we
> have a preference (as yet) among the many available choices.

DANW:
My example is meant to illustrate that a dumb server can return a static
JSON file, with zero PWP awareness (just URLs and associated
Content-Types). I just want to have the option to: edit a bunch of files
locally on my machine, upload to my HTTP server, profit.
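For illustration, a minimal Python sketch of the "just URLs and associated Content-Types" idea: a static server needs nothing more than an extension-to-media-type table. The `.pwp` extension and the `application/pwp+zip` type are hypothetical (they echo the example type used in the earlier message quoted further down), not registered values.

```python
import mimetypes

# A MimeTypes instance built with no extra files uses only Python's
# built-in table, so results do not depend on the host's mime.types.
TYPES = mimetypes.MimeTypes()

# Hypothetical media type for packed PWP archives (not registered).
TYPES.add_type("application/pwp+zip", ".pwp")


def content_type_for(path: str) -> str:
    """Content-Type a "dumb" static server would send for a file path."""
    guessed, _encoding = TYPES.guess_type(path)
    return guessed or "application/octet-stream"


for path in ["book1.pwp", "book1/manifest.json", "book1/images/logo.png"]:
    print(path, "->", content_type_for(path))
```

Nothing in this table knows what a PWP is; the server only maps file names to media types.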

> This is also true for trying to reach inside the PWP (e.g.
> https://domain.com/another/path/to/book1/manifest.json) unless it just so
> happens to be stored unpacked (on either a smart or dumb server), because
> if it were stored packed, you’d want that same URL to work too, and that
> would also require the smart server.

DANW:
Agreed. As the consumer of the above URL, all I want is a JSON body
payload as a response to my HTTP request. I do not need/want to know
whether this originates from a static file, a blob from a database, an
inflated data stream from the packed PWP archive, or witchcraft. That's the
API contract; I am agnostic to the "implementation" details.
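A small Python sketch of that contract, with all class and function names hypothetical (this is not a proposed PWP API): the consumer calls one method and gets the same bytes whether the resource sits in an unpacked folder or is inflated on demand from the packed, zip-based archive.

```python
import zipfile
from pathlib import Path
from typing import Protocol


class ResourceSource(Protocol):
    """Anything that can return the bytes of a publication resource."""

    def read(self, path: str) -> bytes: ...


class UnpackedSource:
    """Resources stored as ordinary files under a root folder."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()


class PackedSource:
    """Resources inflated on demand from a zip-based archive."""

    def __init__(self, archive: Path) -> None:
        self.archive = archive

    def read(self, path: str) -> bytes:
        with zipfile.ZipFile(self.archive) as zf:
            return zf.read(path)


def fetch_manifest(source: ResourceSource) -> bytes:
    # The consumer relies only on the contract "give me manifest.json";
    # where the bytes come from is the source's business.
    return source.read("manifest.json")
```

If `book1.pwp` contains the same `manifest.json` as the `book1/` folder, `fetch_manifest(UnpackedSource(Path("book1")))` and `fetch_manifest(PackedSource(Path("book1.pwp")))` return identical bytes.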

> But if all you have is a dumb server - one that you can’t configure (eg.
> DropBox or Google Drive) - then we need to put all the smarts into the
> client to be able to handle all the possible permutations. And maybe
> that’s OK if this is the expected case - but I hope not if we really want
> PWP to be a “native” part of the web (which includes both clients AND
> servers).

DANW:
Great example. I want to be able to edit unpacked PWP contents in my local
DropBox folder, and have my users / consumers access this content over
HTTP. As a matter of fact, we already do this with exploded EPUBs and OPDS
feeds, using a reading system that just fetches typed payloads from simple
URL requests (DropBox is configured with CORS headers by default, so this
helps a bit). No content negotiation of any kind, by the way.

Is this out of scope?

Thanks! dan

>
>
>
> On 2/18/16, 8:04 AM, "Daniel Weck" <daniel.weck@gmail.com> wrote:
>
> >Hello,
> >
> >here's a concrete example (unrelated to PWP) which I think illustrates
> >the comments made during the concall, regarding content negotiation
> >vs. dereferencing URL endpoints to "meta" data about the publication
> >locators for unpacked / packed states.
> >
> >Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
> >repository, and the README.md file located at the root of the
> >gh-pages branch. There's a "canonical" URL for that (you can safely click on
> >the links below):
> >
> >curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme
> >==> Content-Type: application/json; charset=utf-8
> >
> >curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme
> >==> "url": "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
> >
> >As a consumer of that JSON-based API, I can query the actual payload
> >that I'm interested in:
> >curl "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
> >==> "content": "BASE64"
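A note on that last step: the GitHub contents API returns the file body Base64-encoded, alongside an `"encoding": "base64"` field, so even a dumb consumer has to decode it. A Python sketch that works on a hand-made sample payload (the README text in it is invented, not the real file); the function name is hypothetical.

```python
import base64
import json


def decode_github_content(body: str) -> str:
    """Decode the "content" field of a GitHub contents API JSON response."""
    payload = json.loads(body)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # GitHub wraps the Base64 text across lines; b64decode (validate=False,
    # the default) discards the newline characters before decoding.
    return base64.b64decode(payload["content"]).decode("utf-8")


# Hand-made sample shaped like the API response (invented README text).
sample = json.dumps({
    "name": "README.md",
    "encoding": "base64",
    "content": base64.encodebytes(b"# dpub-pwp-loc\n").decode("ascii"),
})
print(decode_github_content(sample))
```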
> >
> >
> >Now, back to PWP:
> >
> >State-agnostic "canonical" URL:
> >https://domain.com/path/to/book1
> >(note that this could also be a totally different syntax, e.g.
> >https://domain.com/info/?get=book1 or
> >https://domain.com/book1?get=info etc., as long as a request
> >returns a content-type that a PWP processor / reading-system can
> >consume, e.g. application/json or application/pwp-info+json ... or XML
> >/ whatever)
> >A simple request to this URL could return (minimal JSON example, just
> >for illustration purposes):
> >{
> >    "packed": "https://domain.com/path/to/book1.pwp",
> >    "unpacked": "https://domain.com/another/path/to/book1/manifest.json"  /// (or container.xml, or package.opf ... :)
> >}
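A reading system consuming that response needs very little logic. The Python sketch below assumes exactly the two illustrative keys shown (`"packed"` and `"unpacked"`), which are not agreed names; the function name is hypothetical. Note that the `///` annotation in the example would have to be dropped, since JSON has no comment syntax.

```python
import json
from typing import Optional

STATES = ("unpacked", "packed")


def pick_locator(info_body: str, prefer: str = "unpacked") -> Optional[str]:
    """Return the preferred locator from a state-agnostic info document.

    Assumes an object with optional "packed" and "unpacked" URL strings.
    Falls back to the other state when the preferred one is missing and
    returns None if neither is present.
    """
    if prefer not in STATES:
        raise ValueError(f"prefer must be one of {STATES}")
    info = json.loads(info_body)
    fallback = "packed" if prefer == "unpacked" else "unpacked"
    return info.get(prefer) or info.get(fallback)


info = json.dumps({
    "packed": "https://domain.com/path/to/book1.pwp",
    "unpacked": "https://domain.com/another/path/to/book1/manifest.json",
})
print(pick_locator(info))            # the unpacked manifest URL
print(pick_locator(info, "packed"))  # the .pwp archive URL
```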
> >
> >Once again, there is no naming convention / constraint on the "packed"
> >URL https://domain.com/path/to/book1.pwp which could be
> >https://domain.com/download/book1 or
> >https://download.domain.com/?get=book1 , as long as a request returns
> >a payload with content-type application/pwp+zip (for example). Note
> >that the book1.pwp archive in my example would contain the "main entry
> >point" manifest.json (which is why I drew a parallel above with EPUB
> >container.xml or package.opf)
> >
> >The "unpacked" URL path
> >https://domain.com/another/path/to/book1/manifest.json does not have
> >to represent the actual file structure on the server, but it's a
> >useful syntactical convention because other resource files in the PWP
> >would probably have similarly-rooted relative locator paths (against a
> >given base href), e.g.:
> >https://domain.com/another/path/to/book1/index.html
> >https://domain.com/another/path/to/book1/images/logo.png
> >In other words, if the packed book1.pwp contains index.html with <img
> >src="./images/logo.png" />, it does make sense for the online unpacked
> >state to use the same path references (as per the example URLs above).
> >Publishers may have the option to route URLs any way they like, e.g.
> ><img src="?get_image=logo.png" />, but we know there is the issue of
> >mapping document URLs in packed/unpacked states with some canonical
> >locator, so that annotation targets can be referenced and resolved
> >consistently. So it would greatly help if the file structure inside
> >the packed book1.pwp was replicated exactly in the URL patterns used
> >for deploying the unpacked state.
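A short Python sketch of the mapping this paragraph argues for, assuming the unpacked manifest URL from the example acts as the base href and that the archive's internal paths are reused verbatim online. The function names are hypothetical; the resolution itself is standard relative-URL resolution via `urllib.parse.urljoin`.

```python
from typing import Optional
from urllib.parse import urljoin

# Assumed base href: the unpacked manifest URL from the example.
BASE = "https://domain.com/another/path/to/book1/manifest.json"


def unpacked_url(path_in_archive: str, base: str = BASE) -> str:
    """Map a path inside book1.pwp (e.g. "images/logo.png") to its online URL."""
    return urljoin(base, path_in_archive)


def archive_path(url: str, base: str = BASE) -> Optional[str]:
    """Inverse mapping: the path inside the archive, or None if url is outside it."""
    root = urljoin(base, ".")  # the folder containing manifest.json, with trailing "/"
    if not url.startswith(root):
        return None
    return url[len(root):]


print(unpacked_url("./images/logo.png"))
# https://domain.com/another/path/to/book1/images/logo.png
print(archive_path("https://domain.com/another/path/to/book1/index.html"))
# index.html
```

Because the mapping is reversible, an annotation target recorded against either state can be translated to the other.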
> >
> >To conclude, I am probably missing something (Ivan and Leonard, you
> >guys are ahead of the curve compared to me), but I hope I managed to
> >convey useful arguments. Personally, as a developer involved in
> >reading-system implementations, and as someone who would like to
> >continue deploying content with minimal server-side requirements, I am
> >not yet convinced that content negotiation is needed here. As an
> >optional feature, sure, but not as the lowest common denominator.
> >
> >Thanks for listening :)
> >Regards, Dan
> >
> >
> >
> >On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
> >> With the caveat that the minutes are always difficult to read (Romain, that
> >> is not your fault, it is the case for most minutes; I know only a few
> >> people who write perfect minutes, and I am certainly not among them), here
> >> are some comments from my side. More about this next time we can all talk
> >> (although it seems that this will only be in two weeks, due to the Baltimore
> >> EDUPUB meeting).
> >>
> >> First of all, this comment:
> >>
> >> [[[
> >> rom: my issue is that the spec doesn't say "if Lu exists then L must be Lu",
> >> I think we should consider it
> >> ]]]
> >>
> >> I do not see why we should say anything like that. It is of course correct
> >> that, in many cases, it makes a lot of sense to have Lu=L. But I do not see
> >> why we should restrict it this way. In general, the approach I tried to
> >> follow in my writeup is to be as permissive as possible and put the minimum
> >> possible hard requirements on the locator setup. It is probably worth adding
> >> a note in the text (or the more final text) that Lu may be equal to L (in
> >> fact, this may very well be a widely used approach) but I would not want to
> >> go beyond that.
> >>
> >> Then there is the whole issue of content negotiation… It seems that we
> >> have a disagreement on the value and usage of content negotiation. I do not
> >> agree with Daniel's statement that "in a RESTful API the URL would
> >> consistently return the same content type". That is certainly not the
> >> practice, nor should it be. Content negotiation is widely used when tools
> >> want to retrieve, for example, the best syntax that encodes a particular
> >> piece of information (a typical example is in RDF land, where tools may or
> >> may not have parsers for a particular RDF serialization); this is how
> >> DBpedia is set up, etc. (I told you about the way RDF namespace documents
> >> are set up on our site, for example. It is pretty much general practice to
> >> do that.) I must admit I also do not agree with Daniel's remark that
> >> "content negotiation based on (sophisticated) HTTP headers sounds counter
> >> intuitive". Content negotiation is certainly very intuitive to me...
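To make the conneg side concrete, a much simplified Python sketch of server-side selection from the Accept header. It follows the q-value and media-range precedence rules of RFC 7231 (exact type beats `type/*`, which beats `*/*`) but ignores media-type parameters other than `q`. The candidate types `application/pwp-info+json` and `application/pwp+zip` are the hypothetical examples from Daniel's message, and all function names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media-range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(candidate: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value of the most specific range matching candidate (0 if none)."""
    main_type = candidate.split("/", 1)[0]
    best_specificity, q_value = -1, 0.0
    for media_range, q in ranges:
        if media_range == candidate:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q_value = specificity, q
    return q_value


def negotiate(accept: str, available: Sequence[str]) -> Optional[str]:
    """Pick the available type the client rates highest; server order breaks ties."""
    ranges = parse_accept(accept.strip() or "*/*")  # no Accept value: anything goes
    best, best_q = None, 0.0
    for candidate in available:
        q = quality(candidate.lower(), ranges)
        if q > best_q:
            best, best_q = candidate, q
    return best
```

With `available = ["application/pwp-info+json", "application/pwp+zip", "text/html"]`, a client sending `Accept: application/pwp+zip` gets the archive, while a browser-style `text/html` request gets HTML from the same URL.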
> >>
> >> All that being said, and that is where there may actually be a minor
> >> disagreement between Leonard and me: I do not say that content negotiation
> >> is the only way to set up server storage. The text I wrote is deliberately
> >> open-ended insofar as it describes, in general terms, what the client
> >> expects when that GET request is issued, and the choice among the various
> >> alternatives is entirely the server's. The list of possible server
> >> behaviours in the text consists of possible alternatives rather than hard
> >> requirements. The client is responsible for following the various possible
> >> paths and, maybe, we will have to describe those possibilities later in more
> >> detail (precise usage of the Link header, the <link> element, media types,
> >> etc.), but that gives the publisher the liberty to set up the server the way
> >> they want. If we accept this approach, i.e., that the client has some
> >> complexity to resolve in favour of a variety of possible server setups, then
> >> I do not think there is a major disagreement among us.
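On the client side, "following the various possible paths" could start as simply as the Python sketch below, which dispatches on the response media type and otherwise looks for a Link header entry pointing at the packed state. The media types, the returned action names, and the choice to key on the link's `type` parameter are all assumptions for illustration; the Link parser handles only simple headers (no commas or semicolons inside quoted values), and scanning an HTML `<link>` element is left out.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical media types, as used earlier in this thread.
INFO_TYPES = ("application/pwp-info+json", "application/json")
PACKED_TYPE = "application/pwp+zip"

LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')
PARAM_RE = re.compile(r';\s*([^=\s;]+)\s*=\s*"?([^";]*)"?')


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Parse a simple Link header into dicts with "href" plus parameters."""
    links = []
    for match in LINK_RE.finditer(value):
        link = {"href": match.group(1)}
        for name, param in PARAM_RE.findall(match.group(2)):
            link[name.lower()] = param.strip()
        links.append(link)
    return links


def next_step(content_type: str, link_header: str = "") -> Tuple[str, Optional[str]]:
    """Decide what a client does with the response to the canonical GET."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == PACKED_TYPE:
        return ("use-packed", None)
    if media_type in INFO_TYPES:
        return ("parse-info", None)
    # Otherwise look for an explicit pointer to the packed state.
    for link in parse_link_header(link_header):
        if link.get("type", "").lower() == PACKED_TYPE:
            return ("fetch", link["href"])
    return ("give-up", None)
```

The same dispatch works whether the server negotiated the response or a dumb server simply returned a static file with an explicit link.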
> >>
> >> Talk to you guys later…
> >>
> >> Ivan
> >>
> >> B.t.w., a more general and slightly philosophical comment: we should not be
> >> afraid of really using HTTP:-) The various header fields in both the
> >> request and the response of an HTTP exchange are very rich and
> >> sophisticated. Many things, such as expiration dates, security, etc., and
> >> of course content negotiation, can be expressed via these HTTP headers,
> >> and we should not shy away from using them whenever we can and it makes
> >> sense. As I showed in one of my mails, it is not that complex to set up
> >> (actually, and to be fair, setting up content negotiation is probably the
> >> more complex part, I accept that).
> >>
> >> If you are interested in the various possibilities, this site may be of
> >> interest:
> >>
> >> https://github.com/dret/sedola/blob/master/MD/headers.md
> >>
> >>
> >>
> >> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
> >>
> >>
> >> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> >>
> >> Actually, the big issue that I took away from the minutes is that Ivan and I
> >> are in agreement that content negotiation (via standard web techniques incl.
> >> the Accept header) is the proper way for the client & server to decide what
> >> to return on the GET from the canonical locator. Daniel, however, appears
> >> (from the minutes) to be promoting a completely different approach.
> >>
> >>
> >> As stated before [1], I am absolutely not convinced that content negotiation
> >> is a good approach.
> >> I want to upload a PWP tomorrow to a static file hosting service; if conneg
> >> is required I can't do that.
> >>
> >> More to the point: how do you GET the (manifest + Lu + Lp) info with the
> >> conneg solution? Maybe I am just missing something.
> >>
> >> Finally, may I turn the question the other way around: what are the benefits
> >> of content negotiation for the canonical locator, compared to an
> >> alternative approach with explicit links in the GET answer (headers or
> >> payload)?
> >>
> >> Thanks,
> >> Romain.
> >>
> >> [1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html
> >>
> >>
> >> Daniel, if you can explain why you want to do something different from the
> >> standard web/REST model, I’d like to understand.
> >>
> >> Leonard
> >>
> >> From: Romain <rdeltour@gmail.com>
> >> Date: Wednesday, February 17, 2016 at 6:26 PM
> >> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
> >> <lrosenth@adobe.com>
> >> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
> >> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
> >> Subject: Re: [dpub-loc] 20160217 minutes
> >>
> >> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
> >>
> >> Hi Leonard, that's quite a bold statement, but I suspect the minutes could
> >> do with a few corrections.
> >>
> >> My bad if the minutes are inaccurate, please feel free to amend. It was a
> >> bit frustrating too: several times I wanted to talk or clarify a point but
> >> was busy typing.
> >>
> >> At any rate, I look forward to the recap from you and Ivan at the next
> >> opportunity. PS: it was a small quorum on this concall, but I was under the
> >> impression that the participants agreed on the broad lines of your proposal,
> >> with only details to clarify.
> >>
> >> My impression is that participants generally agreed with the presentation of
> >> the issues and some principles. I believe that the main point that is still
> >> controversial is really what should be the answer to a GET on the canonical
> >> locator.
> >>
> >>> I think we need to go do this over again next week, which is extremely
> >>> unfortunate.
> >>
> >>
> >> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
> >> (EDUPUB summit).
> >>
> >> Romain.
> >>
> >> Regards, Daniel
> >>
> >> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
> >>>
> >>> Sorry that I was unable to attend today, especially since the discussion
> >>> (based on the minutes) seems to completely undo all the work that Ivan,
> >>> myself and others did on the mailing list during the past week. The
> >>> position presented by Daniel is the exact opposite of what Ivan’s musings
> >>> (adjusted based on mail conversations) presented.
> >>>
> >>> I think we need to go do this over again next week, which is extremely
> >>> unfortunate.
> >>>
> >>> Leonard
> >>>
> >>> From: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
> >>> Date: Wednesday, February 17, 2016 at 11:46 AM
> >>> To: "DPUB mailing list (public-digipub-ig@w3.org)"
> >>> <public-digipub-ig@w3.org>
> >>> Subject: [dpub-loc] 20160217 minutes
> >>> Resent-From: <public-digipub-ig@w3.org>
> >>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
> >>>
> >>> Minutes from today’s meeting:
> >>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html
> >>>
> >>> Tzviya Siegman
> >>> Digital Book Standards & Capabilities Lead
> >>> Wiley
> >>> 201-748-6884
> >>> tsiegman@wiley.com
> >>>
> >>
> >>
> >>
> >>
> >>
> >> ----
> >> Ivan Herman, W3C
> >> Digital Publishing Lead
> >> Home: http://www.w3.org/People/Ivan/
> >> mobile: +31-641044153
> >> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>
> >>
> >>
> >>
Received on Thursday, 18 February 2016 16:46:59 UTC