
Re: [dpub-loc] 20160217 minutes

From: Romain <rdeltour@gmail.com>
Date: Thu, 18 Feb 2016 16:40:19 +0100
Cc: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <1448CB2B-2739-4498-8A31-7B120EE8397B@gmail.com>
To: Ivan Herman <ivan@w3.org>

> On 18 Feb 2016, at 15:34, Ivan Herman <ivan@w3.org> wrote:
> 
> Daniel,
> 
> to be honest, I am not sure what you are arguing for or against…
> 
> - The fact that the unpacked and packed versions would/should reflect, conceptually, the same file hierarchy: I do not have any problem with that. Although we could imagine having some sort of 'mapping table' in the PWP manifest to convert among URLs from one state or the other, I do not think that is really all that useful. However, I do not think anything in the current writeups contradicts this; in fact, I believe this issue is pretty much orthogonal to the choice of Lu, L, Lp, and the relationships among them.

Right.

> 
> - I did not say that 'content negotiation is the lowest common denominator'. It is one of the possible approaches. I happen to think it is useful and good to have it, others have a different view; that is fine. The only thing in the text is: "The answer to HTTP Get http://book.org/published-books/1 must make M available to the PWP Processor".

I think we have a consensus on this statement, which is a good start :)

Also, I don't think that Lp and Lu are part of M (correct?), so do we agree on extending the statement to:

  "The answer to HTTP Get http://book.org/published-books/1 must make M, Lp, and Lu available to the PWP Processor".


> The way to honour that commitment may include several approaches which, if we were writing a standard, would be the only normative statements and are listed (for the time being, there may be more) in the four bullet items as alternatives:
> 
> 	• M itself (e.g., a JSON file, an RDFa+HTML file, etc., whatever is specified for the exact format and media type of M at some point); or
> 	• a package in some predefined PWP format that must include M; or
> 	• an HTML, SVG, or other resource, representing, e.g., the cover page of the publication, with M referred to in the Link header of the HTTP Response; or
> 	• an (X)HTML file containing the <link> element referring to M

OK.
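On the third alternative: the PWP processor would have to parse the HTTP Link response header (RFC 5988) to find M. A minimal sketch of that parsing in Python — the `rel` value "manifest" and the URL are purely hypothetical here, since we have not specified either:

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header (RFC 5988) into a list of
    (target URI, parameters) pairs."""
    links = []
    # Split on commas that separate link-values, not commas inside <...>
    for part in re.split(r',\s*(?=<)', value):
        match = re.match(r'<([^>]*)>\s*(.*)', part)
        if not match:
            continue
        uri, params_str = match.groups()
        params = dict(re.findall(r';\s*(\w+)="?([^";]*)"?', params_str))
        links.append((uri, params))
    return links

# Hypothetical Link header on the response for the cover page:
header = '<http://book.org/published-books/1/manifest.json>; rel="manifest"'
manifest_url = None
for uri, params in parse_link_header(header):
    if params.get('rel') == 'manifest':
        manifest_url = uri  # the processor would then GET this URL to obtain M
```

(This is only a sketch of the client side; if we keep this alternative, we will need to nail down the exact relation type.)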

> 
> Nothing here prescribes a specific server setup. Again, in standard specification parlance, all the various server setup possibilities are informative and not normative.

I'm not sure I agree. IMO the consensus statement above (whilst important) is not enough on its own; at some point we'll need to be more precise than that.
Then again, this depends on the scope/objectives of the TF...

Romain.

> 
> Ivan
> 
> P.S. I am also not fully sure what you want to show with the GitHub example, I must admit. But it seems to reflect a particular GitHub (server:-) setup. Let me give another example: you can run the following curl commands:
> 
> curl --head http://www.w3.org/ns/oa
> curl --head --header "Accept: application/ld+json" http://www.w3.org/ns/oa
> curl --head --header "Accept: text/turtle" http://www.w3.org/ns/oa
> 
> these will return the same conceptual content (a vocabulary) in HTML (with the vocabulary in RDFa), in JSON-LD, or in Turtle, using the same canonical URL for the vocabulary itself. This requires a different server setup.
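For what it's worth, the server-side half of these curl exchanges boils down to matching the client's Accept header against the representations the server has available. A simplified sketch of that matching in Python (it honours q-values but ignores wildcards and media-type specificity rules, so it is not a full implementation of the HTTP negotiation algorithm):

```python
def negotiate(accept_header, available):
    """Pick the best media type from `available` given an HTTP Accept
    header, honouring q-values (simplified: no wildcard handling)."""
    prefs = []
    for item in accept_header.split(','):
        parts = item.strip().split(';')
        media_type = parts[0].strip()
        q = 1.0
        for p in parts[1:]:
            name, _, val = p.strip().partition('=')
            if name == 'q':
                q = float(val)
        prefs.append((media_type, q))
    # Highest-q type the server can actually produce wins
    prefs.sort(key=lambda mq: mq[1], reverse=True)
    for media_type, q in prefs:
        if q > 0 and media_type in available:
            return media_type
    return None

# The second curl command above would be served the JSON-LD variant:
chosen = negotiate("application/ld+json",
                   ["text/html", "text/turtle", "application/ld+json"])
```

So the "different server setup" is essentially this dispatch step, plus serving the selected file.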
> 
> 
> 
> 
>> On 18 Feb 2016, at 14:04, Daniel Weck <daniel.weck@gmail.com> wrote:
>> 
>> Hello,
>> 
>> here's a concrete example (unrelated to PWP) which I think illustrates
>> the comments made during the concall, regarding content negotiation
>> vs. dereferencing URL endpoints to "meta" data about the publication
>> locators for unpacked / packed states.
>> 
>> Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
>> repository, and the README.md file located at the root of the
>> gh-pages branch. There's a "canonical" URL for that (you can safely
>> click on the links below):
>> 
>> curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme
>> ==> Content-Type: application/json; charset=utf-8
>> 
>> curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme
>> ==> "url": "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
>> 
>> As a consumer of that JSON-based API, I can query the actual payload
>> that I'm interested in:
>> curl https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages
>> ==> "content": "BASE64"
>> 
>> 
>> Now, back to PWP:
>> 
>> State-agnostic "canonical" URL:
>> https://domain.com/path/to/book1
>> (note that this could also be a totally different syntax, e.g.
>> https://domain.com/info/?get=book1 or
>> https://domain.com/book1?get=info etc., as long as a request
>> returns a content-type that a PWP processor / reading system can
>> consume, e.g. application/json or application/pwp-info+json ... or XML
>> / whatever)
>> A simple request to this URL could return (minimal JSON example, just
>> for illustration purposes):
>> {
>>   "packed": "https://domain.com/path/to/book1.pwp",
>>   "unpacked": "https://domain.com/another/path/to/book1/manifest.json"
>> }
>> (the "unpacked" entry could just as well point to a container.xml, or package.opf ... :)
>> 
>> Once again, there is no naming convention / constraint on the "packed"
>> URL https://domain.com/path/to/book1.pwp which could be
>> https://domain.com/download/book1 or
>> https://download.domain.com/?get=book1 , as long as a request returns
>> a payload with content-type application/pwp+zip (for example). Note
>> that the book1.pwp archive in my example would contain the "main entry
>> point" manifest.json (which is why I drew the parallel above with
>> EPUB's container.xml and package.opf).
>> 
>> The "unpacked" URL path
>> https://domain.com/another/path/to/book1/manifest.json does not have
>> to represent the actual file structure on the server, but it's a
>> useful syntactical convention because other resource files in the PWP
>> would probably have similarly-rooted relative locator paths (against a
>> given base href), e.g.:
>> https://domain.com/another/path/to/book1/index.html
>> https://domain.com/another/path/to/book1/images/logo.png
>> In other words, if the packed book1.pwp contains index.html with <img
>> src="./images/logo.png" />, it does make sense for the online unpacked
>> state to use the same path references (as per the example URLs above).
>> Publishers may have the option to route URLs any way they like, e.g.
>> <img src="?get_image=logo.png" />, but we know there is the issue of
>> mapping document URLs in packed/unpacked states to some canonical
>> locator, so that annotation targets can be referenced and resolved
>> consistently. So it would greatly help if the file structure inside
>> the packed book1.pwp were replicated exactly in the URL patterns used
>> for deploying the unpacked state.
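To make that point concrete: a relative reference inside index.html resolves to the expected unpacked URL precisely when the archive's hierarchy is mirrored in the URL space. A minimal sketch, reusing the example URLs above:

```python
from urllib.parse import urljoin

# Example base URL for the unpacked state (from the JSON example above)
UNPACKED_BASE = "https://domain.com/another/path/to/book1/"

def resolve_unpacked(doc_path, relative_ref):
    """Resolve a relative reference found in a document of the packed
    archive (e.g. index.html) against the unpacked base URL, assuming
    the archive's file hierarchy is mirrored in the URL space."""
    doc_url = urljoin(UNPACKED_BASE, doc_path)
    return urljoin(doc_url, relative_ref)

# <img src="./images/logo.png"> inside index.html resolves to:
resolve_unpacked("index.html", "./images/logo.png")
# → https://domain.com/another/path/to/book1/images/logo.png
```

If the publisher routes URLs through query strings instead, this simple resolution breaks, and an explicit mapping table would be needed.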
>> 
>> To conclude, I am probably missing something (Ivan and Leonard, you
>> guys are ahead of the curve compared to me), but I hope I managed to
>> convey useful arguments. Personally, as a developer involved in
>> reading-system implementations, and as someone who would like to
>> continue deploying content with minimal server-side requirements, I am
>> not yet convinced that content negotiation is needed here. As an
>> optional feature, sure, but not as the lowest common denominator.
>> 
>> Thanks for listening :)
>> Regards, Dan
>> 
>> 
>> 
>> On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
>>> With the caveat that the minutes are always difficult to read (Romain, that
>>> is not your fault, it is the case for most of the minutes; I know only a few
>>> people who write perfect minutes, and I am certainly not among them) maybe
>>> some comments on my side. More about this next time we can all talk
>>> (although it seems that this will only be in two weeks, due to the Baltimore
>>> EDUPUB meeting).
>>> 
>>> First of all, this comment:
>>> 
>>> [[[
>>> rom: my issue is that the spec doesn't say "if Lu exists then L must be Lu",
>>> I think we should consider it
>>> ]]]
>>> 
>>> I do not see why we should say anything like that. It is of course correct
>>> that, in many cases, it makes a lot of sense to have Lu=L. But I do not see
>>> why we should restrict it this way. In general, the approach I tried to
>>> follow in my writeup is to be as permissive as possible and to impose the
>>> minimum possible hard requirements on the locator setup. It is probably
>>> worth adding a note in the text (or a more final text) that Lu may be equal
>>> to L (in fact, this may very well be a widely used approach), but I would
>>> not want to go beyond that.
>>> 
>>> Then there is the whole issue of content negotiation… It seems that we
>>> have a disagreement on the value and usage of content negotiation. I do not
>>> agree with Daniel's statement that "in a RESTful API the URL would
>>> consistently return the same content type". It is certainly not the
>>> practice, nor should it be. Content negotiation is widely used when tools
>>> want to retrieve, for example, the best syntax that encodes a particular
>>> piece of information (a typical example is RDF land, where tools may or may
>>> not have parsers for a particular RDF serialization); this is how DBpedia
>>> is set up, etc. (I did tell you about the way RDF namespace documents are
>>> set up on our site, for example. It is pretty much general practice to do
>>> that.) I must admit I also do not agree with Daniel's remark that "content
>>> negotiation based on (sophisticated) HTTP headers sounds counter-intuitive".
>>> Content negotiation is certainly very intuitive to me...
>>> 
>>> All that being said, and this is where maybe there is actually a minor
>>> disagreement between Leonard and me: I do not say that content negotiation
>>> is the only approach to setting up server storage. The text I wrote is
>>> deliberately open-ended insofar as it describes, in general terms, what the
>>> client expectation is when that GET request is issued, and the choice among
>>> the various alternatives is entirely the server's. The list of possible
>>> server behaviours in the text gives possible alternatives, not hard
>>> requirements. The client is responsible for following the various possible
>>> paths and, maybe, we will have to describe those possibilities later in
>>> more detail (precise usage of the Link header, the <link> element, media
>>> types, etc.), but that gives the publisher the liberty to set up the server
>>> the way they want. If we accept this approach, i.e., that the client has
>>> some complexity to resolve in favour of a variety of possible server
>>> setups, then I do not think there is a major disagreement among us.
>>> 
>>> Talk to you guys later…
>>> 
>>> Ivan
>>> 
>>> B.t.w., a more general and slightly philosophical comment: we should not be
>>> afraid of really using HTTP:-) The various header fields in both the
>>> request and the response of an HTTP exchange are very rich and
>>> sophisticated. Many things — expiration dates, security, and of course
>>> content negotiation — can be expressed via these HTTP headers, and we
>>> should not shy away from using them whenever we can and it makes sense. As
>>> I showed in one of my mails, it is not that complex to set up (actually,
>>> and to be fair, setting up content negotiation is probably the most complex
>>> part, I accept that).
>>> 
>>> If you are interested in the various possibilities, this site may be of
>>> interest:
>>> 
>>> https://github.com/dret/sedola/blob/master/MD/headers.md
>>> 
>>> 
>>> 
>>> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
>>> 
>>> 
>>> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>> 
>>> Actually, the big issue that I took away from the minutes is that ivan and I
>>> are in agreement that content negotiation (via standard web technique incl.
>>> the Accept header) is the proper way for the client & server to decide what
>>> to return on the GET from the canonical locator.   Daniel, however, appears
>>> (from the minutes) to be promoting a completely different approach.
>>> 
>>> 
>>> As stated before [1], I am absolutely not convinced that content negotiation
>>> is a good approach.
>>> I want to upload a PWP tomorrow to a static file hosting service; if conneg
>>> is required I can't do that.
>>> 
>>> More to the point: how do you GET the (manifest + Lu + Lp) info with the
>>> conneg solution? Maybe I'm just missing something.
>>> 
>>> Finally, may I turn the question the other way around: what are the
>>> benefits of content negotiation for the canonical locator, compared to an
>>> alternative approach with explicit links in the GET answer (headers or
>>> payload)?
>>> 
>>> Thanks,
>>> Romain.
>>> 
>>> [1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html
>>> 
>>> 
>>> Daniel, if you can explain why you want to do something different from the
>>> standard web/REST model, I’d like to understand.
>>> 
>>> Leonard
>>> 
>>> From: Romain <rdeltour@gmail.com>
>>> Date: Wednesday, February 17, 2016 at 6:26 PM
>>> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
>>> <lrosenth@adobe.com>
>>> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
>>> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
>>> Subject: Re: [dpub-loc] 20160217 minutes
>>> 
>>> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
>>> 
>>> Hi Leonard, that's quite a bold statement, but I suspect the minutes could
>>> do with a few corrections.
>>> 
>>> My bad if the minutes are inaccurate, please feel free to amend. It was a
>>> bit frustrating too: several times I wanted to speak up or clarify a point
>>> but was busy typing.
>>> 
>>> At any rate, I look forward to the recap from you and Ivan at the next
>>> opportunity. PS: it was a small quorum on this concall, but I was under the
>>> impression that the participants agreed on the broad lines of your proposal,
>>> with only details to clarify.
>>> 
>>> My impression is that participants generally agreed with the presentation of
>>> the issues and some principles. I believe that the main point that is still
>>> controversial is really what should be the answer to a GET on the canonical
>>> locator.
>>> 
>>>> I think we need to go do this over again next week – which is extremely
>>>> unfortunate.
>>> 
>>> 
>>> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
>>> (EDUPUB summit).
>>> 
>>> Romain.
>>> 
>>> Regards, Daniel
>>> 
>>> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
>>>> 
>>>> Sorry that I was unable to attend today, especially since the discussion
>>>> (based on the minutes) seems to completely undo all the work that Ivan,
>>>> myself and others did on the mailing list during the past week.   The
>>>> position presented by Daniel is the exact opposite of what Ivan’s musings
>>>> (adjusted based on mail conversations) presented.
>>>> 
>>>> I think we need to go do this over again next week – which is extremely
>>>> unfortunate.
>>>> 
>>>> Leonard
>>>> 
>>>> From: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
>>>> Date: Wednesday, February 17, 2016 at 11:46 AM
>>>> To: "DPUB mailing list (public-digipub-ig@w3.org)"
>>>> <public-digipub-ig@w3.org>
>>>> Subject: [dpub-loc] 20160217 minutes
>>>> Resent-From: <public-digipub-ig@w3.org>
>>>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
>>>> 
>>>> Minutes from today’s meeting:
>>>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html
>>>> 
>>>> Tzviya Siegman
>>>> Digital Book Standards & Capabilities Lead
>>>> Wiley
>>>> 201-748-6884
>>>> tsiegman@wiley.com
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 
> 
> 
> 


Received on Thursday, 18 February 2016 15:40:52 UTC
