Re: [dpub-loc] 20160217 minutes

Ivan – what I thought we had agreed on is that there are two types of POSSIBLE PWP Processors – Server and Client.  Any specific implementation of a PWP can consist of various combinations of the two.

It is just as acceptable to have a smart server/dumb client configuration as it is to have a dumb server/smart client.

As such, I support Ben’s approach of creating the separation of concepts.  And having some POSSIBLE implementation options for how the two might work together seems like a good idea.  But we need to keep in mind that we are not mandating/prescribing a specific implementation – just one set of possible ones.

Leonard

From: Ivan Herman <ivan@w3.org>
Date: Sunday, February 21, 2016 at 4:30 AM
To: Ben De Meester <ben.demeester@ugent.be>
Cc: Romain <rdeltour@gmail.com>, Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: [dpub-loc] 20160217 minutes

Hi Ben,

thanks for this but… I am not 100% sure this helps.

First of all, I would prefer not to refer to a Server PWP Processor. This suggests that there is a need for a very specific server to be used with PWP, which is something we should avoid. There *may* be, for convenience, ways to set up a server with standard configuration facilities in, say, Apache, but these do not constitute a 'processor' and they are by no means required.
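To make that concrete (purely as an illustration, not a requirement — the file names, the rel value, and the layout are all invented for this sketch), a vanilla Apache setup could already expose a manifest without any special 'processor':

```apache
# Hypothetical sketch only: advertise the manifest of a publication via
# an HTTP Link header on its canonical resource (requires mod_headers).
# "manifest.json" and rel="manifest" are placeholders, not agreed names.
<Files "index.html">
    Header add Link "<manifest.json>; rel=\"manifest\""
</Files>

# Optional content negotiation with stock Apache (MultiViews): a request
# for /book can resolve to book.html or book.json depending on the
# client's Accept header, with no server-side code at all.
Options +MultiViews
AddType application/json .json
```

The point being that standard configuration is enough; none of this amounts to a mandated server component.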

What we will have to specify in more detail is, actually, the client-side PWP processor. For that purpose, if we want to give some visual representation (which I believe is a good idea), a standard flow chart seems to be a much better approach. If I find some time, I will try to come up with one (but you can of course try to beat me to it :-)
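Just to sketch what such a chart would cover (the branches correspond to the alternatives listed elsewhere in this thread; nothing here is fixed yet):

```
GET L (the canonical locator)
 ├─ response is M itself                      → done
 ├─ response is a PWP package containing M    → extract M
 ├─ response carries a Link header to M       → GET M
 └─ response is (X)HTML with a <link> to M    → parse it, then GET M
finally: combine all sources to obtain the full M (incl. Lp and Lu)
```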

Cheers

Ivan



On 18 Feb 2016, at 18:08, Ben De Meester <ben.demeester@ugent.be> wrote:

Hi all,

I think we are actually all in massive agreement; it's just a matter of having a minimal conforming system vs. enhancements.
In http://w3c.github.io/dpub-pwp-loc/drafts/minimal-server.seq.violet.html, I tried to draw a flow chart of what would happen if we had the most minimally configured server (i.e., a file server).
In http://w3c.github.io/dpub-pwp-loc/drafts/conneg.seq.violet.html, I tried to show what would happen if the server allowed conneg: there would be one request fewer to the server, so it would be more efficient, but the first example does not exclude the other, or vice versa.
Other improvements are possible as well; an entire spectrum from complex client to complex server could be explored.
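To make the minimal case concrete (all URLs and file names below are invented for illustration), with a plain file server the client simply issues two GETs, with no negotiation at all:

```http
GET /books/1/ HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: text/html

GET /books/1/manifest.json HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: application/json
```

The conneg variant collapses this into a single request by sending an appropriate Accept header, which is exactly the one-request difference between the two figures.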

Also, I added to the figures the definition that M is a combination of Mmanifest and Mlinkset.

It would be great if we could agree on something like http://w3c.github.io/dpub-pwp-loc/drafts/minimal-server.seq.violet.html as a baseline (and of course specify the details better), and allow for (and describe) improvements where possible.

Does this look like a good way to move forward?

Greetings,
Ben

Ben De Meester
Researcher Semantic Web
Ghent University - iMinds - Data Science Lab | Faculty of Engineering and Architecture | Department of Electronics and Information Systems
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL: http://users.ugent.be/~bjdmeest/


2016-02-18 17:59 GMT+01:00 Ivan Herman <ivan@w3.org>:

> On 18 Feb 2016, at 16:40, Romain <rdeltour@gmail.com> wrote:
>
>
>> On 18 Feb 2016, at 15:34, Ivan Herman <ivan@w3.org> wrote:
>>
>> Daniel,
>>
>> to be honest, I am not sure what you are arguing for or against…
>>
>> - The fact that the unpacked and packed versions would/should reflect, conceptually, the same file hierarchy: I do not have any problem with that. Although we could imagine having some sort of a 'mapping table' in the PWP manifest to convert URLs from one state to the other, I do not think that is really all that useful. However, I do not think anything in the current writeups contradicts this; in fact, I believe this issue is pretty much orthogonal to the choice of Lu, L, Lp, and the relationships among them.
>
> Right.
>
>>
>> - I did not say that 'content negotiation is the lowest common denominator'. It is one of the possible approaches. I happen to think it is useful and good to have; others have a different view; that is fine. The only thing in the text is: "The answer to an HTTP GET of http://book.org/published-books/1 must make M available to the PWP Processor".
>
> I think we have a consensus on this statement, which is a good start :)
>
> Also, I don't think that Lp and Lu are part of M (correct?), so do we agree about extending the statement to:
>
>  "The answer to HTTP Get http://book.org/published-books/1 must make M, Lp, and Lu available to the PWP Processor".

Essentially yes, although my formulation would be slightly different. This was a detail that Leonard and I discussed; the way I would prefer to formulate it is in [1], essentially saying that M is a conceptual entity that does include the L-s, and the PWP processor combines the various sources of information to glean everything it contains (including the Lp and Lu values). I.e., in practice, the processor may receive part of the information from the manifest file in the packaged version, and some through the LINK header.
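Purely as an illustration (the manifest file name and the rel value are placeholders, nothing of this is agreed), the exchange could look like:

```http
GET /published-books/1 HTTP/1.1
Host: book.org

HTTP/1.1 200 OK
Content-Type: text/html
Link: <http://book.org/published-books/1/manifest.json>; rel="manifest"
```

The processor would then merge the content of the linked manifest with whatever it has already gleaned (e.g., from the package) to reconstruct the full M, Lp and Lu included.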

I have not yet changed the text accordingly.

[1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Feb/0093.html



>
>
>> The way to honour that commitment may follow several approaches which, if we were writing a standard, would be the only normative statements; they are listed (for the time being, there may be more) in the four bullet items as alternatives:
>>
>>      • M itself (e.g., a JSON file, an RDFa+HTML file, etc., whatever is specified for the exact format and media type of M at some point); or
>>      • a package in some predefined PWP format that must include M; or
>>      • an HTML, SVG, or other resource, representing, e.g., the cover page of the publication, with M referred to in the Link header of the HTTP Response; or
>>      • an (X)HTML file containing the <link> element referring to M
>
> OK.
>
>>
>> Nothing here prescribes a specific server setup. Again, in standard specification parlance, all the various server setup possibilities are informative and not normative.
>
> I'm not sure I agree. IMO the mere consensual statement above (whilst important) is not enough; at some point we'll need to be more precise than that.
> Well, this depends on the scope/objectives of the TF…

But I certainly believe that we should not (even if we were normative) require one and only one possible server setup. I would _not_ require the use of content negotiation as the only mechanism, but I would equally _not_ require a mechanism that makes content negotiation impossible or unused. There should be several scenarios the server maintainers could choose from. Whether such a list should be standardized, and whether it should be exhaustive, I do not know; my gut feeling is neither… Because we do not produce anything normative, that is actually for later anyway.

Ivan

>
> Romain.
>
>>
>> Ivan
>>
>> P.S. I am also not fully sure what you want to show with the github example, I must admit. But it seems to reflect a particular github (server:-) setup. Let me give another example: you can run the following curl-s:
>>
>> curl --head http://www.w3.org/ns/oa

>> curl --head --header "Accept: application/ld+json" http://www.w3.org/ns/oa

>> curl --head --header "Accept: text/turtle" http://www.w3.org/ns/oa

>>
>> these will return the same conceptual content (a vocabulary) in HTML (with the vocabulary in RDFa), in JSON-LD, or in Turtle, using the same canonical URL for the vocabulary itself. This requires a different server setup.
>>
>>
>>
>>
>>> On 18 Feb 2016, at 14:04, Daniel Weck <daniel.weck@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> here's a concrete example (unrelated to PWP) which I think illustrates
>>> the comments made during the concall, regarding content negotiation
>>> vs. dereferencing URL endpoints to "meta" data about the publication
>>> locators for unpacked / packed states.
>>>
>>> Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
>>> repository, and the README.md file located at the root of the
>>> gh-pages branch. There's a "canonical" URL for that (you can safely click on
>>> the links below):
>>>
>>> curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme

>>> ==> Content-Type: application/json; charset=utf-8
>>>
>>> curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme

>>> ==> "url": "https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages"
>>>
>>> As a consumer of that JSON-based API, I can query the actual payload
>>> that I'm interested in:
>>> curl https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages

>>> ==> "content": "BASE64"
>>>
>>>
>>> Now, back to PWP:
>>>
>>> State-agnostic "canonical" URL:
>>> https://domain.com/path/to/book1

>>> (note that this could also be a totally different syntax, e.g.
>>> https://domain.com/info/?get=book1 or
>>> https://domain.com/book1?get=info etc., as long as a request
>>> returns a content-type that a PWP processor / reading-system can
>>> consume, e.g. application/json or application/pwp-info+json ... or XML
>>> / whatever)
>>> A simple request to this URL could return (minimal JSON example, just
>>> for illustration purposes):
>>> {
>>>  "packed": "https://domain.com/path/to/book1.pwp",
>>>  "unpacked":
>>> "https://domain.com/another/path/to/book1/manifest.json"  /// (or
>>> container.xml, or package.opf ... :)
>>> }
>>>
>>> Once again, there is no naming convention / constraint on the "packed"
>>> URL https://domain.com/path/to/book1.pwp which could be
>>> https://domain.com/download/book1 or
>>> https://download.domain.com/?get=book1 , as long as a request returns
>>> a payload with content-type application/pwp+zip (for example). Note
>>> that the book1.pwp archive in my example would contain the "main entry
>>> point" manifest.json (thus why I made a parallel above with EPUB
>>> container.xml or package.opf)
>>>
>>> The "unpacked" URL path
>>> https://domain.com/another/path/to/book1/manifest.json does not have
>>> to represent the actual file structure on the server, but it's a
>>> useful syntactical convention because other resource files in the PWP
>>> would probably have similarly-rooted relative locator paths (against a
>>> given base href), e.g.:
>>> https://domain.com/another/path/to/book1/index.html

>>> https://domain.com/another/path/to/book1/images/logo.png

>>> In other words, if the packed book1.pwp contains index.html with <img
>>> src="./images/logo.png" />, it does make sense for the online unpacked
>>> state to use the same path references (as per the example URLs above).
>>> Publishers may have the option to route URLs any way they like, e.g.
>>> <img src="?get_image=logo.png" />, but we know there is the issue of
>>> mapping document URLs in packed/unpacked states with some canonical
>>> locator, so that annotation targets can be referenced and resolved
>>> consistently. So it would greatly help if the file structure inside
>>> the packed book1.pwp was replicated exactly in the URL patterns used
>>> for deploying the unpacked state.
>>>
>>> To conclude, I am probably missing something (Ivan and Leonard, you
>>> guys are ahead of the curve compared to me), but I hope I managed to
>>> convey useful arguments. Personally, as a developer involved in
>>> reading-system implementations, and as someone who would like to
>>> continue deploying content with minimal server-side requirements, I am
>>> not yet convinced that content negotiation is needed here. As an
>>> optional feature, sure, but not as the lowest common denominator.
>>>
>>> Thanks for listening :)
>>> Regards, Dan
>>>
>>>
>>>
>>> On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
>>>> With the caveat that the minutes are always difficult to read (Romain, that
>>>> is not your fault, it is the case for most of the minutes; I know only a few
>>>> people who write perfect minutes, and I am certainly not among them) maybe
>>>> some comments on my side. More about this next time we can all talk
>>>> (although it seems that this will only be in two weeks, due to the Baltimore
>>>> EDUPUB meeting).
>>>>
>>>> First of all, this comment:
>>>>
>>>> [[[
>>>> rom: my issue is that the spec doesn't say "if Lu exists then L must be Lu",
>>>> I think we should consider it
>>>> ]]]
>>>>
>>>> I do not see why we should say anything like that. It is of course correct
>>>> that, in many cases, it makes a lot of sense to have Lu=L. But I do not see
>>>> why we should restrict it this way. In general, the approach I tried to
>>>> follow in my writeup is to be as permissive as possible and put the minimum
>>>> possible hard requirements on the locator setup. It is probably worth adding
>>>> a note in the text (or the more final text) that Lu may be equal to L (in
>>>> fact, this may very well be a widely used approach) but I would not want to
>>>> go beyond that.
>>>>
>>>> Then there is the whole issue about content negotiation… It seems that we
>>>> have a disagreement on the value and usage of content negotiation. I do not
>>>> agree with Daniel's statement that "in a RESTful API the URL would
>>>> consistently return the same content type". It is certainly not the
>>>> practice, nor should it be. Content negotiation is widely used when tools
>>>> want to retrieve, for example, the best syntax that encodes a particular
>>>> piece of information (a typical example is in RDF land, where tools may or
>>>> may not have parsers for a particular RDF serialization); this is how dbpedia
>>>> is set up, etc. (I did tell you about the way RDF namespace documents are set
>>>> up on our site, for example. It is pretty much general practice to do that.)
>>>> I must admit I also do not agree with Daniel's remark that "content
>>>> negotiation based on (sophisticated) HTTP headers sounds counter intuitive".
>>>> Content negotiation is certainly very intuitive to me...
>>>>
>>>> All that being said, and that is where maybe there is actually a minor
>>>> disagreement between Leonard and me: I do not say that content negotiation
>>>> is the only approach to setting up server storage. The text I wrote is
>>>> deliberately open ended insofar as it describes, in general terms, what the
>>>> client expectation is when that GET request is issued, and the choice among
>>>> the various alternatives is entirely the server's. The possible server
>>>> behaviours listed in the text are alternatives, not hard
>>>> requirements. The client is responsible for following the various possible
>>>> paths and, maybe, we will have to describe those possibilities later in more
>>>> detail (precise usage of the LINK header, the <link> element, media types,
>>>> etc.), but that gives the publisher the liberty to set up the server the way
>>>> it wants. If we accept this approach, i.e., that the client has some
>>>> complexity to resolve in favour of a variety of possible server setups, then
>>>> I do not think there is a major disagreement among us.
>>>>
>>>> Talk to you guys later…
>>>>
>>>> Ivan
>>>>
>>>> B.t.w., a more general and slightly philosophical comment: we should not be
>>>> afraid of really using HTTP :-) The various header fields in both the
>>>> request and response of an HTTP exchange are very rich and
>>>> sophisticated. There are many things, on expiration dates, on security,
>>>> etc., and of course content negotiation, that can be expressed via these
>>>> HTTP headers, and we should not shy away from using them whenever we can and
>>>> it makes sense. As I showed in one of my mails, it is not that complex to
>>>> set up (actually, and to be fair, setting up content negotiation is probably
>>>> the most complex part, I accept that).
>>>>
>>>> If you are interested by the various possibilities, this site may be of
>>>> interest:
>>>>
>>>> https://github.com/dret/sedola/blob/master/MD/headers.md

>>>>
>>>>
>>>>
>>>> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
>>>>
>>>>
>>>> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>
>>>> Actually, the big issue that I took away from the minutes is that ivan and I
>>>> are in agreement that content negotiation (via standard web technique incl.
>>>> the Accept header) is the proper way for the client & server to decide what
>>>> to return on the GET from the canonical locator. Daniel, however, appears
>>>> (from the minutes) to be promoting a completely different approach.
>>>>
>>>>
>>>> As stated before [1], I am absolutely not convinced that content negotiation
>>>> is a good approach.
>>>> I want to upload a PWP tomorrow to a static file hosting service; if conneg
>>>> is required I can't do that.
>>>>
>>>> More to the point: how do you GET the (manifest + Lu + Lp) info with the
>>>> conneg solution? Maybe I just miss something.
>>>>
>>>> Finally, may I turn the question the other way around: what are the benefits
>>>> of content negotiation for the canonical locator, compared to an
>>>> alternative approach with explicit links in the GET answer (headers or
>>>> payload)?
>>>>
>>>> Thanks,
>>>> Romain.
>>>>
>>>> [1] https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html

>>>>
>>>>
>>>> Daniel, if you can explain why you want to do something different from the
>>>> standard web/REST model, I’d like to understand.
>>>>
>>>> Leonard
>>>>
>>>> From: Romain <rdeltour@gmail.com>
>>>> Date: Wednesday, February 17, 2016 at 6:26 PM
>>>> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
>>>> <lrosenth@adobe.com>
>>>> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
>>>> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
>>>> Subject: Re: [dpub-loc] 20160217 minutes
>>>>
>>>> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
>>>>
>>>> Hi Leonard, that's quite a bold statement, but I suspect the minutes could
>>>> do with a few corrections.
>>>>
>>>> My bad if the minutes are inaccurate; please feel free to amend. It was a
>>>> bit frustrating too: several times I wanted to speak or clarify a point but
>>>> was busy typing.
>>>>
>>>> At any rate, I look forward to the recap from you and Ivan at the next
>>>> opportunity. PS: it was a small quorum on this concall, but I was under the
>>>> impression that the participants agreed on the broad lines of your proposal,
>>>> with only details to clarify.
>>>>
>>>> My impression is that participants generally agreed with the presentation of
>>>> the issues and some principles. I believe that the main point that is still
>>>> controversial is really what should be the answer to a GET on the canonical
>>>> locator.
>>>>
>>>>> I think we need to do this over again next week – which is extremely
>>>>> unfortunate.
>>>>
>>>>
>>>> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
>>>> (EDUPUB summit).
>>>>
>>>> Romain.
>>>>
>>>> Regards, Daniel
>>>>
>>>> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com> wrote:
>>>>>
>>>>> Sorry that I was unable to attend today, especially since the discussion
>>>>> (based on the minutes) seems to completely undo all the work that Ivan,
>>>>> myself and others did on the mailing list during the past week.   The
>>>>> position presented by Daniel is the exact opposite of what Ivan’s musings
>>>>> (adjusted based on mail conversations) presented.
>>>>>
>>>>> I think we need to do this over again next week – which is extremely
>>>>> unfortunate.
>>>>>
>>>>> Leonard
>>>>>
>>>>> Fro  "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com<mailto:tsiegman@wiley.com>>
>>>>> Date: Wednesday, February 17, 2016 at 11:46 AM
>>>>> To: "DPUB mailing list (public-digipub-ig@w3.org<mailto:public-digipub-ig@w3.org>)"
>>>>> <public-digipub-ig@w3.org<mailto:public-digipub-ig@w3.org>>
>>>>> Subject: [dpub-loc] 20160217 minutes
>>>>> Resent-From: <public-digipub-ig@w3.org<mailto:public-digipub-ig@w3.org>>
>>>>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
>>>>>
>>>>> Minutes from today’s meeting:
>>>>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html

>>>>>
>>>>> Tzviya Siegman
>>>>> Digital Book Standards & Capabilities Lead
>>>>> Wiley
>>>>> 201-748-6884
>>>>> tsiegman@wiley.com
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Lead
>>>> Home: http://www.w3.org/People/Ivan/

>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704

>>>>
>>>>
>>>>
>>>>
>>
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/

>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>
>>
>>
>>
>


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

Received on Sunday, 21 February 2016 15:01:07 UTC