Re: [dpub-loc] 20160217 minutes

Hi all,

I think we are actually all in massive agreement; it is just a matter of
distinguishing a minimal conforming system from enhancements.
In http://w3c.github.io/dpub-pwp-loc/drafts/minimal-server.seq.violet.html,
I tried to draw a flow chart of what would happen if we had the most
minimally configured server (i.e., a plain file server).
In http://w3c.github.io/dpub-pwp-loc/drafts/conneg.seq.violet.html, I tried
to show what would happen if the server supported conneg: there would be one
fewer request to the server, so it would be more efficient, but neither
example excludes the other.
Other improvements are possible as well; an entire spectrum between complex
client and complex server could be explored.
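To make the difference between the two diagrams concrete, here is a rough
sketch of the client-side logic they imply. This is purely illustrative:
the media type, the manifest file name, and the URLs are hypothetical, not
anything we have agreed on.

```python
# Sketch of the client-side resolution the two diagrams imply.
# The media type and "manifest.json" name are hypothetical examples.

MANIFEST_TYPE = "application/pwp+json"  # hypothetical media type for M

def resolve_manifest(get, canonical_url):
    """Fetch M for a publication: prefer conneg (one round trip),
    fall back to a second request against a minimal file-server setup.

    `get(url, accept)` stands in for an HTTP GET and returns a
    (content_type, body) pair."""
    # Attempt 1: content negotiation -- succeeds in one request
    # if the server honours the Accept header.
    ctype, body = get(canonical_url, accept=MANIFEST_TYPE)
    if ctype == MANIFEST_TYPE:
        return body
    # Attempt 2: minimal server -- the first response was, e.g., an
    # HTML cover page, so the client issues one extra request for the
    # manifest at a conventional location.
    ctype, body = get(canonical_url + "/manifest.json", accept=MANIFEST_TYPE)
    return body
```

Either server setup satisfies the same client, which is the sense in which
the minimal case is a baseline and conneg an optimisation.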

Also, I added to the figures the definition that M is a combination of
Mmanifest and Mlinkset.
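For clarity, a minimal sketch of what "combination" could mean in practice:
the processor merges what it reads from the manifest file with what it
gleans from, e.g., a LINK header, and the union is M. The field names and
the precedence rule below are assumptions for illustration, not spec text.

```python
# Illustrative sketch only: "M is a combination of Mmanifest and Mlinkset".
# Field names and the manifest-wins precedence are assumptions.

def combine(m_manifest, m_linkset):
    """Merge the manifest-derived and linkset-derived parts of M.
    Where both sources carry the same key, the manifest wins
    (an arbitrary choice for this sketch)."""
    m = dict(m_linkset)
    m.update(m_manifest)
    return m
```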

It would be great if we could agree on *something like*
http://w3c.github.io/dpub-pwp-loc/drafts/minimal-server.seq.violet.html as
a baseline (and of course specify the details better), and allow for (and
describe) improvements where possible.

Does this look like a good way to move forward?

Greetings,
Ben

Ben De Meester
Researcher Semantic Web
Ghent University - iMinds - Data Science Lab | Faculty of Engineering and
Architecture | Department of Electronics and Information Systems
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL:
http://users.ugent.be/~bjdmeest/

2016-02-18 17:59 GMT+01:00 Ivan Herman <ivan@w3.org>:

>
> > On 18 Feb 2016, at 16:40, Romain <rdeltour@gmail.com> wrote:
> >
> >
> >> On 18 Feb 2016, at 15:34, Ivan Herman <ivan@w3.org> wrote:
> >>
> >> Daniel,
> >>
> >> to be honest, I am not sure what you are arguing for or against…
> >>
> >> - The fact that the unpacked and packed versions would/should reflect,
> conceptually, the same file hierarchy: I do not have any problem with that.
> Although we could imagine having some sort of a 'mapping table' in the PWP
> manifest to convert among URLs from one state or the other, I do not think
> that is really all that useful. However, I do not think anything in the
> current writeups contradicts this; in fact, I believe this issue is pretty
> much orthogonal to the choice of the Lu, L, Lp, and the relationships among
> them.
> >
> > Right.
> >
> >>
> >> - I did not say that 'content negotiation is the lowest common
> denominator'. It is one of the possible approaches. I happen to think it is
> useful and good to have it, others have a different view; that is fine. The
> only thing in the text is: "The answer to HTTP Get
> http://book.org/published-books/1 must make M available to the PWP
> Processor".
> >
> > I think we have a consensus on this statement, which is a good start :)
> >
> > Also, I don't think that Lp and Lu are part of M (correct?), so do we
> agree about extending the statement to :
> >
> >  "The answer to HTTP Get http://book.org/published-books/1 must make M,
> Lp, and Lu available to the PWP Processor".
>
> Essentially yes, although my formulation would be slightly different. This
> was a detail that Leonard and I discussed; the way I would prefer to
> formulate it is in [1], essentially saying that M is a conceptual entity that
> does include the L-s and the PWP processor combines the various sources of
> information to glean everything it contains (including the Lp and Lu
> values). Ie, in practice, the processor may receive part of the information
> from the manifest file in the packaged version, and some through the LINK
> header.
>
> I have not yet changed the text accordingly.
>
> [1]
> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Feb/0093.html
>
>
> >
> >
> >> The way to honour that commitment may include several approaches which,
> if we were writing a standard, would be the only normative statements and
> are listed (for the time being, there may be more) in the four bullet items
> as alternatives:
> >>
> >>>>      • M itself (e.g., a JSON file, an RDFa+HTML file, etc., whatever
> is specified for the exact format and media type of M at some point); or
> >>      • a package in some predefined PWP format that must include M; or
> >>      • an HTML, SVG, or other resource, representing, e.g., the cover
> page of the publication, with M referred to in the Link header of the HTTP
> Response; or
> >>      • an (X)HTML file containing the <link> element referring to M
> >
> > OK.
> >
> >>
> >> Nothing here prescribes a specific server setup. Again, in standard
> specification parlance, all the various server setup possibilities are
> informative and not normative.
> >
> > I'm not sure I agree. IMO the mere consensual statement above (whilst
> important) is not enough; at some point we'll need to be more precise than
> that.
> > Well, this depends on the scope/objectives of the TF…
>
> But I certainly believe that we should not (even if we are normative)
> require one and only one possible server setup. I would _not_ require the
> use of content negotiation as the only mechanism, but I would equally _not_
> require a mechanism that makes content negotiation impossible or unused.
> There should be several scenarios the server maintainers could choose from.
> Whether such a list should be standard, and whether such a list should be
> exhaustive; I do not know. My gut feeling is neither… Because we do not
> produce anything normative, that is actually for later anyway.
>
> Ivan
>
> >
> > Romain.
> >
> >>
> >> Ivan
> >>
> >> P.S. I am also not fully sure what you want to show with the github
> example, I must admit. But it seems to reflect a particular github
> (server:-) setup. Let me give another example: you can run the following
> curl-s:
> >>
> >> curl --head http://www.w3.org/ns/oa
> >> curl --head --header "Accept: application/ld+json"
> http://www.w3.org/ns/oa
> >> curl --head --header "Accept: text/turtle" http://www.w3.org/ns/oa
> >>
> >> these will return the same conceptual content (a vocabulary) in HTML
> (with the vocabulary in RDFa), in JSON-LD, or in turtle, using the same
> canonical URL for the vocabulary itself. This requires a different server
> setup.
> >>
> >>
> >>
> >>
> >>> On 18 Feb 2016, at 14:04, Daniel Weck <daniel.weck@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> here's a concrete example (unrelated to PWP) which I think illustrates
> >>> the comments made during the concall, regarding content negotiation
> >>> vs. dereferencing URL endpoints to "meta" data about the publication
> >>> locators for unpacked / packed states.
> >>>
> >>> Let's consider the GitHub HTTP API, the w3c/dpub-pwp-loc GitHub
> >>> repository, and the README.md file located at the root of the
> >>> gh-branch. There's a "canonical" URL for that (you can safely click on
> >>> the links below):
> >>>
> >>> curl --head https://api.github.com/repos/w3c/dpub-pwp-loc/readme
> >>> ==> Content-Type: application/json; charset=utf-8
> >>>
> >>> curl https://api.github.com/repos/w3c/dpub-pwp-loc/readme
> >>> ==> "url": "
> https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages
> "
> >>>
> >>> As a consumer of that JSON-based API, I can query the actual payload
> >>> that I'm interested in:
> >>> curl
> https://api.github.com/repos/w3c/dpub-pwp-loc/contents/README.md?ref=gh-pages
> >>> ==> "content": "BASE64"
> >>>
> >>>
> >>> Now, back to PWP:
> >>>
> >>> State-agnostic "canonical" URL:
> >>> https://domain.com/path/to/book1
> >>> (note that this could also be a totally different syntax, e.g.
> >>> https://domain.com/info/?get=book1 or
> >>> https://domain.com/book1?get=info etc. for as long as a request
> >>> returns a content-type that a PWP processor / reading-system can
> >>> consume, e.g. application/json or application/pwp-info+json ... or XML
> >>> / whatever)
> >>> A simple request to this URL could return (minimal JSON example, just
> >>> for illustration purposes):
> >>> {
> >>>  "packed": "https://domain.com/path/to/book1.pwp",
> >>>  "unpacked":
> >>> "https://domain.com/another/path/to/book1/manifest.json"  /// (or
> >>> container.xml, or package.opf ... :)
> >>> }
> >>>
> >>> Once again, there is no naming convention / constraint on the "packed"
> >>> URL https://domain.com/path/to/book1.pwp which could be
> >>> https://domain.com/download/book1 or
> >>> https://download.domain.com/?get=book1 , as long as a request returns
> >>> a payload with content-type application/pwp+zip (for example). Note
> >>> that the book1.pwp archive in my example would contain the "main entry
> >>> point" manifest.json (thus why I made a parallel above with EPUB
> >>> container.xml or package.opf)
> >>>
> >>> The "unpacked" URL path
> >>> https://domain.com/another/path/to/book1/manifest.json does not have
> >>> to represent the actual file structure on the server, but it's a
> >>> useful syntactical convention because other resource files in the PWP
> >>> would probably have similarly-rooted relative locator paths (against a
> >>> given base href), e.g.:
> >>> https://domain.com/another/path/to/book1/index.html
> >>> https://domain.com/another/path/to/book1/images/logo.png
> >>> In other words, if the packed book1.pwp contains index.html with <img
> >>> src="./images/logo.png" />, it does make sense for the online unpacked
> >>> state to use the same path references (as per the example URLs above).
> >>> Publishers may have the option to route URLs any way they like, e.g.
> >>> <img src="?get_image=logo.png" />, but we know there is the issue of
> >>> mapping document URLs in packed/unpacked states with some canonical
> >>> locator, so that annotation targets can be referenced and resolved
> >>> consistently. So it would greatly help if the file structure inside
> >>> the packed book1.pwp was replicated exactly in the URL patterns used
> >>> for deploying the unpacked state.
> >>>
> >>> To conclude, I am probably missing something (Ivan and Leonard, you
> >>> guys are ahead of the curve compared to me), but I hope I managed to
> >>> convey useful arguments. Personally, as a developer involved in
> >>> reading-system implementations, and as someone who would like to
> >>> continue deploying content with minimal server-side requirements, I am
> >>> not yet convinced that content negotiation is needed here. As an
> >>> optional feature, sure, but not as the lowest common denominator.
> >>>
> >>> Thanks for listening :)
> >>> Regards, Dan
> >>>
> >>>
> >>>
> >>> On Thu, Feb 18, 2016 at 12:04 PM, Ivan Herman <ivan@w3.org> wrote:
> >>>> With the caveat that the minutes are always difficult to read
> (Romain, that
> >>>> is not your fault, it is the case for most of the minutes; I know
> only a few
> >>>> people who write perfect minutes, and I am certainly not among them)
> maybe
> >>>> some comments on my side. More about this next time we can all talk
> >>>> (although it seems that this will only be in two weeks, due to the
> Baltimore
> >>>> EDUPUB meeting).
> >>>>
> >>>> First of all, this comment:
> >>>>
> >>>> [[[
> >>>> rom: my issue is that the spec doesn't say "if Lu exists then L must
> be Lu",
> >>>> I think we should consider it
> >>>> ]]]
> >>>>
> >>>> I do not see why we should say anything like that. It is of course
> correct
> >>>> that, in many cases, it makes a lot of sense to have Lu=L. But I do
> not see
> >>>> why we should restrict it this way. In general, the approach I tried
> to
> >>>> follow in my writeup is to be as permissive as possible and put the
> minimum
> >>>> possible hard requirements on the locator setup. It is probably worth
> adding
> >>>> a note in the text (or the more final text) that Lu may be equal to L
> (in
> >>>> fact, this may very well be a widely used approach) but I would not
> want to
> >>>> go beyond that.
> >>>>
> >>>> Then there is the whole issue about content negotiations… It seems
> that we
> >>>> have a disagreement on the value and usage of content negotiations. I
> do not
> >>>> agree with Daniel's statement that "in a RESTful API the URL would
> >>>> consistently return the same content type". It is certainly not the
> >>>> practice, nor should it be. Content negotiation is widely used when
> tools
> >>>> want to retrieve, for example the best syntax that encodes a
> particular
> >>>> information (typical example is in RDF land, where tools may or may
> not have
> >>>> parsers for a particular RDF serialization), this is how dbpedia is
> set up
> >>>> etc. (I did tell you about the way RDF namespace documents are set up
> on our
> >>>> site, for example. It is pretty much general practice to do that.) I
> must
> >>>> admit I also do not agree with Daniel's remark on "content
> negotiation based
> >>>> on (sophisticated) HTTP headers sounds counter intuitive". Content
> >>>> negotiation is certainly very intuitive to me...
> >>>>
> >>>> All that being said, and that is where maybe there is actually a minor
> >>>> disagreement between Leonard and me: I do not say that content
> negotiation is
> >>>> the only approach to set up a server storage. The text I wrote is
> >>>> deliberately open ended insofar as it described what the client
> expectation
> >>>> is when that GET request is issued in general terms, and the choice
> among
> >>>> the various alternatives is all the server's. The possible
> server
> >>>> behaviours listed in the text are alternatives, not hard
> >>>> requirements. The client is responsible for following the various
> possible
> >>>> paths and, maybe, we will have to describe those possibilities later
> in more
> >>>> details (precise usage of the LINK header, the <link> element, media
> types,
> >>>> etc), but that gives the liberty to set up the server the way the
> publisher
> >>>> wants. If we accept this approach, ie, that the client has some
> complexity
> >>>> to resolve in favour of a variety of possible server setups, then I
> do not
> >>>> think there is a major disagreement among us.
> >>>>
> >>>> Talk to you guys later…
> >>>>
> >>>> Ivan
> >>>>
> >>>> B.t.w., a more general and slightly philosophical comment: we should
> not be
> >>>> afraid of really using HTTP:-) The various header information in both
> the
> >>>> request and response headers of an HTTP request/response are very
> rich and
> >>>> sophisticated. There are many situations, on expiration dates, on
> security,
> >>>> etc, and of course content negotiations that can be expressed via
> these HTTP
> >>>> headers, and we should not shy away using those whenever we can and
> it makes
> >>>> sense. As I showed in one of my mails it is not that complex to set
> up
> >>>> (actually, and to be fair, setting up content negotiation is
> probably the
> >>>> more complex thing, I accept that).
> >>>>
> >>>> If you are interested in the various possibilities, this site may be
> of
> >>>> interest:
> >>>>
> >>>> https://github.com/dret/sedola/blob/master/MD/headers.md
> >>>>
> >>>>
> >>>>
> >>>> On 18 Feb 2016, at 09:24, Romain <rdeltour@gmail.com> wrote:
> >>>>
> >>>>
> >>>> On 18 Feb 2016, at 02:49, Leonard Rosenthol <lrosenth@adobe.com>
> wrote:
> >>>>
> >>>> Actually, the big issue that I took away from the minutes is that
> ivan and I
> >>>> are in agreement that content negotiation (via standard web technique
> incl.
> >>>> the Accept header) is the proper way for the client & server to
> decide what
> >>>> to return on the GET from the canonical locator.   Daniel, however,
> appears
> >>>> (from the minutes) to be promoting a completely different approach.
> >>>>
> >>>>
> >>>> As stated before [1], I am absolutely not convinced that content
> negotiation
> >>>> is a good approach.
> >>>> I want to upload a PWP tomorrow to a static file hosting service; if
> conneg
> >>>> is required I can't do that.
> >>>>
> >>>> More to the point: how do you GET the (manifest + Lu + Lp) info with
> the
> >>>> conneg solution? Maybe I just miss something.
> >>>>
> >>>> Finally, may I turn the question the other way around: what are the
> benefits
> >>>> of content negotiation for the canonical locator? (compared to an
> >>>> alternative approach with explicit links in the GET answer (headers or
> >>>> payload)).
> >>>>
> >>>> Thanks,
> >>>> Romain.
> >>>>
> >>>> [1]
> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0136.html
> >>>>
> >>>>
> >>>> Daniel, if you can explain why you want to do something different
> from the
> >>>> standard web/REST model, I’d like to understand.
> >>>>
> >>>> Leonard
> >>>>
> >>>> From: Romain <rdeltour@gmail.com>
> >>>> Date: Wednesday, February 17, 2016 at 6:26 PM
> >>>> To: Daniel Weck <daniel.weck@gmail.com>, Leonard Rosenthol
> >>>> <lrosenth@adobe.com>
> >>>> Cc: "DPUB mailing list (public-digipub-ig@w3.org)"
> >>>> <public-digipub-ig@w3.org>, Tzviya Siegman <tsiegman@wiley.com>
> >>>> Subject: Re: [dpub-loc] 20160217 minutes
> >>>>
> >>>> On 17 Feb 2016, at 23:12, Daniel Weck <daniel.weck@gmail.com> wrote:
> >>>>
> >>>> Hi Leonard, that's quite a bold statement, but I suspect the minutes
> could
> >>>> do with a few corrections.
> >>>>
> >>>> My bad if the minutes are inaccurate, please feel free to amend. It
> was a
> >>>> bit frustrating too: several times I wanted to talk or clarify a
> point but
> >>>> was busy typing.
> >>>>
> >>>> At any rate, I look forward to the recap from you and Ivan at the next
> >>>> opportunity. PS: it was a small quorum on this concall, but I was
> under the
> >>>> impression that the participants agreed on the broad lines of your
> proposal,
> >>>> with only details to clarify.
> >>>>
> >>>> My impression is that participants generally agreed with the
> presentation of
> >>>> the issues and some principles. I believe that the main point that is
> still
> >>>> controversial is really what should be the answer to a GET on the
> canonical
> >>>> locator.
> >>>>
> >>>>> I think we need to go do this over again next week – which is
> extremely
> >>>>> unfortunate.
> >>>>
> >>>>
> >>>> If I'm not mistaken Matt, Markus, Tzviya and I won't be able to attend
> >>>> (EDUPUB summit).
> >>>>
> >>>> Romain.
> >>>>
> >>>> Regards, Daniel
> >>>>
> >>>> On 17 Feb 2016 9:17 p.m., "Leonard Rosenthol" <lrosenth@adobe.com>
> wrote:
> >>>>>
> >>>>> Sorry that I was unable to attend today, especially since the
> discussion
> >>>>> (based on the minutes) seems to completely undo all the work that
> Ivan,
> >>>>> myself and others did on the mailing list during the past week.   The
> >>>>> position presented by Daniel is the exact opposite of what Ivan’s
> musings
> >>>>> (adjusted based on mail conversations) presented.
> >>>>>
> >>>>> I think we need to go do this over again next week – which is
> extremely
> >>>>> unfortunate.
> >>>>>
> >>>>> Leonard
> >>>>>
> >>>>> Fro  "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
> >>>>> Date: Wednesday, February 17, 2016 at 11:46 AM
> >>>>> To: "DPUB mailing list (public-digipub-ig@w3.org)"
> >>>>> <public-digipub-ig@w3.org>
> >>>>> Subject: [dpub-loc] 20160217 minutes
> >>>>> Resent-From: <public-digipub-ig@w3.org>
> >>>>> Resent-Date: Wednesday, February 17, 2016 at 11:48 AM
> >>>>>
> >>>>> Minutes from today’s meeting:
> >>>>> https://www.w3.org/2016/02/17-dpub-loc-minutes.html
> >>>>>
> >>>>> Tzviya Siegman
> >>>>> Digital Book Standards & Capabilities Lead
> >>>>> Wiley
> >>>>> 201-748-6884
> >>>>> tsiegman@wiley.com
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ----
> >>>> Ivan Herman, W3C
> >>>> Digital Publishing Lead
> >>>> Home: http://www.w3.org/People/Ivan/
> >>>> mobile: +31-641044153
> >>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >> ----
> >> Ivan Herman, W3C
> >> Digital Publishing Lead
> >> Home: http://www.w3.org/People/Ivan/
> >> mobile: +31-641044153
> >> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>
> >>
> >>
> >>
> >
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
>

Received on Thursday, 18 February 2016 17:09:42 UTC