- From: Ben De Meester <ben.demeester@ugent.be>
- Date: Tue, 16 Feb 2016 10:09:03 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
- Message-ID: <CAJ-O9TsL5hxQceVJt2Edt0D5TAK7bbCQFaPRiMJP5YSLz_bYPg@mail.gmail.com>
2016-02-16 9:38 GMT+01:00 Ivan Herman <ivan@w3.org>:

> On 16 Feb 2016, at 09:33, Ben De Meester <ben.demeester@ugent.be> wrote:
>
> Hi Ivan, all,
>
> So, if I understand correctly, *M* consists of two parts: the manifest *Ma* (the list of files that *P* comprises, comparable to what we have in, e.g., EPUB), and the link set *Mlinks* (i.e., the set *L*, *Lu*, and *Lp*).
> *Ma* is part of all states of *P*, and *Mlinks* is (probably) stored somewhere outside of *P* (the options for generating and/or storing *Mlinks* are manifold: as a JSON file, from a database, from a web service, automatically derived from the .htaccess file, ... I don't think there is a need now to specify that, just as we at the moment don't have to specify *how* *M* is returned).
> When someone GETs *L*, *Lu*, or *Lp*, *S* returns (the dynamically generated) *M*, in some way or another (see e.g., Ivan's suggestions), so the PWP processor knows both *Ma* and *Mlinks*.
> From there, the PWP processor knows what to do.
>
> Yes, I think this is a good summary.
>
> Concerning the 'server-modifications' discussion: as far as I see, we have discussed two options for GETting a resource from a packed PWP (and this, in fact, is orthogonal to the 'how to return *M*' discussion):
>
> - either the server is modified to know about the internals of the package format, and returns the resource to the client (complex server, simple client)
> - or the server returns the entire package, and the client needs to know the internals of the package to retrieve the resource from the packed PWP (simple server, complex client).
>
> Both have pros and cons, and I have the feeling this is the same problem as asking for any kind of data from a knowledge base on the web: either you download the entire data dump and retrieve the data on the client side, or you set up a query service and the client asks the question directly to the server. The end result is the same, the functionality is the same; it's just a matter of where to put the complexity. Maybe other intermediate options are also possible.
> So maybe this last question doesn't have to be answered: complex servers can help the client retrieve resources more efficiently, complex clients can handle simple servers, and we'll all live in a hybrid world.
>
> At this point, I definitely agree that the PWP spec does not have to (formally) specify all the various alternatives, certainly not trying to be exhaustive. But, I believe, due diligence requires that we do list *some* viable approaches that prove that whatever we are talking about is not just hot air:-)

Also fully agree. As I assume the main issue here is, e.g., where to unzip the packed PWP (client-side or server-side), I think there are viable approaches aplenty; a quick search returned http://stuk.github.io/jszip/ (client-side) and http://search.cpan.org/~phred/Archive-Zip-1.56/lib/Archive/Zip/MemberRead.pm (server-side, although Apache modules are not my cup of tea, so I might be wrong here).

> I.
>
> Any thoughts?
>
> Greetings,
> Ben
>
> Ben De Meester
> Researcher Semantic Web
> Ghent University - iMinds - Data Science Lab | Faculty of Engineering and Architecture | Department of Electronics and Information Systems
> Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
> t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL: http://users.ugent.be/~bjdmeest/
>
> 2016-02-15 11:32 GMT+01:00 Ivan Herman <ivan@w3.org>:
>
>> Leonard,
>>
>> On 12 Feb 2016, at 18:21, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>
>> I don’t see any bootstrapping required. Sure, having server-based modifications (of various flavors) would make for a more optimized implementation, but IMO it’s optional and not required. (This also matches what you were saying on the phone the other day. Have you changed your mind??)
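Ben's two options, together with Leonard's point that server support is an optional optimization, can be combined in a client that tries the "complex server" route first and falls back to downloading the whole package. The Python sketch below assumes the packed PWP is a ZIP archive and that a package-aware server exposes each member at `<package-url>/<member>`; that URL convention, the `fetch` callable and the two stand-in servers are hypothetical, not something the thread or any PWP draft defines.

```python
import io
import zipfile
from typing import Callable, Tuple

# Hypothetical transport: any callable mapping a URL to (HTTP status, body).
Fetch = Callable[[str], Tuple[int, bytes]]


def get_resource(package_url: str, member: str, fetch: Fetch) -> bytes:
    """Return one resource of a packed PWP, wherever the complexity lives.

    Assumptions: the package is a ZIP archive, and a server that understands
    the package exposes its members at "<package_url>/<member>".
    """
    # Complex server, simple client: ask the server for the member itself.
    status, body = fetch(f"{package_url}/{member}")
    if status == 200:
        return body
    # Simple server, complex client: download the whole package and
    # extract the member locally (what JSZip would do in a browser).
    status, body = fetch(package_url)
    if status != 200:
        raise LookupError(f"cannot fetch {package_url}: HTTP {status}")
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        return archive.read(member)  # raises KeyError if member is absent


# Usage with two in-memory stand-ins for S (illustration only).
def build_package() -> bytes:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("chapter1.html", "<p>Chapter 1</p>")
    return buffer.getvalue()


PACKAGE_URL = "http://www.ex.org/book.zip"
PACKAGE = build_package()


def simple_server(url: str) -> Tuple[int, bytes]:
    # Only knows how to serve the package as a whole.
    return (200, PACKAGE) if url == PACKAGE_URL else (404, b"")


def complex_server(url: str) -> Tuple[int, bytes]:
    # Understands the package format and serves members directly.
    prefix = PACKAGE_URL + "/"
    if url.startswith(prefix):
        name = url[len(prefix):]
        with zipfile.ZipFile(io.BytesIO(PACKAGE)) as archive:
            if name in archive.namelist():
                return 200, archive.read(name)
        return 404, b""
    return simple_server(url)
```

Both `get_resource(PACKAGE_URL, "chapter1.html", simple_server)` and the same call with `complex_server` return the same bytes; only the place where the unzipping happens changes, which is the "hybrid world" described above.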
>>
>> We may mutually misunderstand one another, so maybe it is better (and clearer to the others) if I write down this (only) issue we have with my original writeup, to see where we really are.
>>
>> My original writeup[1] said:
>>
>> > 1. The PWP Processor has access to the information in M.
>> > 2. As a consequence, M contains the list of states (and their Locators) that are available on S. In other words, the PWP Processor “knows” Lp and Lu, together with their media types.
>>
>> And your comment was:
>>
>> > I can’t agree to the first half of assumption #2. It would imply that M is created AFTER P is already placed on the server (or is authored by the same system that is responsible for hosting P on S). And if M is modified after P is created, then P isn’t actually P, but instead is P’ - which might be fine for the purposes of publishing, but we need to be clear about that.
>>
>> And you proposed simply removing that sentence, leaving only "the PWP Processor 'knows' *Lp* and *Lu*..."
>>
>> First of all, your comment is correct: there is a problem. But I also believe that we should have clear ways to describe *how* the PWP Processor knows about *Lp* and *Lu*, in case it is not in *M*, rather than leaving that question open (which would be the case if that sentence was removed). Without a clear idea on this, I do not believe our model is credible.
>>
>> My general response is therefore to say: "*M*", i.e., the metadata for a specific *P*, is *conceptual* in the sense that it is perfectly all right if the PWP Processor "gathers" the content from different sources. What counts is that, at the end of the day, the PWP Processor gets hold of *all* the data in "*M*", which then indeed includes *Lu* and *Lp*. I.e., we can keep that statement if this fact is made clear in the text *and* there is a way to ensure that this can be set up (without prescribing a singular way of setting it up).
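One way to read Ivan's "conceptual *M*" is as a fold over partial metadata records gathered from different sources. The Python sketch below assumes a hypothetical record type whose `manifest` stands for *Ma* and whose `locators` map the roles "L", "Lu" and "Lp" to URLs; the first-source-wins conflict policy is an arbitrary choice for illustration, since the thread does not say how conflicting sources should be resolved. All URLs are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Metadata:
    """A (partial) view of the conceptual M of a publication P.

    Hypothetical structure: `manifest` plays the role of Ma, and
    `locators` maps the role names "L", "Lu", "Lp" to URLs (Mlinks).
    """
    manifest: List[str] = field(default_factory=list)
    locators: Dict[str, str] = field(default_factory=dict)


def combine(*sources: Metadata) -> Metadata:
    """Fold partial metadata from several sources into one coherent M.

    Policy (an assumption): sources are consulted in order and the first
    source that provides a given locator wins; manifest entries are unioned
    in order of first appearance.
    """
    merged = Metadata()
    for source in sources:
        for entry in source.manifest:
            if entry not in merged.manifest:
                merged.manifest.append(entry)
        for role, url in source.locators.items():
            merged.locators.setdefault(role, url)
    return merged


def missing_locators(m: Metadata,
                     required: Sequence[str] = ("L", "Lu", "Lp")) -> List[str]:
    """List the locator roles that no source has supplied yet."""
    return [role for role in required if role not in m.locators]


# Usage: metadata embedded in the package, plus locators learned elsewhere
# (e.g. from HTTP Link headers set by the server administrator).
embedded = Metadata(
    manifest=["index.html", "chapter1.html"],
    locators={"L": "http://www.ex.org/book"},
)
from_headers = Metadata(
    locators={"Lu": "http://www.ex.org/book/", "Lp": "http://www.ex.org/book.zip"},
)
m = combine(embedded, from_headers)
```

Here `missing_locators(embedded)` is `["Lu", "Lp"]` while `missing_locators(m)` is empty, which makes the credibility concern testable: the processor can tell whether the sources gathered so far actually cover *L*, *Lu* and *Lp*.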
>>
>> We have, in the document, several scenarios listed to get to the metadata (listed at the end of the writeup). What we have to make clear is that the various approaches are *not* mutually exclusive but, if the metadata comes from different sources, the PWP Processor has to combine them. I.e., it is perfectly o.k. if, for example, the result of the GET on the packed data returns the package with the embedded metadata *and* also uses the HTTP Link header for additional metadata (or the HTTP Alternates header? I am not sure on that one), thereby providing the missing *Lu* and *L*, for example. The processor combines this information into a coherent *M*, and it indeed gets the information on the list of states as stated in that sentence.
>>
>> What is the problem with this? Isn't this acceptable?
>>
>> Maybe the source of our misunderstanding is actually elsewhere: I am not sure what you meant by 'server-based modifications'. What I said on the call is that I would be against imposing a server modification that would require a modification of the code of the server itself, i.e., one that would require recompiling Apache, for example, or even the development and installation of a new "mod" module (to continue using the Apache example) by the community. However, I believe that a mechanism that may, in some cases, require the modification or, rather, the addition of a new response header to a server response (like in that example) should be acceptable; I would expect all servers to provide such facilities, even if using them requires some admin rights on the server. I am not saying that should be the *only* way of achieving something, but it should not be a roadblock either.
>>
>> Ivan
>>
>> P.S.
>> For those of you for whom HTTP header setting is a mystery: if you run Apache and you have the right to include a .htaccess file in a directory, adding a Link header on the file "test.html" in that directory means adding something like:
>>
>>     <Files "test.html">
>>       Header set Link "<http://www.ex.org/test2.html>; rel=canonical"
>>     </Files>
>>
>> to that .htaccess file and, voilà!
>>
>> [1] https://github.com/w3c/dpub-pwp-loc/blob/gh-pages/drafts/ivans-musings.md
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
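The .htaccess snippet in Ivan's P.S. only makes the server emit the header; the PWP Processor still has to parse it before folding the extra locators into *M*. Here is a minimal Python parser for the Link header format of RFC 8288 (target in angle brackets, followed by `;`-separated parameters). It is a sketch, not a complete implementation: it does not handle quoted parameter values containing `,` or `;`, and which `rel` values would designate *Lu* or *Lp* is left open, as it is in the thread.

```python
import re
from typing import Dict, List, Tuple

# <target URI> followed by zero or more ";name[=value]" segments.
# Simplification: quoted values containing ',' or ';' are not supported.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(header: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse an HTTP Link header into (target URI, {param: value}) pairs."""
    links = []
    for match in _LINK_VALUE.finditer(header):
        target, raw_params = match.group(1), match.group(2)
        params: Dict[str, str] = {}
        for segment in raw_params.split(";"):
            name, _, value = segment.partition("=")
            name = name.strip().lower()
            if name:  # skip the empty piece before the first ';'
                params[name] = value.strip().strip('"')
        links.append((target, params))
    return links


def targets_with_rel(header: str, rel: str) -> List[str]:
    """Return every target whose rel parameter lists `rel` (case-insensitive)."""
    return [
        target
        for target, params in parse_link_header(header)
        if rel.lower() in params.get("rel", "").lower().split()
    ]
```

For the header produced by the snippet above, `parse_link_header('<http://www.ex.org/test2.html>; rel=canonical')` yields `[("http://www.ex.org/test2.html", {"rel": "canonical"})]`. A production processor would more likely reuse an existing, fully RFC-compliant parser than this regular expression.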
Received on Tuesday, 16 February 2016 09:10:02 UTC