Re: [dpub-loc] Draft update from Ivan Herman on 2016-02-16 (public-digipub-ig@w3.org from February 2016)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 16 Feb 2016 09:38:45 +0100
To: Ben De Meester <ben.demeester@ugent.be>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <17A3C6ED-B8E8-4624-97A6-5B4B497A4A23@w3.org>
> On 16 Feb 2016, at 09:33, Ben De Meester <ben.demeester@ugent.be> wrote:
> 
> Hi Ivan, all,
> 
> So, if I understand correctly, M consists of two parts: the manifest (the list of files that P comprises, comparable to what we have in, e.g., EPUB) Ma, and the link set Mlinks (i.e., the set L, Lu, and Lp).
> Ma is part of all states of P, and Mlinks is (probably) stored somewhere outside of P (the options for generating and/or storing Mlinks are manifold: as a JSON file, from a database, from a web service, automatically derived from the .htaccess file, ... I don't think there is a need now to specify that, just as we at the moment don't have to specify how M is returned).
> When someone GETs L, Lu, or Lp, S returns (the dynamically generated) M, in some way or another (see e.g., Ivan's suggestions), so the PWP processor knows both Ma and Mlinks.
> From there, the PWP processor knows what to do.

Yes, I think this is a good summary.
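
To make the summary above a bit more concrete, here is a minimal sketch (in Python) of how a PWP processor might assemble M from an embedded manifest plus an HTTP Link header. Everything named here is an assumption made for illustration: the Metadata class, the shape of the embedded metadata (a dict with "files" and "links"), and the rel values "pwp-unpacked" and "pwp-packed" are hypothetical and are not defined anywhere in the draft. The Link parsing is also simplified (one rel per entry, no other parameters).

```python
import re
from dataclasses import dataclass, field

# Matches one entry of a Link header value: <target>; rel=value or rel="value".
# Simplified on purpose: other link parameters are ignored.
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value):
    """Return a {rel: target} dict for a Link header value."""
    return {rel.strip(): target for target, rel in _LINK_ENTRY.findall(value)}


@dataclass
class Metadata:
    """Conceptual M: the manifest Ma plus the link set Mlinks."""
    manifest: list = field(default_factory=list)  # Ma: the files P comprises
    links: dict = field(default_factory=dict)     # Mlinks: keys "L", "Lu", "Lp"


def combine(embedded, link_header, rel_to_state):
    """Merge embedded metadata with locators found in a Link header.

    `rel_to_state` maps (hypothetical) rel values to "L", "Lu" or "Lp".
    Locators already present in the embedded metadata are kept; the
    header only fills in the missing ones (an arbitrary choice here).
    """
    links = dict(embedded.get("links", {}))
    for rel, target in parse_link_header(link_header).items():
        state = rel_to_state.get(rel)
        if state is not None and state not in links:
            links[state] = target
    return Metadata(manifest=list(embedded.get("files", [])), links=links)


if __name__ == "__main__":
    m = combine(
        {"files": ["index.html"], "links": {"Lp": "http://www.ex.org/book.pwp"}},
        '<http://www.ex.org/book/>; rel="pwp-unpacked", '
        '<http://www.ex.org/book>; rel=canonical',
        {"pwp-unpacked": "Lu", "pwp-packed": "Lp", "canonical": "L"},
    )
    print(m)
```

Letting the embedded values win over the header is just one possible policy; the point is only that the processor ends up with a single, coherent M regardless of where the pieces came from.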

> 
> Concerning the 'server-modifications' discussion: as far as I see, we have two options discussed when trying to GET a resource from a packed PWP (and this, in fact, is orthogonal to the 'how to return M' discussion):
> either the server is modified to know about the internals of the package format, and returns the resource to the client (complex server, simple client)
> or the server returns the entire package, and the client needs to know the internals of the package to retrieve the resource from the packed PWP (simple server, complex client).
> Both have pros and cons, and I have the feeling this is the same problem as asking for any kind of data from a knowledge base from the web: either you download the entire data dump and retrieve the data on the client side, or you set up a query service and the client asks the question directly to the server. The end result is the same, the functionalities are the same, it's just a matter of where to put the complexity. Maybe, other intermediate options are also possible.
> So maybe, this last discussion doesn't have to be answered: complex servers can help the client to retrieve resources more efficiently, complex clients can handle simple servers, and we'll all live in a hybrid world.

At this point, I definitely agree that the PWP spec does not have to (formally) specify all the various alternatives, and certainly should not try to be exhaustive. But, I believe, due diligence requires that we do list some viable approaches that prove that whatever we are talking about is not just hot air :-)
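
As one example of such an approach, the "simple server, complex client" option could look roughly like the Python sketch below: the server returns the whole packed PWP, and the client pulls a single resource out of it. This assumes, purely for illustration, that the package is a ZIP container (as is the case for EPUB); the PWP discussion has not settled on a packaging format, and the function names are made up.

```python
import io
import zipfile


def read_resource(package_bytes, path):
    """Extract one resource from a packed publication held in memory.

    Assumption: the package is a ZIP archive; `package_bytes` would be
    the body of the response to a GET on Lp.
    """
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(path)


def make_demo_package():
    """Build a tiny in-memory package so the sketch runs without a server."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("index.html", "<p>Hello</p>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_resource(make_demo_package(), "index.html"))
```

A "complex server" would run essentially the same extraction on its side and return only the requested resource; the functionality is identical, only the place of the complexity differs.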

I.

> 
> Any thoughts?
> 
> Greetings,
> Ben
> 
> Ben De Meester
> Researcher Semantic Web
> Ghent University - iMinds - Data Science Lab | Faculty of Engineering and Architecture | Department of Electronics and Information Systems
> Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
> t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL: http://users.ugent.be/~bjdmeest/
> 
> 2016-02-15 11:32 GMT+01:00 Ivan Herman <ivan@w3.org>:
> Leonard,
> 
>> On 12 Feb 2016, at 18:21, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>> 
>> I don’t see any bootstrapping required. Sure, having server-based modifications (of various flavors) would make for a more optimized implementation, but IMO it’s optional and not required. (This also matches what you were saying on the phone the other day. Have you changed your mind??)
>> 
> 
> 
> we may mutually misunderstand one another, so maybe it is better (and clearer to the others) if I write down this (only) issue we have with my original writeup to see where we really are.
> 
> My original writeup[1] said:
> 
> > 1. The PWP Processor has access to the information in M.
> > 2. As a consequence, M contains the list of states (and their Locators) that are available on S. In other words, the PWP Processor “knows” Lp and Lu, together with their media types.
> 
> And your comment was:
> 
> > I can’t agree to the first half of assumption #2. It would imply that M is created AFTER P is already placed on the server (or is authored by the same system that is responsible for hosting P on S). And if M is modified after P is created, then P isn’t actually P, but instead is P’ - which might be fine for the purposes of publishing, but we need to be clear about that.
> 
> And you proposed to simply remove that sentence, leaving only "the PWP Processor 'knows' Lp and Lu..."
> 
> First of all, your comment is correct: there is a problem. But I also believe that we should have clear ways to describe how the PWP Processor knows about Lp and Lu, in case it is not in M, and not leaving that question open (which would be the case if that sentence was removed). Without having a clear idea on this, I do not believe our model is credible.
> 
> My general response is therefore to say: "M", ie, the metadata for a specific P, is conceptual in the sense that it is perfectly all right if the PWP Processor "gathers" the content from different sources. What counts is that, at the end of the day, the PWP Processor gets hold of all the data in "M" which then indeed includes Lu and Lp. Ie, we can keep that statement if this fact is made clear in the text and there is a way to ensure that this can be set up (without prescribing a singular way of setting it up).
> 
> We have, in the document, several scenarios listed to get to the metadata (listed at the end of the writeup). What we have to make clear is that the various approaches are not mutually exclusive but, if the metadata comes from different sources, the PWP Processor has to combine them. Ie, it is perfectly o.k. if, for example, the result of the GET on the packed data returns the package with the embedded metadata and also uses the HTTP Link header for additional metadata (or uses the HTTP Alternates header? I am not sure on that one), thereby providing the missing Lu and L, for example. The processor combines this information into a coherent M and it indeed gets the information on the list of states as stated in that sentence.
> 
> What is the problem with this? Isn't this acceptable?
> 
> Maybe the source of our misunderstanding is actually elsewhere: I am not sure what you meant by 'server-based modifications'. What I said on the call is that I would be against imposing a server modification that would require a modification of the code of the server itself, ie, one that would require a recompilation of Apache, for example, or the development and installation of a new "mod" module (to continue using the Apache example) by the community. However, I believe that a mechanism that may, in some cases, require the modification or, rather, the addition of a new response header to a server response (like in that example) should be acceptable; I would expect all servers to provide such facilities, even if their use requires some admin rights on the server. I am not saying that this should be the only way of achieving something, but it should not be a roadblock either.
> 
> Ivan
> 
> P.S. For those of you for whom HTTP header setting is a mystery: if you run Apache and you have the right to include a .htaccess file in a directory, adding a Link header on the file "test.html" in a directory means adding something like:
> 
> <Files "test.html">
> Header set Link "<http://www.ex.org/test2.html>; rel=canonical"
> </Files>
> 
> to that .htaccess file and, voilà!
> 
> 
> [1] https://github.com/w3c/dpub-pwp-loc/blob/gh-pages/drafts/ivans-musings.md
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 
> 
> 
> 
> 


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 16 February 2016 08:39:00 UTC