Re: [dpub-loc] Draft update from Leonard Rosenthol on 2016-02-16 (public-digipub-ig@w3.org from February 2016)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 16 Feb 2016 12:06:17 +0000
To: Ivan Herman <ivan@w3.org>
CC: Ben De Meester <ben.demeester@ugent.be>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <C844873B-82A4-4ADC-BA22-B79C64A13EDA@adobe.com>
[Sorry, US holidays]

Yes, I agree that conceptually the PWP processor is responsible for figuring out all the parts of M (which it may gather from various methods).

Leonard

From: Ivan Herman <ivan@w3.org>
Date: Monday, February 15, 2016 at 5:32 AM
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Ben De Meester <ben.demeester@ugent.be>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: [dpub-loc] Draft update

Leonard,

On 12 Feb 2016, at 18:21, Leonard Rosenthol <lrosenth@adobe.com> wrote:

I don’t see any bootstrapping required. Sure, having server-based modifications (of various flavors) would make for a more optimized implementation, but IMO it’s optional and not required. (This also matches what you were saying on the phone the other day. Have you changed your mind??)



We may be misunderstanding one another, so maybe it is better (and clearer to the others) if I write down this (only) issue we have with my original writeup to see where we really are.

My original writeup[1] said:

> 1. The PWP Processor has access to the information in M.
> 2. As a consequence, M contains the list of states (and their Locators) that are available on S. In other words, the PWP Processor “knows” Lp and Lu, together with their media types.

And your comment was:

> I can’t agree to the first half of assumption #2. It would imply that M is created AFTER P is already placed on the server (or is authored by the same system that is responsible for hosting P on S). And if M is modified after P is created, then P isn’t actually P, but instead is P’ - which might be fine for the purposes of publishing, but we need to be clear about that.

And you proposed to simply remove that sentence, leaving only "the PWP Processor 'knows' Lp and Lu..."

First of all, your comment is correct: there is a problem. But I also believe that we should have clear ways to describe how the PWP Processor knows about Lp and Lu in case they are not in M, and not leave that question open (which would be the case if that sentence were removed). Without having a clear idea on this, I do not believe our model is credible.

My general response is therefore to say: "M", ie, the metadata for a specific P, is conceptual in the sense that it is perfectly all right if the PWP Processor "gathers" the content from different sources. What counts is that, at the end of the day, the PWP Processor gets hold of all the data in "M" which then indeed includes Lu and Lp. Ie, we can keep that statement if this fact is made clear in the text and there is a way to ensure that this can be set up (without prescribing a singular way of setting it up).

We have, in the document, several scenarios listed to get to the metadata (listed at the end of the writeup). What we have to make clear is that the various approaches are not mutually exclusive but, if the metadata comes from different sources, the PWP Processor has to combine them. Ie, it is perfectly o.k. if, for example, the GET on the packed data returns the package with the embedded metadata and the response also uses the HTTP Link header for additional metadata (or the HTTP Alternates header? I am not sure about that one), thereby providing the missing Lu and Lp, for example. The processor combines this information into a coherent M and it indeed gets the information on the list of states as stated in that sentence.
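
A minimal sketch of that combination step, in Python. Everything in it is hypothetical and only for illustration: the function name combine_metadata, the dictionary keys "Lp" and "Lu", the example.org locators, and in particular the rule that an earlier source wins over a later one. The writeup does not prescribe any of this.

```python
from typing import Dict, Iterable, Optional


def combine_metadata(sources: Iterable[Optional[Dict[str, str]]]) -> Dict[str, str]:
    """Merge partial metadata records into one conceptual M.

    Sources are given in priority order: a key set by an earlier source
    is not overwritten by a later one (an assumed policy, not a rule
    taken from the writeup).
    """
    combined: Dict[str, str] = {}
    for source in sources:
        if not source:
            continue  # a source may be absent, e.g. no Link header was sent
        for key, value in source.items():
            combined.setdefault(key, value)
    return combined


# Hypothetical inputs: metadata embedded in the package, plus locators
# learned from an HTTP Link header on the same response.
embedded = {"title": "Example publication"}
from_link_header = {
    "Lp": "http://example.org/book.pwp",
    "Lu": "http://example.org/book/",
}

M = combine_metadata([embedded, from_link_header])
```

Whatever the sources are, the point is that the processor ends up with a single M that contains Lp and Lu, regardless of which source supplied them.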

What is the problem with this? Isn't this acceptable?

Maybe the source of our misunderstanding is actually elsewhere: I am not sure what you meant by 'server-based modifications'. What I said on the call is that I would be against imposing a server modification that would require a modification of the code of the server itself. Ie, one that would require a recompilation of Apache, for example, or even the development and installation of a new "mod" module (to continue using the Apache example) by the community. However, I believe that a mechanism that may, in some cases, require the modification or, rather, the addition of a new response header to a server response (like in that example) should be acceptable; I would expect all servers to provide such facilities, even if using them requires some admin rights on the server. I am not saying that should be the only way of achieving something, but it should not be a roadblock either.

Ivan

P.S. For those of you for whom HTTP header setting is a mystery: if you run Apache and you have the right to include a .htaccess file in a directory, adding a Link header on the file "test.html" in a directory means adding something like:

<Files "test.html">
Header set Link "<http://www.ex.org/test2.html>; rel=canonical"
</Files>

to that .htaccess file and, voilà!
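
On the receiving side, a client has to pick the target and the relation out of that header value. The following is a simplified sketch in Python, assuming the value uses the standard form with the target in angle brackets followed by ;name=value parameters. The function name parse_link_header is made up for this example, and the regular expressions are not a complete Link header parser (for instance, they do not cope with commas or semicolons inside quoted parameter values).

```python
import re
from typing import Dict, List, Tuple

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One parameter: ;name=value, where the value may be wrapped in double quotes.
_PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*"?([^";]*)"?')


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (target, parameters) pairs found in a Link header value."""
    links = []
    for match in _LINK_VALUE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        params = {
            name.lower(): val.strip()
            for name, val in _PARAM.findall(raw_params)
        }
        links.append((target, params))
    return links


links = parse_link_header('<http://www.ex.org/test2.html>; rel=canonical')
```

Here links holds one pair: the target http://www.ex.org/test2.html and the parameters {"rel": "canonical"}.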


[1] https://github.com/w3c/dpub-pwp-loc/blob/gh-pages/drafts/ivans-musings.md



----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Tuesday, 16 February 2016 12:06:49 UTC