Re: [dpub-loc] Draft update from Ben De Meester on 2016-02-16 (public-digipub-ig@w3.org from February 2016)

From: Ben De Meester <ben.demeester@ugent.be>
Date: Tue, 16 Feb 2016 09:33:57 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CAJ-O9Tvuii1jUYwko2yVirwucFDc1+ijg6e4tiAwEyT7sUg8pw@mail.gmail.com>
Hi Ivan, all,

So, if I understand correctly, *M* consists of two parts: the manifest *Ma*
(the list of files that *P* comprises, comparable to what we have in, e.g.,
EPUB), and the link set *Mlinks* (i.e., the set *L*, *Lu*, and *Lp*).
*Ma* is part of all states of *P*, and *Mlinks* is (probably) stored
somewhere outside of *P*. The options for generating and/or storing
*Mlinks* are many: a JSON file, a database, a web service, automatic
derivation from the .htaccess file, ... I don't think we need to specify
that now, just as we currently don't have to specify *how* *M* is returned.
When someone GETs *L*, *Lu*, or *Lp*, *S* returns the (dynamically
generated) *M* in some way or another (see, e.g., Ivan's suggestions), so
the PWP processor knows both *Ma* and *Mlinks*.
From there, the PWP processor knows what to do.
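A minimal sketch of this split, assuming *M* is serialized as JSON (the message deliberately leaves the format open). The names `MA`, `MLINKS_STORE`, `find_links`, `build_m`, `serve_m` and the example.org URLs are hypothetical; the only point illustrated is that the same *M* comes back whichever of *L*, *Lu*, or *Lp* is requested.

```python
import json

# Hypothetical Ma: the list of files that P comprises (as in an EPUB manifest).
MA = ["index.html", "chapter1.html", "style.css"]

# Hypothetical store for Mlinks. In practice it could be a JSON file, a
# database, a web service, ...; a plain list of dicts stands in for all of them.
MLINKS_STORE = [
    {
        "L": "https://example.org/book",
        "Lu": "https://example.org/book/unpacked/",
        "Lp": "https://example.org/book.pwp",
    },
]


def find_links(requested: str, store: list) -> dict:
    """Return the link set in which L, Lu or Lp equals the requested URL."""
    for links in store:
        if requested in links.values():
            return links
    raise KeyError(requested)


def build_m(requested: str, ma: list, store: list) -> dict:
    """Assemble M = Ma + Mlinks on the fly for whichever locator was requested."""
    return {"manifest": list(ma), "links": dict(find_links(requested, store))}


def serve_m(requested: str) -> str:
    """Serialize M as JSON (an assumption; how S returns M is left open)."""
    return json.dumps(build_m(requested, MA, MLINKS_STORE), sort_keys=True)
```

Whichever locator the client starts from, it ends up with the same *M*, and therefore with both *Ma* and *Mlinks*.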

Concerning the 'server modifications' discussion: as far as I can see, we
have discussed two options for GETting a resource from a packed PWP (and
this is, in fact, orthogonal to the 'how to return *M*' discussion):

   - either the server is modified to know about the internals of the
   package format, and returns the resource to the client (complex server,
   simple client)
   - or the server returns the entire package, and the client needs to know
   the internals of the package to retrieve the resource from the packed PWP
   (simple server, complex client).
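The two options differ only in where the unpacking happens. A minimal sketch, assuming the packed PWP is a ZIP archive (as an EPUB container is); the package format has not been decided, and all function names here are hypothetical.

```python
import io
import zipfile


def extract_resource(package: bytes, path: str) -> bytes:
    """Read one resource out of a packed PWP, assumed here to be a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(path)


# Option 1: complex server, simple client.
# The server unpacks; only the requested resource crosses the network.
def complex_server_get(package: bytes, path: str) -> bytes:
    return extract_resource(package, path)


# Option 2: simple server, complex client.
# The server returns the whole package; the client does the unpacking.
def simple_server_get(package: bytes) -> bytes:
    return package


def complex_client_get(downloaded: bytes, path: str) -> bytes:
    return extract_resource(downloaded, path)


def make_package(files: dict) -> bytes:
    """Build a small in-memory package for demonstration purposes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Both paths yield the same bytes; what changes is how much data crosses the network and which side has to understand the package format.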

Both have pros and cons, and I have the feeling this is the same problem as
querying any kind of data from a knowledge base on the web: either you
download the entire data dump and retrieve the data on the client side, or
you set up a query service and the client asks the question directly to the
server. The end result is the same and so is the functionality; it's
just a matter of where to put the complexity. Other, intermediate
options may also be possible.
So maybe this last question doesn't have to be settled: complex servers
can help the client retrieve resources more efficiently, complex clients
can handle simple servers, and we'll all live in a hybrid world.
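The hybrid world described above could look like this on the client side: try the cheap path first and fall back to the expensive one. This is a sketch under stated assumptions: the package is again assumed to be a ZIP archive, `get_resource` and the `<Lp>/<path>` URL layout for addressing a resource inside a package are hypothetical, and the HTTP client is passed in as a plain function so that no particular library is implied.

```python
import io
import zipfile
from typing import Callable, Tuple

# fetch(url) -> (HTTP status code, response body). Injected so that any HTTP
# client (or a test double) can be plugged in.
Fetch = Callable[[str], Tuple[int, bytes]]


def get_resource(lp: str, path: str, fetch: Fetch) -> bytes:
    """Retrieve `path` from the PWP packed at `lp`, wherever the work happens."""
    # 1. Ask the server for the single resource. The "<Lp>/<path>" URL layout
    #    is a hypothetical convention for a "complex" server, not a spec.
    status, body = fetch(f"{lp}/{path}")
    if status == 200:
        return body
    # 2. The server is "simple": download the whole package and unpack locally.
    status, package = fetch(lp)
    if status != 200:
        raise OSError(f"cannot retrieve {lp}: HTTP {status}")
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(path)
```

A real client would probably want to distinguish "this server cannot unpack" from "this resource does not exist", and to cache the downloaded package instead of fetching it again for every resource; both are left out to keep the sketch short.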

Any thoughts?

Greetings,
Ben

Ben De Meester
Researcher Semantic Web
Ghent University - iMinds - Data Science Lab | Faculty of Engineering and
Architecture | Department of Electronics and Information Systems
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL:
http://users.ugent.be/~bjdmeest/

2016-02-15 11:32 GMT+01:00 Ivan Herman <ivan@w3.org>:

> Leonard,
>
> On 12 Feb 2016, at 18:21, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> I don’t see any bootstrapping required. Sure, having server-based
> modifications (of various flavors) would make for a more optimized
> implementation, but IMO it’s optional and not required. (This also matches
> what you were saying on the phone the other day.  Have you changed your
> mind??)
>
>
>
> we may mutually misunderstand one another, so maybe it is better (and
> clearer to the others) if I write down this (only) issue we have with my
> original writeup to see where we really are.
>
> My original writeup[1] said:
>
> > 1. The PWP Processor has access to the information in M.
> > 2. As a consequence, M contains the list of states (and their Locators)
> that are available on S. In other words, the PWP Processor “knows” Lp and
> Lu, together with their media types.
>
> And your comment was:
>
> > I can’t agree to the first half of assumption #2. It would imply that M
> is created AFTER P is already placed on the server (or is authored by the
> same system that is responsible for hosting P on S). And if M is modified
> after P is created, then P isn’t actually P, but instead is P’ - which
> might be fine for the purposes of publishing, but we need to be clear about
> that.
>
> And you proposed to simply remove that sentence, leaving only "the PWP
> Processor 'knows' *Lp* and *Lu*..."
>
> First of all, your comment is correct: there is a problem. But I also
> believe that we should have clear ways to describe *how* the PWP
> Processor knows about *Lp* and *Lu* in case they are not in *M*, rather
> than leaving that question open (which would be the case if that sentence
> was removed). Without a clear idea of this, I do not believe our model
> is credible.
>
> My general response is therefore to say: "*M*", ie, the metadata for a
> specific *P*, is *conceptual* in the sense that it is perfectly all right
> if the PWP Processor "gathers" the content from different sources. What
> counts is that, at the end of the day, the PWP Processor gets hold of
> *all* the data in "*M*" which then indeed includes *Lu* and *Lp*. Ie, we
> can keep that statement if this fact is made clear in the text *and*
> there is a way to ensure that this can be set up (without prescribing a
> singular way of setting it up).
>
> The document lists several scenarios for getting to the metadata (at the
> end of the writeup). What we have to make clear is that the
> various approaches are *not* mutually exclusive but, if the metadata
> comes from different sources, the PWP Processor has to combine them. I.e.,
> it is perfectly o.k. if, for example, the response to the GET on the packed
> data returns the package with the embedded metadata *and* also uses the HTTP
> Link header for additional metadata (or the HTTP Alternates header?
> I am not sure about that one), thereby providing the missing *Lu* and *L*,
> for example. The processor combines this information into a coherent *M*
> and indeed gets the information on the list of states as stated in that
> sentence.
>
> What is the problem with this? Isn't this acceptable?
>
> Maybe the source of our misunderstanding is actually elsewhere: I am not
> sure what you meant by 'server-based modifications'. What I said on the
> call is that I would be against imposing a server modification that would
> require a change to the code of the server itself, i.e., one that would
> require recompiling Apache, for example, or even the development and
> installation of a new "mod" module (to continue using the Apache example)
> by the community. However, I believe that a mechanism that may, in some
> cases, require the modification or, rather, the addition of a new response
> header to a server response (like in that example) should be acceptable;
> I would expect all servers to provide such facilities, even if using them
> requires some admin rights on the server. I am not saying that should be
> the *only* way of achieving something, but it should not be a roadblock
> either.
>
> Ivan
>
> P.S. For those of you for whom HTTP header setting is a mystery: if you
> run Apache and you have the right to include a .htaccess file in a
> directory, adding a Link header on the file "test.html" in a directory
> means adding something like:
>
> <Files "test.html">
> Header set Link "<http://www.ex.org/test2.html>; rel=canonical"
> </Files>
>
> to that .htaccess file and, voilà!
>
>
> [1]
> https://github.com/w3c/dpub-pwp-loc/blob/gh-pages/drafts/ivans-musings.md
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
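Ivan's point that *M* is conceptual, and that the PWP Processor may assemble it from several sources, can be sketched as follows. This is a minimal, hypothetical example: the package's embedded metadata is modeled as a plain dict, the Link header parser is simplified (it ignores commas or semicolons inside quoted parameters), mapping `rel=canonical` onto *L* follows the .htaccess example in the quoted message, and the `pwp-unpacked` and `pwp-packed` relation names are invented for illustration; they are not registered link relations. Giving embedded metadata precedence over header values is also an assumption.

```python
import re
from typing import Dict, Optional

# One link-value of an HTTP Link header: <URI-reference> followed by ";param" parts.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")

# Mapping from link relations to the locators of the model. "canonical" follows
# the .htaccess example; "pwp-unpacked" and "pwp-packed" are hypothetical.
REL_TO_LOCATOR = {"canonical": "L", "pwp-unpacked": "Lu", "pwp-packed": "Lp"}


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value to its target URI (simplified parser)."""
    links: Dict[str, str] = {}
    for uri, params in LINK_VALUE.findall(value):
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may hold several space-separated relation types.
                for rel in val.strip().strip('"').split():
                    links[rel.lower()] = uri
    return links


def combine_metadata(embedded: dict, link_header: Optional[str]) -> dict:
    """Merge metadata embedded in the package with Link-header metadata.
    Values already present in the package win over header values."""
    m = dict(embedded)
    if link_header:
        for rel, uri in parse_link_header(link_header).items():
            locator = REL_TO_LOCATOR.get(rel)
            if locator is not None:
                m.setdefault(locator, uri)
    return m
```

With a package that only knows its own *Lp*, a Link header such as `<https://example.org/book>; rel=canonical` fills in the missing *L*, and the processor ends up with one coherent *M*.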
Received on Tuesday, 16 February 2016 08:34:52 UTC