Re: [Locators] Musing on the "URL of a constituent resource" issue... from Ivan Herman on 2015-12-27 (public-digipub-ig@w3.org from December 2015)

From: Ivan Herman <ivan@w3.org>
Date: Sun, 27 Dec 2015 12:20:20 +0100
To: Daniel Weck <daniel.weck@gmail.com>
Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <5D0140ED-E09B-4363-BEC4-4131928CF7DC@w3.org>
Daniel,

you may well be right. Or to be even clearer: I sincerely hope you *are* right, and we can put this whole question to rest as far as PWP is concerned.

My caution is because, in practice, we know that many web sites use such links a lot, whether through some Apache tricks or file system symbolic links. (As an example, if you access http://www.w3.org/TR/pwp/, the web server at W3C would actually 'see' a symbolic link, redirecting the GET request to http://www.w3.org/TR/2015/WD-pwp-20151126/. As that example shows, such links are frequently used for versioning.) Are we sure this will not affect the 'portability' of a PWP? Should we use some rules of good behaviour to avoid the effects? Or do we go down the line of having PWP as a generic term, but we (or somebody) will define a profile for various types of publications that demand those behaviours?
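
To make the versioning example concrete, here is a minimal sketch of the file-system side only, written in Python. It assumes a POSIX-like system; the directory names are borrowed from the URL above, while the file name index.html and its content are purely illustrative and say nothing about how the W3C server is really laid out.

```python
import os
import tempfile


def demo_versioned_symlink() -> str:
    """Create a dated directory plus an undated symbolic link, then read through the link."""
    with tempfile.TemporaryDirectory() as root:
        # Dated directory, named after the URL in the example (layout is an assumption).
        dated = os.path.join(root, "2015", "WD-pwp-20151126")
        os.makedirs(dated)
        with open(os.path.join(dated, "index.html"), "w", encoding="utf-8") as f:
            f.write("PWP draft of 2015-11-26")

        # 'pwp' is the stable, undated name; it only points at the dated directory.
        link = os.path.join(root, "pwp")
        os.symlink(dated, link)  # may require extra privileges on Windows

        # A reader that opens the stable path transparently gets the dated content.
        with open(os.path.join(link, "index.html"), encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    print(demo_versioned_symlink())
```

Publishing a new version would then only mean re-pointing 'pwp'; nothing that refers to the stable name has to change. Whether a packaged PWP can carry such a link along is exactly the open portability question.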

As I said, we may as well say that the answer to the 'URL of a constituent resource' issue is that this *is* a non-issue. But we should still go through some real-life situations to put our minds at ease...
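
As one such real-life situation, the font relocation from my earlier mail can be walked through with a small sketch. It models rewrite rules as a plain lookup table and follows them until no rule applies; the table and the resolve function are hypothetical illustrations (no real server or PWP API is implied), and the URLs are the ones used in the quoted mail.

```python
from typing import Dict

# Hypothetical rewrite table: stable URL used inside publication 'A' -> current location.
REWRITES: Dict[str, str] = {
    "http://www.ex.org/A/fonts/F": "http://fonts.ex.org/F",
}


def resolve(url: str, rules: Dict[str, str], max_hops: int = 10) -> str:
    """Follow rewrite rules until none applies; raise on loops or overly long chains."""
    seen = set()
    while url in rules:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or too many hops at {url}")
        seen.add(url)
        url = rules[url]
    return url


if __name__ == "__main__":
    stable = "http://www.ex.org/A/fonts/F"
    print(resolve(stable, REWRITES))

    # The publisher relocates 'F': one extra rule, and the stable URL is untouched.
    moved = dict(REWRITES)
    moved["http://fonts.ex.org/F"] = "http://fonts.ex.org/fonts/F"
    print(resolve(stable, moved))
```

The point of the sketch is only that the resources in 'A' keep using one fixed URL while the table absorbs the move; it does not settle where such a table would live for a PWP that has been taken offline.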

Cheers

Ivan

> On 24 Dec 2015, at 11:11, Daniel Weck <daniel.weck@gmail.com> wrote:
> 
> Hello,
> 
> As discussed before, HTTP GET requests to the following (illustrative)
> URLs could effectively result in the exact same response (i.e. payload
> + content type), regardless of whether a static filesystem is used in
> the server backend, or whether database queries are performed behind
> the scenes:
> 
> https://server.com/get/publication1.pwp
> 
> https://cdn.server.com/download?pwp=publication1
> 
> https://pwp.server.com/publication1
> 
> https://publication1.server.com
> 
> etc.
> 
> By the way, the above example "publication1.pwp" may be interpreted as
> a single-file packaged / zipped PWP (e.g. 'application/pwp+zip'
> content type), or as a reference to a container file (e.g.
> 'application/pwp-manifest+json') ... but this is a separate issue.
> 
> So, I don't believe that a new method for "symbolic linking" is
> needed, let alone defined within the scope of PWP. Such PWP-specific
> level of indirection would not be aligned with www good practices for
> "regular" websites / web apps. I think that document authors / content
> developers should follow guidelines to future-proof URIs, should use
> existing mechanisms like HTTP redirects, URL rewrites, etc. (thus my
> previous email "Cool URIs don't change" link) In my opinion, a PWP
> specification should define "meta" functional and logical layers, such
> as a publication manifest (well-defined set of container resource URLs
> / URIs), high-level navigation document, bibliographic metadata, etc.
> I can see how "symbolic linking" may help in some cases with
> publication maintenance (i.e. renamed URLs), but this would
> conceptually be not much different than EPUB's ID / IDREF in the OPF
> package.
> 
> Daniel
> 
> 
> On Wed, Dec 23, 2015 at 5:53 PM, Daniel Weck <daniel.weck@gmail.com> wrote:
>> http://www.w3.org/Provider/Style/URI.html
>> 
>> :)
>> 
>> 
>> On Tue, Dec 22, 2015 at 9:26 AM, Ivan Herman <ivan@w3.org> wrote:
>>> Hi all,
>>> 
>>> Just some musing on the whole 'what is the URL of a constituent resource'
>>> issue related to locators; I wanted to step away from the '#' vs. '!' issue.
>>> Sorry for a somewhat longer mail, bear with me; I have some specific
>>> questions at the end which may or may not direct us in a different
>>> direction...
>>> 
>>> To begin, what is the use case which raises this whole discussion in the
>>> context of PWP?
>>> 
>>> Let us say we have two PWP-s, 'A' and 'B'. They have each a URL, ie, a
>>> locator, say:
>>> 
>>> • http://www.ex.org/A
>>> • http://www.ex.org/B
>>> 
>>> The use case we referred to on the call is that there may be a shared
>>> resource 'F' that the publisher wants to maintain (say, a font file) at its
>>> own location on the Web, say at:
>>> 
>>> • http://fonts.ex.org/F
>>> 
>>> Per the definition of a PWP, there is no reason why the resource 'F' would
>>> not be part of both 'A' and 'B'. After all, 'A' and 'B' are Web
>>> Publications, ie, "an aggregated set of interrelated Web Resources". Members
>>> of that set are "listed" somewhere to define 'A' and 'B', respectively; this
>>> is also why we would have a manifest (let us put the manifest format aside
>>> for now). If 'F' is self contained, then 'A' and 'B' are also portable web
>>> publications. So? Are we done, there is no issue, can we go home? :-)
>>> 
>>> Well... I think that there are legitimate situations when we want to use
>>> that font from within 'A' or 'B', with simple, fixed, and probably relative
>>> URL-s. E.g., the publisher wants an easy way to relocate 'F' for whatever
>>> reasons to, say, http://fonts.ex.org/fonts/F; that may require all resources
>>> in, say, 'A' to change their URL-s to F. Not good. If there is one place 'X'
>>> that 'points' to the URL of 'F', and all resources in 'A' refer to the URL
>>> representing 'X', then maintenance becomes easy. This is a case where our
>>> discussion comes into the picture (using, say, '!' or cfi or similar).
>>> 
>>> Stepping away from the Web for a moment: the scenario above is actually
>>> fairly common in managing one's own file system. And this is why these crazy
>>> computer scientists invented symbolic links. This means that, in my folder
>>> (or directory, depending on the term used), I can create a symbolic link 's'
>>> that refers to the file 'f' somewhere on my file system. Any program
>>> referring to 's' (ie, reading the content of 's') will in fact read,
>>> transparently, the content of 'f'; ie, the system will silently open 'f' for
>>> that content. If 'f' is moved then 's' has to change, but no other program
>>> has to be changed. Symbolic links are used all over the place; they have
>>> been around on UNIX-like systems (that includes OS X) for ages and, afaik,
>>> they are also available in Windows.
>>> 
>>> Does this sound familiar? The melody is similar to our issue... Actually, on
>>> specific Web servers it is perfectly possible to reproduce something
>>> somewhat similar. Apache knows the 're-write rule' concept; on my server I
>>> can (if I have the right authorization) set up a special file (usually
>>> .htaccess) where I can add a rewrite rule which essentially says:
>>> 
>>> • "if you see 'http://www.ex.org/A/fonts/F' then go to
>>> 'http://fonts.ex.org/F' instead"
>>> 
>>> (What this really does is to send back an HTTP response to the client
>>> instructing it to issue a new request to the other URL.)
>>> 
>>> The good thing is that this is a common occurrence on the Web, and does not
>>> require any special processing on the client. The bad thing is that this is
>>> Apache specific, not all users have the right to set something like that up,
>>> not all users know what exactly to do. But, also, it is a bit vulnerable
>>> because all re-write rules would be in one place (at least for a directory):
>>> the distributed approach of symbolic links is much safer against accidental
>>> damage, which is a good thing.
>>> 
>>> So… here is my question. Is it possible to have a symbolic-link-like
>>> structure to solve our problems? I.e., tiny files (possibly with some
>>> attached minor javascripts) whose only role is to instruct the client to do
>>> a redirection somewhere else? Something that makes use of existing
>>> technologies; ideally, the client (browser, javascript) should not even have
>>> to do anything because all the redirections are handled on HTTP level?  I
>>> have seen PHP based solutions, but that is not a good approach, we do not
>>> want to be dependent on a specific server-side technology. Does anybody know a
>>> good approach that we may get inspired by?
>>> 
>>> (Actually, if the server runs Linux (or MacOS), real symbolic links also do
>>> the trick, because servers would follow those just as any other program
>>> does.)
>>> 
>>> Is this line of thought worth pursuing, or is it complete baloney?
>>> 
>>> Merry Xmas or any other relevant holidays to you all!
>>> 
>>> Ivan
>>> 
>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
> 


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Sunday, 27 December 2015 11:20:30 UTC