Re: jar protocol (was: ZIP archive API?) from Stian Soiland-Reyes on 2013-05-10 (public-webapps@w3.org from April to June 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Fri, 10 May 2013 16:43:18 +0100
To: David Sheets <kosmo.zb@gmail.com>
Cc: Robin Berjon <robin@w3.org>, Jonas Sicking <jonas@sicking.cc>, Webapps WG <public-webapps@w3.org>
Message-ID: <CAPRnXtkkokNpCUPjMeQSWvaAV7=N6uCKE2CiENXj--+4+1MmBw@mail.gmail.com>
This seems very related to how prefixes/terms are expanded to IRIs in
JSON-LD - see http://www.w3.org/TR/json-ld/#iris

The JSON-LD approach is more like registering new "local" protocols,
as the prefixed names look like URIs.

If we tried that out, then:

    <link rel="bundle" href="/bundle.zip" anchor="b2" />

would mean that

    <a href="b2:fred/hello.txt">

would resolve to fred/hello.txt within bundle.zip.
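
A minimal sketch of that expansion, loosely modelled on how JSON-LD expands
compact IRIs. The function name, the bindings dictionary and the example.com
base URL are made up for illustration; none of this is specified anywhere.

```python
from urllib.parse import urljoin

def expand(ref, bindings, base):
    """Expand 'prefix:path' into (bundle_url, member_path).

    Returns None when the prefix is not bound, so the caller can treat
    the reference as an ordinary URL. As in JSON-LD, a suffix starting
    with '//' is taken to be a real absolute URI and is left alone.
    """
    prefix, sep, suffix = ref.partition(":")
    if sep and prefix in bindings and not suffix.startswith("//"):
        return urljoin(base, bindings[prefix]), suffix
    return None

# Hypothetical binding taken from <link rel="bundle" href="/bundle.zip" anchor="b2" />
bindings = {"b2": "/bundle.zip"}
print(expand("b2:fred/hello.txt", bindings, "http://example.com/page.html"))
```

The bindings are local to the document, so the same "b2" prefix could point
at a different bundle in another document.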



The difference is that Robin's proposal defines a new relative "prefix" -
almost like how UNIX/Linux lets you mount /home/fred on a different
partition than /home - and therefore has this nice HTTP fall-back.
You won't have to worry about someone else defining the "b2" protocol,
as you operate within your own URI namespace.
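
For comparison, a rough sketch of the prefix match as I read Robin's
proposal. The resolve() function and its return shape are hypothetical, and
requiring a "/" right after the bundle URL is my assumption; the proposal
only says "prefix match".

```python
from urllib.parse import urljoin

def resolve(doc_url, bundle_hrefs, ref):
    """Decide whether a reference is served from a bundle or fetched over HTTP."""
    target = urljoin(doc_url, ref)
    for href in bundle_hrefs:
        bundle = urljoin(doc_url, href)
        # Assumption: only match on a path-segment boundary.
        prefix = bundle.rstrip("/") + "/"
        if target.startswith(prefix):
            return ("bundle", bundle, target[len(prefix):])
    # No bundle matched: an ordinary request, which is also what a UA
    # without bundle support would issue for every reference.
    return ("http", target, None)

print(resolve("http://berjon.com/", ["bundle.wrap"], "bundle.wrap/img/dahut.png"))
print(resolve("http://berjon.com/", ["bundle.wrap"], "img/other.png"))
```

The fall-back works because the target URL is an ordinary HTTP URL either way.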

One downside of not having a URI scheme is that you need to
propagate the <link> bindings in any document that needs them - which is
probably OK, and not very different from how RDF Turtle uses @prefix and
XML uses xmlns:fred= declarations.


On 7 May 2013 21:31, David Sheets <kosmo.zb@gmail.com> wrote:
> On Tue, May 7, 2013 at 3:29 PM, Robin Berjon <robin@w3.org> wrote:
>> On 06/05/2013 20:42 , Jonas Sicking wrote:
>>>
>>> The only things that implementations can do that JS can't is:
>>> * Implement new protocols. I definitely agree that we should specify a
>>> jar: or archive: protocol, but that's orthogonal to whether we need an
>>> API.
>>
>>
>> Have you looked at just reusing JAR for this (given that you support it in
>> some form already)? I wonder how well it works. Off the top of my head I see
>> at least two issues:
>>
>> • Its manifest format has lots of useless stuff, and is missing some things
>> we would likely want (like MIME type mapping).
>>
>> • It requires its own URI scheme, which means that there is essentially no
>> transition strategy for content: you can only start using it when everyone
>> is (or you have to do UA detection).
>>
>> I wonder if we couldn't have a mechanism that would not require a separate
>> URI scheme. Just throwing this against the wall, might be daft:
>>
>> We add a new <link> relationship: bundle (archive is taken, bikeshed later).
>> The href points to the archive, and there can be as many as needed. The
>> resolved absolute URL for this is added to a list of bundles (there is no
>> requirement on when this gets fetched, UAs can do so immediately or on first
>> use depending on what they wish to optimise for).
>>
>> After that, whenever there is a fetch for a resource the URL of which is a
>> prefix match for this bundle the content is obtained from the bundle.
>>
>> This isn't very different from JAR but it does have the property of more
>> easily enabling a transition. To give an example, say that the page at
>> http://berjon.com/ contains:
>>
>>     <link rel="bundle" href="bundle.wrap">
>>
>> and
>>
>>     <img src="bundle.wrap/img/dahut.png" alt="a dahut">
>>
>> A UA supporting this would grab the bundle, then extract the image from it.
>> A UA not supporting this would do nothing with the link, but would issue a
>> request for /bundle.wrap/img/dahut.png. It is then fairly easy on the server
>> side to be able to detect that it's a wrapped resource and serve it from
>> inside the bundle (or whatever local convention it wants to adopt that
>> allows it to cater to both — in any case it's trivial).
>>
>> This means no URL scheme to be supported by everyone, no nested URL scheme
>> the way JAR does it (which is quite distasteful), no messing with escaping !
>> in paths, etc.
>>
>> WDYT?
>
> This is really cool!
>
> Most servers already contain support for this in the form of index files.
>
> If you do
>
>     <link rel="bundle" href="bundle.wrap/" />
>
> and set your server's file directory resolver to match index.zip, you
> don't need any special server-side extraction or handling: just
> extract the archive root as sibling to index.zip when you deploy!
>
> Additionally, this piggybacks application resource caching on top of
> HTTP caching.
>
> One quirk of this scheme (ha) is its notion of "root path". With this
> path pattern match, the subresources in the archive exist in the
> domain's single top-level path structure. This means that for archives
> to be fully self-contained they must only use relative references that
> do not escape the archive root. Of course, this is also a feature when
> the containment of the archive is not a concern.
>
> How does directory resolution inside a bundle work? i.e. resolve
> "bundle.wrap/dir/" ? It seems like this (listing) is a key feature of
> the "API" that was being discussed. I support a JSON object without a
> well-known name, personally.
>
> Can we use
>
>     Link: <bundle.wrap/>; REL=bundle
>
> for generic resources?
>
> Does
>
>     <a href="bundle.wrap/page.html">Go!</a>
>
> make a server request or load from the bundle?
>
> Do bundle requests Accept archive media types?
>
> Do generic requests (e.g. address bar) Accept archive media types?
>
> What if I do
>
>     <link rel="bundle" href="" />
>
> ?
>
> Will this page be re-requested Accept-ing archive media types?
>
> Could bundles be entirely prefixed based?
>
> What does
>
>     <link rel="bundle" href="bundle.wrap#" />
>
> with
>
>     <img src="bundle.wrap#images/dahut.png" /> <!-- or is it
> bundle.wrap#/images/dahut.png ? -->
>
> do? Or
>
>     <link rel="bundle" href="bundle.wrap?" />
>
> with
>
>     <img src="bundle.wrap?images/dahut.png" /> <!-- or is it
> bundle.wrap?/images/dahut.png ? -->
>
> ?
>
> Your approach is very compelling, Robin. What do you think about the
> roots and indexes?
>
> Best wishes,
>
> David
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
Received on Friday, 10 May 2013 15:44:10 UTC