
Re: Caching

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 04 Apr 2007 17:31:09 +0100
To: Norman Walsh <Norman.Walsh@Sun.COM>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <f5b7issyucy.fsf@hildegard.inf.ed.ac.uk>


Regrets, I'm out of the office tomorrow.  Richard can vote the
Univ. Edinburgh position -- if he's away, as I think he might be,
proxy to the chair. MSM can vote the W3C position -- likewise, if _he_
is away, proxy to the chair.

Wrt the issue at hand:

Norman Walsh writes:

> The caching issue has me quite concerned. I see three options:
>
> 1. We say nothing about it. Some implementations will cache, others
> won't. Even on systems that don't cache, side effects of operation
> will sometimes make caching appear to happen and sometimes not.

Not acceptable -- too much negative impact on interop.

> 2. We require caching. This may be a significant implementation issue.
> It looks like a big step for V1.

My preference, but see below -- I think we have to carefully bound the
impact of the requirement.

> 3. We forbid caching. This may be a significant implementation issue.
> It may not even be possible for some implementations to prevent side
> effects from "effective" caching.

Not sure we couldn't find a way to make this work (along the lines of
QT's "fetch once" requirement), but I'd rather put the work into
speccing a limited-scope required caching.  I guess this does involve
forbidding it _outside_ that scope, hmmm.

Anyway, what I have in mind starts from my previous post on the
subject [1]:

  "Give 'p:group' an input, call it 'cache',
   and specify that any documents presented at that port must function
   as a local cache for any http GET issued by any step inside the
   group's subpipeline."
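
As a sketch of what that might look like in pipeline markup -- the
'cache' port, the step inside the group, and the URIs here are all
hypothetical illustrations, not taken from any draft:

```xml
<p:group>
  <!-- Hypothetical 'cache' input port: all documents supplied here
       must be complete before the subpipeline starts, and are
       indexed by their [base URI] to form a local cache -->
  <p:input port="cache">
    <p:document href="http://example.org/common.xsl"/>
    <p:document href="http://example.org/schema.xsd"/>
  </p:input>

  <!-- Inside the group, any http GET for one of those URIs,
       explicit or implicit, would be answered from the cache -->
  <p:xslt>
    <!-- a step whose retrievals are served from the cache -->
  </p:xslt>
</p:group>
```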

To clarify a bit -- this would require implementations to:

 a) Not start evaluation of the p:group's subpipeline until all the
    inputs into the 'cache' port are complete;

 b) Take all the documents presented to that port and index them
    via the [base URI] of their Document Information Item to create a
    local cache (two minor issues: what about docs with no [base URI]?
    What about duplicates?);

 c) Intercept at least all http: and file: requests and supply
    them from the cache if possible (issues: is this any use if it
    only covers explicit inputs, i.e. <p:document href="..."/> -- I
    think not.  So it means getting into the platform retrieval
    stack(s) -- how hard _is_ this?);

 d) Specify what happens _outside_ of p:group with a 'cache' port,
    e.g. do we go with a "first fetch" story along the lines of QT, or
    what?
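
On (b) in particular, indexing by [base URI] suggests that an inline
cache entry would need an explicit xml:base to be retrievable at all
-- again a hypothetical sketch, with an invented URI:

```xml
<p:input port="cache">
  <!-- Indexed under http://example.org/config.xml (invented URI);
       a GET for that URI inside the group is served from here -->
  <p:inline>
    <config xml:base="http://example.org/config.xml"/>
  </p:inline>

  <!-- An inline document with no xml:base has no [base URI] to
       index under -- the first of the two minor issues above -->
</p:input>
```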

Sorry I can't be there to join in the fun,

ht

[1] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007Mar/0138.html
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Wednesday, 4 April 2007 16:31:24 GMT
