XProc Minutes 27 Apr 2006 from Norman Walsh on 2006-05-02 (public-xml-processing-model-wg@w3.org from May 2006)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Tue, 02 May 2006 10:36:03 -0400
To: public-xml-processing-model-wg@w3.org
Message-ID: <87d5ewo5nw.fsf@nwalsh.com>
[ Scribe apologizes for tardiness. ]

See http://www.w3.org/XML/XProc/2006/04/27-minutes.html

W3C[1]

                                   - DRAFT -

                            XML Processing Model WG

27 Apr 2006

   Agenda[2]

   See also: IRC log[3]

Attendees

   Present
           Alex, Alessandro, Norm, Paul, Richard, Mohamed, Henry, Michael,
           Murray [xx:16-]

   Regrets
           Andrew

   Chair
           Norm

   Scribe
           Norm

Contents

     * Topics
         1. Accept this agenda?
         2. Accept minutes from the previous teleconference?
         3. Next meeting: 4 May telcon
         4. Issue 3096: Are components side-effect free?
         5. Issue 3113: Does the pipeline engine act as a resource manager?
     * Summary of Action Items

     ----------------------------------------------------------------------

  Accept this agenda?

   -> http://www.w3.org/XML/XProc/2006/04/27-agenda.html

   Accepted

  Accept minutes from the previous teleconference?

   -> http://www.w3.org/XML/XProc/2006/04/20-minutes.html

   Accepted

  Next meeting: 4 May telcon

   Any regrets?

   <MoZ> yes

   Face-to-face meeting?

   Registration page: http://www.w3.org/2002/09/wbs/38398/XProcFTF2/[6]

  Issue 3096: Are components side-effect free?

   -> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3096

   Norm proposes:

   I propose that we say that all components are non-functional. That is,
   a pipeline implementation must behave as if it evaluated a component
   every time it occurs. "Must behave as if" is spec-ese for
   "implementations that are clever enough to determine with certainty
   that a component is, in fact, functional are free to cache the
   intermediate results because by golly if it is, no one will be able to
   tell the difference."
   Richard: This doesn't preclude adding a mechanism later to allow authors
   to assert that a step or component is functional

   Norm: Yes.

   Richard: Does this address the converse case? Producing output
   side-effects and behaving the same way for given inputs

   Norm: This is the "functional" aspect, not the side-effect aspect

   Richard: Side-effects are like hidden outputs, functionality is like
   hidden inputs

   Alex: This is a good place to start; should we register a new issue about
   functional components?

   <scribe> ACTION: Alex to create an issue about the possibility of
   functional components [recorded in
   http://www.w3.org/2006/04/27-xproc-minutes.html#action01[8]]

   Proposal accepted.

  Issue 3113: Does the pipeline engine act as a resource manager?

   -> http://www.w3.org/Bugs/Public/show_bug.cgi?id=3113

   Norm: One aspect of this question is: does the pipeline engine provide the
   sort of URI stability that XSLT, for example, gives the document function?

   Richard: I strongly disagree with this as a requirement; it requires a
   degree of intimacy between the engine and the components that may not
   always be available

   Alex: Is this something that might be "at user option"?

   Norm: I'd like to avoid that if at all possible

   <Zakim> ht, you wanted to push back

   Henry: I need some information; in my current state of knowledge I think
   it's a bad idea for pipeline engines
   ... Especially when you are running a pipeline engine as a server, you do
   not want to flush the cache every time you run a pipe, because it's useful
   to keep things around.
   ... In their parsed and ready-to-go state (provided they haven't changed)
   ... I'm happier saying, "no, you should expect your pipeline to behave the
   way any other web application does"
   ... Yes, things can change.

   Alex: If we step back and look at the web browser case, consider an image
   embedded 10 times on the same page. The browser reuses the image.
   ... That URI-to-resource resolution is stable for the duration of a page
   is one reasonable expectation

   <MSM> [I think the fact that browsers do or do not re-fetch is an
   optimization they make, not part of the specification of correct browser
   behavior - am I wrong?]

   Richard: Consider other things that are like XML pipelines, such as shell
   scripts, where "cat foo" twice might not return the same file.

   <MSM> [If ten <img> elements in the same HTML document refer to
   "my_image.jpg", and that image is served with a lifetime of 0, are correct
   browsers guaranteed to fetch it only once? What spec says that?]

   Some discussion of whether or not browsers actually behave that way

   <Zakim> MSM, you wanted to say that as an empirical statement, it's not a
   very strong argument for making the behavior part of our spec

   MSM: Implementors will do that for performance reasons regardless of
   whether a spec requires it or not

   Richard: Is there a spec for how you display things in a web page?

   Alex: No

   MSM: In that sense, it's not clear to me that the browser analogy bears on
   our decision

   Alex: There's a user expectation of some aspect of stability

   Richard: I don't think the browser analogy is a good one. The engine is
   running a collection of potentially independently implemented components.

   Murray: I'm relying on my memory, but in HTTP there's a mechanism for
   specifying time-to-live. So if there's a nanosecond TTL, then maybe it
   would go get the resource again.
   ... Similarly, if I was getting the time of day from a URI then it might
   change
   ... So if you're worried about that, maybe you need a "caching" component.

   Norm: I think consensus is coming towards the answer "no"

   Alex: I don't agree, I think it's important that URIs are stable for the
   duration of an execution
   ... If you need to identify unique resources, you can generate unique URIs
   with query parameters
   ... We haven't decided if the resources flowing through the pipeline have
   URIs or not

   Richard: I notice that the bug is actually talking about something
   produced by the pipeline

   Norm: I think those are the same case

   Richard: You could provide components that fetch and store URIs stably.

   Norm describes the situation where an XSLT stylesheet needs to get an
   ancillary resource by URI

   Alex: I really want some URIs to be stable throughout the duration of a
   pipeline

   Murray: I'm not convinced that we don't need a resource manager
   ... I'd like to posit the existence of a component that is a proxy server
   or something of that ilk
   ... That component knows if requests should always send things back from
   the cache

   <MSM> [I agree fervently that as users we need resource managers, and that
   implementations of our language will be more usable if they use good
   resource managers. But we also need character sets. We don't specify a
   character set as part of our spec to meet that need, and the same should
   probably hold for resource management. Separate problem, separate spec.]

   Murray: I think it's the case that sometimes you're going to want the
   documents to remain stable and sometimes you're going to want to get
   current results

   <alexmilowski> yes!

   Richard: But I may be using components that don't know how to use a proxy
   server

   Murray: I thought once you set up a proxy, then all requests went through
   that proxy.

   That's implementation and operating-system dependent

   Richard: Proxies do give a degree of generality that seems nice

   MSM: I'm not sure I'm understanding everything going on here. I agree that
   being able to cache and being able to guarantee up-to-date resources are
   good things
   ... But lots of these things seem no more closely related to pipelines
   than character sets are.
   ... We just rely on getting character sets from lower layers.
   ... Building it into the pipeline engine strikes me as a breach of
   orthogonality.
   ... At least for the components that we require an implementation to
   provide, we can say what the answers are or say that they're
   implementation-defined

   Murray: I think you're thinking of it in terms of the pipeline language
   and not the overall processing model. If you're processing large volumes
   of XML, you may want a proxy server that has access to pipeline
   descriptions so that all your documents can be passed through.

   <richard> Beware of assuming that everything comes through HTTP. What if
   they're just files?

   Indeed. The proxy has to handle file: URIs as well.

   MSM: It should be orthogonal. If I've got a caching proxy installed, I
   want my pipeline engine to use that one, not one that it felt it needed to
   build in.

   Alex: The document function in XSLT gets the resource through the local
   environment that might use a local cache

   MSM: The only thing the XSLT language says is that if you call the
   document function with the same URI, you'll get the same document

   Alex: You want to be able to compare the objects you get back from the
   document function.
   ... Do we really have the requirement that things behave this way across
   components?

   Richard: I think that Alex has drawn attention to an important point. XSLT
   can do this because it only says the document function behaves this way.
   ... Are we really going to say that if the stylesheet is a file: URI then
   it can't just open it?

   Murray: In a shell script, you'd handle this by copying it and then
   referring to the copy.

   Richard: Yes, and if you were using a program that had the name hard
   coded, then you couldn't make it use the copy

   Norm attempts to summarize the consensus which remains "no"

   HT: The discussion we've had has been drawn somewhat more narrowly than
   the first sentence of the actual issue.

   <MSM> [I wonder if there is consensus on the proposition that in cases
   like the example given by Norm in raising the issue, it *is* our
   responsibility to say whether the data stream written to uri Foo is or is
   not guaranteed the same as the data stream (later) read from uri Foo]

   HT: We've discussed in the past the use of pipeline engines as resource
   managers.
   ... Consider output="#foo" somewhere and input="#foo" somewhere else in a
   pipeline.
   ... One way to think about that is that the engine is managing those
   resources.
   ... I don't believe that issue is off the table because of this discussion

   Norm: I agree

   <MSM> I'm a little puzzled / troubled here. If I interpret output="#foo"
   and input="#foo" as references to resources to be managed by the pipeline,
   then I suddenly have an ambiguity I didn't use to have:

   <MSM> does the input stream read the output stream?

   <MSM> or is this a pipeline which reads resource #foo, does something with
   it, and writes it back?

   Scribe lost the thread

   <MSM> ht, I wonder if you can expound on how you would propose avoiding
   this ambiguity

   <ht> So I think Richard just expressed the dichotomy in an interesting way
   -- do we name ports, or infosets

   <MSM> +1: Richard's formulation of the question is an acute one

   ADJOURNED

Summary of Action Items

   [NEW] ACTION: Alex to create an issue about the possibility of functional
   components [recorded in
   http://www.w3.org/2006/04/27-xproc-minutes.html#action01[10]]
   [End of minutes]

     ----------------------------------------------------------------------

   [1] http://www.w3.org/
   [2] http://www.w3.org/XML/XProc/2006/04/27-agenda.html
   [3] http://www.w3.org/2006/04/27-xproc-irc
   [6] http://www.w3.org/2002/09/wbs/38398/XProcFTF2/
   [8] http://www.w3.org/2006/04/27-xproc-minutes.html#action01
   [10] http://www.w3.org/2006/04/27-xproc-minutes.html#action01
   [11] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
   [12] http://dev.w3.org/cvsweb/2002/scribe/

    Minutes formatted by David Booth's scribe.perl[11] version 1.127 (CVS
    log[12])
    $Date: 2006/05/02 14:32:48 $
Received on Tuesday, 2 May 2006 14:36:15 UTC