Minutes for XProc WG telcon of 23 Feb 2006 from Norman Walsh on 2006-02-23 (public-xml-processing-model-wg@w3.org from February 2006)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Thu, 23 Feb 2006 12:31:23 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <87zmkix9qs.fsf@nwalsh.com>
See: http://www.w3.org/XML/XProc/2006/02/23-minutes.html

W3C[1]

                                   - DRAFT -

                            XML Processing Model WG

23 Feb 2006

   Agenda[2]

   See also: IRC log[3]

Attendees

   Present
           Alessandro, Alex, Andrew, Erik, Henry, Norm, Richard, Rui, Murray
           [0:26-]

   Regrets
           Paul, Robin, Michael

   Chair
           Norm

   Scribe
           Norm

Contents

     * Topics
         1. Administrivia
         2. Accept this agenda?
         3. Accept minutes from the previous teleconference?
         4. Next meeting: 27/28 Feb at the technical plenary.
         5. Agenda planning for the face-to-face
         6. Technical

     ----------------------------------------------------------------------

  Administrivia

   Alex points to the latest requirements document:
   http://www.w3.org/XML/XProc/docs/langreq.html[4]

  Accept this agenda?

   -> http://www.w3.org/XML/XProc/2006/02/23-agenda.html[5]

   Accepted

  Accept minutes from the previous teleconference?

   -> http://www.w3.org/XML/XProc/2006/02/16-minutes.html[6]

   Accepted.

  Next meeting: 27/28 Feb at the technical plenary.

   Room 157 at the Royal Casino Hotel

   Henry reminds us that there's a wiki for ride sharing

   <ht> http://esw.w3.org/topic/MeetingTaxis[7]

  Agenda planning for the face-to-face

   -> http://www.w3.org/XML/XProc/2006/02/27-28-agenda.html[8]

   Who's willing to speak about existing tools?

   Norm, Henry, Richard, Alex, Erik, and Rui volunteer.

   Andrew will send a summary of the Arbortext pipeline

   Norm reviews the rest of the planned agenda

   <ebruchez> all good

   Andrew would like the presentations to be in the afternoon because he'll
   be calling in

   Norm: I'll move it to the afternoon

   Monday morning: administrivia and use cases

   Monday afternoon: presentations and infoset input/output discussion

   Norm will update the agenda

   Norm to add note about asking in IRC for phone connectivity

  Technical

   Requirements and use cases.

   Alex: Changes from last week: made validation a design principle;
   removed naming of pipelines as a requirement
   ... No more editorial changes.
   ... We were working our way through issues on the list.

   Norm asks for explanation of the table at the beginning of section 4

   Alex: It's supposed to map requirements to use cases. The presentation
   doesn't work but it's saying that for each requirement, here's the use
   case that supports that requirement.
   ... I wanted to get rid of having the links in the requirements list so
   that they would be easier to read.
   ... The presentation needs to be fixed.
   ... I'm not really concerned about the presentation right now, just as
   long as we get the content in place.

   Norm: +1
   ... One issue that I thought we could talk about is the issue of string
   parameter or simple datatype parameters as opposed to infoset parameters.

   Norm observes that the last word on this thread was from Erik.

   Erik: We didn't have many use cases that required parameters so we didn't
   mind using a little trick for XSLT

   Henry: I think we have a terminology problem, the example is full of
   parameters!

   Erik: I think the distinction is between infosets and datatype parameters.
   ... The question is do we need two kinds of parameters in the language to
   be able to do this?
   ... If we decide we can only pass XML infosets between components, then
   how do you pass a numeric parameter to a stylesheet?

   Richard: Ok, so this is a small subset of the parameter problem. You're
   talking about parameters that come from other components, rather than
   parameters that are specified when you write the pipeline?

   Erik: Why should parameters only be static?

   Richard: I can see that that's a good generalization, but in my pipeline
   virtually every step has some parameters, but none of them are derived
   from previous steps.

   Erik: Maybe it depends what you call parameter.

   Richard: The sort I'm talking about are XPaths to identify bits of a
   document

   Alex: My pipelines run in a J2EE environment, so I'm passing all sorts of
   stuff to the pipeline.

   <Zakim> ht, you wanted to distinguish at least three cases

   Henry: It's clear that we're talking a little bit at cross-purposes
   ... I can see at least three cases: there are things that I think of as
   parameters that are static, pipeline-design time XML resources:
   stylesheets for XSLT or schema documents for validation.
   ... The second class are design-time controls for components: XPaths, etc.
   I agree with Richard that the 99% case is that that's known at
   pipeline-design time. It's a static parameter.
   ... Another case is command line switches to command line invocations:
   switches with values and booleans, switches that are either present or
   absent. An XInclude impl could have a command-line switch that indicated
   whether or not base fixup is applied.
   ... That's another example of a design-time choice.
   ... The third case is runtime parameterization that gets accessed by
   various components at run time.
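
   [Illustration, not from the call] A minimal sketch of how the cases Henry
   lists might be modelled, assuming a pipeline step simply carries a list of
   named parameters. The class names, the component names, and the
   "style.xsl" value are hypothetical; nothing here is an agreed design.

```python
from dataclasses import dataclass, field
from enum import Enum


class ParamKind(Enum):
    STATIC_RESOURCE = "static XML resource"      # stylesheet, schema document
    DESIGN_TIME_CONTROL = "design-time control"  # XPath, on/off switch
    RUNTIME = "run-time parameter"               # supplied per invocation


@dataclass
class Param:
    name: str
    kind: ParamKind
    value: object = None  # None marks a slot that is filled at run time


@dataclass
class Step:
    component: str
    params: list = field(default_factory=list)

    def unresolved(self):
        """Names of parameters that still need a value at run time."""
        return [p.name for p in self.params if p.value is None]


# Hypothetical steps: an XSLT step with a static stylesheet and a run-time
# slot, and an XInclude step with a design-time base-fixup switch.
xslt = Step("xslt", [
    Param("stylesheet", ParamKind.STATIC_RESOURCE, "style.xsl"),
    Param("user", ParamKind.RUNTIME),
])
xinclude = Step("xinclude", [
    Param("base-fixup", ParamKind.DESIGN_TIME_CONTROL, True),
])
```

   Here only the "user" slot of the XSLT step is left open at design time,
   which matches the remark that a design-time slot is often filled by a
   run-time parameter.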

   Some discussion of what Alex meant.

   Henry: I think the point about the third class is that they often go
   hand-in-hand with the second case.

   <richard> If your pipeline is compiled, then Alex's examples are
   parameters whose values are not known at compile time

   Henry: Often you have a slot in the pipeline at design time and a run-time
   parameter that fills that slot.

   Norm: design-time parameters in the pipeline, design-time controls for
   components, run-time parameters passed to the pipeline.
   ... those are the three cases?

   <ht> HST distinguished between static resources (stylesheets, schema
   documents)

   Henry: I think it's useful to distinguish between those and others.

   Norm: I think a third case is parameters that come out of one component
   and flow into another.
   ... In the full generality, those could be any kind of parameter, but I've
   been thinking of those only in terms of infosets.

   Richard: One way to deal with it would be to have an XML document that
   contains the parameters and then that could be generated by a stage in the
   pipeline.

   <Zakim> ebruchez, you wanted to discuss using XML infosets to do that or
   the XDM

   Erik: I just wanted to point out that there are some conceptual
   simplifications that could be made.
   ... For example, when I hear of a stylesheet or schema as a parameter, I
   know that in many cases that's static, but you can also simplify it by
   saying that they're both XML documents and you can combine them.
   ... You can just consider that an XSLT stylesheet or a schema is just an
   infoset.
   ... In XPL we've been trying to maximize this simplification.

   <ht> HST likes the idea that follows from merging Richard's suggestion
   with Norm's resource pool idea: Provide a name:value store as part of the
   pipeline engine, which can be set a) at pipeline invocation; b) via a
   pseudo-output URI and a standardised XML document; c) via the engine API
   as it faces the components
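
   [Illustration, not from the call] A rough sketch of the name:value store
   idea, assuming string values and a made-up <params>/<param> document
   format; the class, its methods, and the element names are all
   hypothetical. It shows the three ways of setting values mentioned above:
   at invocation, from an XML document produced by a step, and through an
   API call.

```python
import xml.etree.ElementTree as ET


class ParameterStore:
    """Hypothetical per-run name:value store owned by the pipeline engine."""

    def __init__(self, invocation_params=None):
        # Values supplied when the pipeline is invoked.
        self._values = dict(invocation_params or {})

    def set(self, name, value):
        # Values set by a component through the engine API.
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)

    def load_document(self, xml_text):
        # Values read from a parameter document emitted by a pipeline step.
        # The <params>/<param name=... value=...> shape is an assumption.
        root = ET.fromstring(xml_text)
        for param in root.findall("param"):
            self._values[param.get("name")] = param.get("value")


store = ParameterStore({"user": "alex"})
store.load_document('<params><param name="page-count" value="12"/></params>')
store.set("draft", "true")
```

   Each pipeline run would start with its own store, so values computed in
   one run do not leak into the next.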

   Erik: This way you don't need to switch between concepts. We should try to
   keep that simplification in mind.
   ... If we use the XDM data model then the whole question becomes simpler
   because we can just pass around XDM simple types.
   ... In XPL since we only have infosets, when we need to pass the user
   principal, we encapsulate it all in an XML infoset. So most components
   take an XML infoset as a configuration.

   I think this is what Henry was suggesting a few moments ago

   Henry: no, actually not.

   Richard: I agree that Erik's simplification is good, I just don't want it
   to make the simple case where you have a static stylesheet or schema more
   complex or inefficient.
   ... So if we can keep the generality without precluding optimization, I'm
   all for it.

   Erik: There are ways to avoid the optimization problems.

   Alex: I feel really strongly that simple things should be simple.
   ... If I have a simple string and I need to assign it to a name so that
   some component can access it, turning it into an XML resource seems really
   hard.

   <ht> HST strongly endorses this, even if all it means is that the XML
   _syntax_ for pipeline authoring makes it transparent

   Alex: We need a simple way to bind simple values to names

   Erik: Alex, I think that's fine when you just write them statically. It's
   where you want to generate them that it becomes problematic.
   ... I think you're going to have to have a way to allow a component to
   generate a parameter for some other component.

   Alex: I think if we could come to agreement that there are parameters and
   resources, that would be good. Being able to formally declare a dependency
   on a resource is a good thing.
   ... That means that you have the use case of generating something in the
   middle of a pipeline that is a parameter. But that's a separate problem
   and we can decide if that's possible separately.

   <Zakim> ht, you wanted to remind ourselves about using the infoset for
   this . . .

   Henry: Just to add to the dimensionalities we're thinking about, I think
   there's an important distinction between parameterizations of components
   on the one hand and out-of-band computed information on the other.
   ... For out-of-band computed information, the infoset is your friend.
   For example, what do you do with an XSLT step in the middle of a pipeline
   which sets the output encoding?
   ... You put an annotation in the infoset so that the information is
   available when you need to serialize it.
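
   [Illustration, not from the call] One way to picture the annotation idea,
   assuming a document travels through the pipeline together with a small
   dictionary of annotations. The AnnotatedDoc wrapper, the step functions,
   and the "output-encoding" key are hypothetical and stand in for real
   infoset annotations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AnnotatedDoc:
    root: ET.Element
    annotations: dict = field(default_factory=dict)


def xslt_like_step(doc):
    # Stand-in for an XSLT step whose stylesheet requests an output encoding.
    doc.annotations["output-encoding"] = "iso-8859-1"
    return doc


def unrelated_step(doc):
    # An intermediate step that ignores the annotation and passes it along.
    doc.root.set("checked", "yes")
    return doc


def serialize(doc):
    # The annotation is consulted only when the document is finally written.
    encoding = doc.annotations.get("output-encoding", "utf-8")
    return ET.tostring(doc.root, encoding=encoding)


doc = AnnotatedDoc(ET.fromstring("<p>caf\u00e9</p>"))
out = serialize(unrelated_step(xslt_like_step(doc)))
```

   The encoding request rides along with the document and takes effect only
   at serialization, however many steps sit in between.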

   <ebruchez> You may also want to completely separate serialization in
   pipelines.

   Henry: What's interesting is that it's not information for the next step,
   it's for someone else later.
   ... Infoset annotations are the way to go here.

   Murray: I liked Henry's characterization of the three different kinds of
   parameters.
   ... I'm a pipeline processor, I have a blank mind. Once things get going,
   I start to become aware of stuff.
   ... I want to be able to store that away so that I can use it later.
   ... Some of it is in files that I can assign URIs to, and some of it is in
   memory (which might also have URIs). I can build this little environment.
   ... I might be operating many components and there might be an arbitrary
   number of steps.
   ... Along the way, I'm going to have to calculate things. For example,
   processing a book might require multiple passes to get all the page
   numbers correct.
   ... I might store the infoset for the ToC somewhere, then later when I
   know the page numbers, I might want to edit it.
   ... Then later, I might grab that and actually use it to build a
   PostScript rendering of the ToC.

   <ht> HST observes this connects up with Norm's pool idea, understood as a
   little local filesystem

   Murray: Then later still, I might build an online version of that ToC.

   Norm agrees with HST, but is frightened of mutable infosets.

   <ht> each instance of the pipeline starts with an empty disk, as it were

   Murray: When the job ends, I go back to having a blank mind. Maybe some of
   my stuff is stored, maybe it isn't.
   ... All of these things are resources that are created as we go (or before
   we start).

   <alexmilowski> mid-pipeline binding of parameters is something I do all
   the time...

   <ebruchez> It's called a variable ;-)

   <ht> MT Pipeline does the same thing as XPL here, using no-URI fragments
   to identify such local resources, e.g. #tempDoc

   Alex: I'm with you Murray, I think the problem we're having is with the
   distinction between parameters and infosets.

   Murray: But aren't they all resources?
   ... If I say "-j Alex", if that value needs to be assigned, somehow I have
   to be able to reference that value.

   Alex: I think it's useful to treat the infosets differently, but maybe
   there's room for debate on that. I think there should be simple parameter
   values too.
   ... They're all resources philosophically, but lots of processors have a
   distinction between "the input" and parameters that are sent to them.
   ... Look at the Java components that wrap up XSLT for example.

   Norm: The Java/XSLT case might be useful to consider.

   <ebruchez> It's a push vs. pull, in a way.

   <ebruchez> XSLT's source is pulled by the transformer, and the other
   parameters are set in advance.

   <ebruchez> Not sure if that has to hold though.

   <alexmilowski> the parameters do not have to be set in advance

   Adjourned

   [End of minutes]

     ----------------------------------------------------------------------

   [1] http://www.w3.org/
   [2] http://www.w3.org/XML/XProc/2006/02/23-agenda.html
   [3] http://www.w3.org/2006/02/23-xproc-irc
   [4] http://www.w3.org/XML/XProc/docs/langreq.html
   [5] http://www.w3.org/XML/XProc/2006/02/23-agenda.html
   [6] http://www.w3.org/XML/XProc/2006/02/16-minutes.html
   [7] http://esw.w3.org/topic/MeetingTaxis
   [8] http://www.w3.org/XML/XProc/2006/02/27-28-agenda.html
   [9] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
   [10] http://dev.w3.org/cvsweb/2002/scribe/

    Minutes formatted by David Booth's scribe.perl[9] version 1.127 (CVS
    log[10])
    $Date: 2006/02/23 17:04:56 $
Received on Thursday, 23 February 2006 17:31:46 UTC