Minutes for XProc WG telcon of 05 Jan 2006

Draft minutes are now available:

   http://www.w3.org/XML/XProc/2006/01/05-minutes.html

Text copy below:

W3C[1]

                                   - DRAFT -

                 XML Processing Model WG Weekly Teleconference

5 Jan 2006

   Agenda[2]

   See also: IRC log[3]

Attendees

   Present
           Norm, Jeni, Michael, Paul, Erik, Alessandro, Andrew, Alex, Henry,
           Richard, Rui

   Regrets
           Robin

   Chair
           Norm

   Scribe
           Norm

Contents

     * Topics
         1. Accept this agenda
         2. Accept minutes from the previous teleconference
         3. Next meeting: 12 Jan 2006
         4. Tech Plenary registration is now open
         5. Iteration
         6. Requirements
         7. Any other business

     ----------------------------------------------------------------------

   <scribe> Scribe: Norm

   <scribe> ScribeNick: Norm

   Date: 05 Jan 2006

  Accept this agenda

   http://www.w3.org/XML/XProc/2006/01/05-agenda.html[4]

   Accepted.

  Accept minutes from the previous teleconference

   http://www.w3.org/2005/12/22-xproc-minutes.html[5]

   Accepted

  Next meeting: 12 Jan 2006

   Any regrets for next week? None given.

  Tech Plenary registration is now open

   http://www.w3.org/2002/09/wbs/35125/TP2006/[6]

  Iteration

   Any discussion of iteration?

   ht: I came up with a use case this week: using a pipeline to construct a
   soap message, send it off, the result containing multiple instances of
   document which are in turn used to construct soap messages and send them
   out.
   ... This pipeline doesn't produce a new document, it just iterates over
   some results from soap messages.

   richard: This is iteration over multiple subtrees in the same document.
   Last week we were talking about iteration over multiple documents.

   ht: Indeed, but it raises the question of whether we want to think about
   an abstraction over the kinds of things that go through pipelines.

   Alex: ht's example is very much like a lot of things I do. What I've been
   doing is treating the elements that match as their own sub-documents or
   just documents and the steps in the processes just treat them like
   documents that could have come in separately.
   ... And then on the sub-document I run a pipeline over that.
   ... The abstraction that I'm using is a sequence of documents.

   ht: Do we want at this early stage to go from a simple view that says a
   pipeline has a document that goes through it to one that has a sequence of
   documents flowing through it?
   ... harder to explain, to understand, and slightly harder to implement.

   <ht> ht: ... but more powerful

   Erik: the way we do this with XPL, as an example, is to just provide a
   facility for iteration that allows for extraction from a node from a
   document using XPath.
   ... We can select //something, for example, and then iterating over each
   of the elements returned.
   ... this is a solution that doesn't require a sequence of documents.
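
   Illustration (not part of the call): a minimal sketch of the kind of
   XPath-driven iteration Erik describes, written in Python rather than XPL.
   The helper name iterate_subtrees, the step to_upper, and the sample
   <batch> document are hypothetical. ElementTree supports only a subset of
   XPath, so ".//msg" stands in for "//msg".

```python
import copy
import xml.etree.ElementTree as ET


def iterate_subtrees(document, path, step):
    """Apply `step` to a standalone copy of every element matching `path`."""
    results = []
    for element in document.findall(path):
        # Copy the subtree so the step sees it as a separate document
        # and cannot modify the source document.
        sub_document = copy.deepcopy(element)
        results.append(step(sub_document))
    return results


def to_upper(msg):
    """Hypothetical pipeline step: upper-case the text of one message."""
    msg.text = msg.text.upper()
    return msg


source = ET.fromstring(
    "<batch><msg id='1'>a</msg><group><msg id='2'>b</msg></group></batch>"
)
outputs = iterate_subtrees(source, ".//msg", to_upper)
serialized = [ET.tostring(e, encoding="unicode") for e in outputs]
```

   No sequence type appears in the interface here: the caller hands over one
   document and a selection expression, and the list of results exists only
   inside the iteration construct.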

   Alex: But aren't they really virtually there?

   Erik: Well, yes, but it's iteration over a subset of the pipeline without
   adding a sequence type to the pipeline language.
   ... The concept of sequence in XPL is limited to iteration

   Alex: Both XSLT2 and XQuery support sequences and we are, I assume, going
   to want to support them.
   ... I have another component that can produce an aggregate. I think the
   idea of sequence as a primary thing in the language is very important.
   Architecturally, it lets us have a clear view at the language level of two
   different types of components, one that can process a sequence and another
   that only supports a single document (compare XSLT 2 and XSLT 1). We could
   define the semantics of how you process a sequence with a component that
   only handles a single document.

   Richard: You've suggested here that there are some components that
   understand sequences. Then a pipeline controller could know what to do
   with a sequence. This raises the question then of whether you're allowed
   to write components that are allowed to maintain state between the
   documents passed through them.

   Alex: I can see that as a real concern. XQuery does maintain a state over
   the whole sequence. It seems like it's component-dependent.
   ... As soon as you have a component that requires the whole document,
   streaming stops. I think the same thing would hold true for XSLT 2 or
   XQuery with respect to sequences.
   ... When I deal with sequences I use the subtree-selection and that's my
   iteration thing for producing a sequence that I then iterate a pipeline
   over.

   richard: There's also the question of what you do with the output of some
   sort of an iteration component. Merging them back into a single document
   might be sufficient.

   Alex: Yes, there's a real issue here with how you deal with sequences and
   components that don't know what to do with sequences
   ... I deal with sequences and the receiver either knows what to do or it
   HCFs.

   richard: Your point about XSLT having to buffer up the whole document
   seems to me to be a good argument in favor of sequences instead of always
   packaging things into one document.
   ... OTOH, in some cases allowing for streaming within processes may cause
   that problem. Rather than having an explicit sequence, you could have a
   component that takes a packaged up document and runs a process on subtrees
   within it.

   Alex: one of the things I use all the time is the ability to scope an XSLT
   to process only a subtree.

   <richard> correction: streaming might *solve* the problem

   Alex: The way I deal with this is to say that the baseline is always
   streaming and if you need the whole document there's a little adapter that
   lets you build the whole document.
   ... streaming and dealing with subtrees is really critical to me.

   ack, ht

   <Zakim> ht, you wanted to struggle with the difference between iteration
   vs. dataflow

   ht: I just want to mention for future reference that part of the problem
   here arises from different models: one perspective is the programming
   language perspective, an xml scripting language, in which context talking
   about iteration makes great sense. But another perspective is data flow.
   In data flow, there isn't any iteration, there's just data.
   ... They're duals: the dual of iteration in data flow is sequences.
   There's a tradeoff in both implementation and conceptual terms between
   these two ways of thinking about these things

   Alex: We should try to write a declarative language that doesn't force
   that choice; either should be possible.

   ack

   <Zakim> ht, you wanted to mention the tension between streaming and error
   recovery

   ht: Streaming is great and often what you want to do, but I think it's
   worth noting that once you try to provide guarantees about error
   protection, there's a tension between streaming and knowing that one step
   has finished without error before the next starts.

   Alex: yes, there's definitely tension there. Maybe there are requirements
   here we should articulate.

   richard: We could say that if you want synchronous error handling you have
   to add a component to support that.
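
   Illustration (not part of the call): a minimal Python sketch of the
   tension ht raises and of richard's suggestion. Steps are modelled as
   generators; parse, emit and barrier are hypothetical names, and a None
   item stands in for a processing error.

```python
def parse(items):
    """Streaming step: fails when it reaches a bad (None) item."""
    for item in items:
        if item is None:
            raise ValueError("bad item")
        yield item


def emit(items, log):
    """Streaming step with a visible side effect: records what it receives."""
    for item in items:
        log.append(item)
        yield item


def barrier(items):
    """Synchronising component: consume all upstream output before
    anything is passed downstream, so upstream errors surface first."""
    buffered = list(items)
    return iter(buffered)


# Pure streaming: the downstream step has already acted on the first
# item by the time the upstream step fails.
streamed_log = []
try:
    list(emit(parse(["a", None, "b"]), streamed_log))
except ValueError:
    pass

# With the barrier: the upstream error is raised before the downstream
# step receives anything.
buffered_log = []
try:
    list(emit(barrier(parse(["a", None, "b"])), buffered_log))
except ValueError:
    pass
```

   The barrier trades streaming for the guarantee: it holds the whole
   upstream output in memory, which is the cost of knowing that the earlier
   step finished without error.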

  Requirements

   Norm observes that we have a few requirements documents out there

   Alessandro summarizes the requirements that he posted yesterday.

   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0002.html[7]

   <Zakim> richard, you wanted to mention the tension between streaming and
   multiple inputs

   richard: If you have streaming inputs then the question of multiple inputs
   becomes somewhat difficult. SAX doesn't really work for multiple inputs if
   you have independent streams.
   ... Another way to do it is to consider one input the principal input and
   the other inputs to be the explicit actions of the processes

   <Zakim> ht, you wanted to be nervous about 'while'

   ht: I think the matter of Turing completeness can be overplayed and it's
   hard to come up with useful tools that aren't vulnerable to various DOS
   attacks, but "while" has properties that none of the other constructs have
   in that regard and I've only recently come up with a case where maybe I'd
   like to have it.
   ... I think "while" wants to be in some sort of "maybe" category IMO.

   <Zakim> Norm, you wanted to ask what "for each" means at the pipeline
   level and to ask for more detail about "fallback behavior"

   <Alessandro> Alessandro agrees with ht

   Alex: It's a maybe but it is useful. It would be nice to have a mechanism
   for extending the language that might allow others to implement something
   like "while".

   <richard> can ht and alex give examples of what they mean by "while"

   ht: I think that's right. It's one thing to say that there's a component
   that uses "while", it's entirely another to say that you can change the
   syntax of the language to put a while scope around a component.

   Alex: To my pipeline language, everything is a pipeline step.

   <ht> richard, e.g. run the output of this step/pipeline back in as its
   input while some xpath is satisfied

   Alex: Do we have a language with iteration and conditionals etc. or do we
   just have a collection of components some of which implement those
   features.

   Norm asks Alessandro what he meant by "fallback behavior"

   Alessandro: I was thinking of "try/catch/finally" sort of construct

   <ht> norm, 'catch' is the fallback in a try/catch construction

   <MSM> (except without the ability to throw)

   <alexmilowski> Example of while: translate an ATOM entry feed with "next"
   elements into one large atom feed

   ht describes his "while" example from IRC above.

   richard: In languages like C and Java, while and for are just minor
   syntactic variants. Here they're quite different.

   <alexmilowski> The next element points to another chunk of the feed. The
   result of following the feed next element may be another feed with another
   next element.
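
   Illustration (not part of the call): a minimal Python sketch of the
   "while" use case, following "next" pointers until none is left and
   collecting the entries into one feed. Everything here is an assumption
   made for the example: the "next" pointer is modelled as an Atom link
   element with rel="next", PAGES is an in-memory stand-in for fetching, and
   merge_feed, next_href and max_pages are hypothetical names. The page
   limit is one possible answer to ht's concern about unbounded loops.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical stand-in for retrieving feed chunks over the network.
PAGES = {
    "page1": '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>one</title></entry>'
             '<link rel="next" href="page2"/></feed>',
    "page2": '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>two</title></entry>'
             '<entry><title>three</title></entry></feed>',
}


def next_href(feed):
    """Return the target of the feed's 'next' link, or None if absent."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def merge_feed(start, fetch, max_pages=100):
    """Follow 'next' links while they exist, collecting all entries."""
    merged = ET.Element(ATOM + "feed")
    href = start
    pages = 0
    # The loop condition depends on the output of the previous pass,
    # which is what separates this from a for-each over a known set.
    while href is not None and pages < max_pages:
        feed = ET.fromstring(fetch(href))
        merged.extend(feed.findall(ATOM + "entry"))
        href = next_href(feed)
        pages += 1
    return merged


merged = merge_feed("page1", PAGES.__getitem__)
titles = [title.text for title in merged.iter(ATOM + "title")]
```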

   ht: Right. We've been talking about for-each not a general purpose for.

   richard: Alex you were suggesting that all the control structures could be
   components, is that right?

   Alex: Yes

   richard: Right. So that provides completely general extensibility in
   control structure but has the downside of having the control structure
   opaque in the sense that it uses any non-standard control structures. It's
   harder to build tools to display the structure.

   Alex: My pipeline language itself is really defined by the components. A
   component has an element, the element has a syntax and may specify a
   subpipeline that it runs
   ... But it does cause problems for authoring tools.

   <Zakim> ebruchez, you wanted to mention that extensibility is good, but
   core syntax has to remain clear

   Erik: I just wanted to say that it would be nice if it's extensible, but
   we need to have a clear core syntax.
   ... I don't have any particular opinion at this moment, but we should try
   to keep the syntax simple for the core features.

   Alex: One example of a language that's proved to be very extensible is Ant.
   It has a core set of concepts, but everything else is an extension.

   Erik: Yes, I understand that. If everything is working with infosets then
   that might be very possible.

   Norm suggests that we begin writing a requirements document

   Alex: concurs

   Norm asks for an editor

   Alex: I'm willing.

   <Jeni> Rui also wrote some requirements at
   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0000.html[8]

   Norm agrees to help Alex get started up

   ht: Is there anyone who can explain NVDL?

   Norm: NVDL sent us an explicit list of requirements.

   Its task is validation of mixed-namespace documents, by taking your
   document and producing a lot of different subtrees

   an HTML + MathML document may turn / will turn into an HTML document and a
   MathML fragment.

   The HTML goes to an HTML validator, the MathML goes to a MathML validator.

   <ht> HST notes DTD validation adds attributes too. . .

   And at the end, they want to be able to stitch things together again.

   alexmilowski: pipelines can certainly do that, but what a crazy way to do
   it.

   Norm: you are able to insert stub elements, to prevent the equation
   element in the parent document from being empty.
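
   Illustration (not part of the call): a rough Python sketch of the
   split/validate/stitch flow Norm describes. It is not NVDL and does not
   follow the NVDL rules; it only detaches child subtrees whose namespace
   differs from their parent's, leaves a stub in their place, and later puts
   them back. The function names, the stub element and its ref attribute are
   all hypothetical, and the validation step itself is omitted.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"


def namespace_of(element):
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    tag = element.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


def split_by_namespace(parent, fragments):
    """Replace each foreign-namespace child subtree with a numbered stub."""
    for index, child in enumerate(list(parent)):
        if namespace_of(child) != namespace_of(parent):
            stub = ET.Element("stub", {"ref": str(len(fragments))})
            # Keep the text that followed the subtree in the parent document.
            stub.tail, child.tail = child.tail, None
            parent[index] = stub
            fragments.append(child)
        else:
            split_by_namespace(child, fragments)
    return fragments


def stitch(parent, fragments):
    """Put each detached fragment back where its stub sits."""
    for index, child in enumerate(list(parent)):
        if child.tag == "stub":
            fragment = fragments[int(child.get("ref"))]
            fragment.tail = child.tail
            parent[index] = fragment
        else:
            stitch(child, fragments)


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>E = '
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>m</mi></math>'
    ' holds.</p></body></html>'
)
before = ET.tostring(doc, encoding="unicode")
fragments = split_by_namespace(doc, [])
# At this point the parent and each fragment could go to separate validators.
stub_count = len(doc.findall(".//stub"))
stitch(doc, fragments)
after = ET.tostring(doc, encoding="unicode")
```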

   [Docbook example]

   <ht> HST believes NVDL also gives you the ability to specify, for each
   fragment, what static resources, e.g. DTDs, Schemas, to use

   alexmilowski: why not just ... ?

   Norm: the idea is to allow this to work without your having to change the
   individual schemas.

   alexmilowski: an interesting use case: what if you have multiple
   validators for the equation, and they say different things?

   alexmilowski: even if I wouldn't do validation that way, it's still a
   useful and interesting use case.

   <ht> HST hopes the minutes will show Alex's suggestion about infoset
   annotation in general, and PSVI validity annotation in particular

   <Zakim> ebruchez, you wanted to quickly talk about NVDL

   Erik: NVDL has some pretty well defined use cases to explain why they do
   it that way

   <Zakim> richard, you wanted to mention issues of dividing and recomposing
   a document in complicated ways

   <ht> The PSVI issue is, roughly, the question of _what_ flows through the
   pipeline -- serialised XML documents, 'vanilla' XML infosets, arbitrary
   infosets, . . .

   richard: I wanted to contrast this problem with the simpler case of
   passing small parts of a document through some other process and then
   reassembling them.
   ... In this case the depth of nesting of structure is arbitrary. So either
   you need a component that can decompose a document this way or you need
   the flow of the pipeline to be controlled by the hierarchy of the
   document.
   ... That's a much more dynamic kind of pipeline than we've considered
   before.

   <ht> Concretely, will the pipeline language support some standard way for
   a pipeline step which follows an XML Schema validation step to access the
   [validity] and [validation attempted] properties of items in the infoset.
   . .

  Any other business

   None.

   ADJOURNED

     ----------------------------------------------------------------------

   [1] http://www.w3.org/
   [2] http://www.w3.org/XML/XProc/2006/01/05-agenda.html
   [3] http://www.w3.org/2006/01/05-xproc-irc
   [4] http://www.w3.org/XML/XProc/2006/01/05-agenda.html
   [5] http://www.w3.org/2005/12/22-xproc-minutes.html
   [6] http://www.w3.org/2002/09/wbs/35125/TP2006/
   [7] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0002.html
   [8] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0000.html
   [9] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
   [10] http://dev.w3.org/cvsweb/2002/scribe/

    Minutes formatted by David Booth's scribe.perl[9] version 1.127 (CVS
    log[10])
    $Date: 2006/01/05 17:05:45 $

Received on Thursday, 5 January 2006 17:16:44 UTC