XProc Minutes 26 Apr 2012 from Norman Walsh on 2012-04-26 (public-xml-processing-model-wg@w3.org from April 2012)

From: Norman Walsh <ndw@nwalsh.com>
Date: Thu, 26 Apr 2012 11:12:03 -0400
To: public-xml-processing-model-wg@w3.org
Message-ID: <m2aa1y5xz0.fsf@nwalsh.com>
See http://www.w3.org/XML/XProc/2012/04/26-minutes

[1]W3C

                                   - DRAFT -

                            XML Processing Model WG

Meeting 214, 26 Apr 2012

   [2]Agenda

   See also: [3]IRC log

Attendees

   Present
           Norm, Henry, Vojtech, Murray, Alex

   Regrets
           Jim

   Chair
           Norm

   Scribe
           Norm

Contents

     * [4]Topics

         1. [5]Accept this agenda?
         2. [6]Accept minutes from the previous meeting?
         3. [7]Next meeting: telcon, 10 May 2012, skip 3 May
         4. [8]Review of open action items
         5. [9]p:zip and p:unzip
         6. [10]Debugging strategies
         7. [11]Clustering
         8. [12]Streaming and parallel processing

     * [13]Summary of Action Items

   --------------------------------------------------------------------------

  Accept this agenda?

   -> [14]http://www.w3.org/XML/XProc/2012/04/26-agenda

   Accepted.

  Accept minutes from the previous meeting?

   -> [15]http://www.w3.org/XML/XProc/2012/04/19-minutes

   Accepted.

  Next meeting: telcon, 10 May 2012, skip 3 May

   Accepted.

  Review of open action items

   A-213-10: Completed

   A-213-11 to A-213-14: Completed

   A-213-15: Completed.

   A-213-15 - A-213-18: Completed.

   Vojtech: For XSLT 1.0, the only option is to write to a file, we're
   explicit about not having documents appear on the secondary output port.

   A-213-09: Completed.

  p:zip and p:unzip

   Norm: I move we postpone this until Jim can be present.

   Accepted.

  Debugging strategies

   Murray: I'm trying to figure out two things: what sorts of mechanisms
   would be useful in the language to assist with debugging, and what sort of
   things are already there?
   ... There's p:log, two implementations have a "message" step, but I'm
   wondering about the possibility of other kinds of steps.
   ... I've had this discussion in the past with C programmers and now I'm
   talking with XProc programmers.
   ... I put some steps in the requirements document: one to turn on
   debugging, one to turn on tracing. I highlighted that there are some
   functions that can give you information about your environment.
   ... I wonder about strategies ... logs, etc.
   ... It seems like those are the sorts of things you might expose.

   Norm: There are two things you can do: get a dump of the graph and get
   more verbose logging.

   Norm waxes poetic about -D and Java logging.

   Vojtech: We have something similar. We have profiling output. And also we
   have a detailed trace of the pipeline: what documents were passed, what
   were the options and variables, etc.
   ... I wonder if we should try to standardize this.
   ... In other specifications, such as XQuery, there's a trace function but
   the rest is implementation-defined or implementation-dependent.
   ... In my view, we have p:log which is rather inflexible and you can have
   a message step. But the problem with this is that it requires you to
   modify the pipeline and potentially break the sequence of steps. Sometimes
   you have to do ugly plumbing to keep the original sequence.
   ... Maybe what we could consider is some sort of construct like group or a
   wrapper that would log some information without having to add pipe
   bindings to keep the pipeline in the original sequence. A construct that
   doesn't influence the connections between the steps would be nice.
   ... Instead of a message step. Or we could have both. We could have a
   trace element that wraps a bunch of steps and does logging, but it
   wouldn't be a step.

   Norm: Yes, we could invent a new kind of thing, but I wonder if this is so
   implementation dependent that it's of limited value.

   Vojtech: Like p:log, I think we could leave the details implementation
   dependent. The trace wrapper might be something like the resource manager
   that we discussed in the past.

   Norm: Yes, if we invent a new kind of wrapper for the resource manager,
   maybe we could leverage that for the trace wrapper.

   Henry: Oh, I'd rather not. You really shouldn't have to edit your pipeline
   to do this.
   ... Maybe a wrapper is the best we're going to come up with. For something
   like the resource manager, a wrapper is more appealing because it's a
   feature of the design of the pipeline. Whereas, tracing and profiling are
   not part of the pipeline.
   ... So I'd rather not have something in the pipeline.
   ... We can't just leave this to implementors, the way the Python or Lisp
   debuggers do, because you can't implement XProc in XProc.
   ... A different way to talk about this in the same spirit would be to say
   that we already have ways to name things. Maybe we want to think about
   this in a sort of meta way: we want to think about ways of annotating
   pipelines, externally even, in order to describe tracing or profiling
   behavior.
   ... We could have a trace descriptor and a pipeline.

   Alex: This could be done if you had a description of the binding for the
   pipeline.
   ... That would require the ability to point at a chunk of pipeline not
   individual steps.

   Vojtech: We could do something similar to XQuery 3 with annotations.

   Norm: It seems like some sort of "trace only these named steps" feature
   might be useful.

   Henry: I know that there was at least some work in actually doing just
   what I dismissed: as far as the engine is concerned all you can say is
   instrument yourself. Where you put the energy is in the tool that presents
   the output to you. So instead of trying to say only give me trace
   information for the last four steps, you just turn on tracing.
   ... Then the tool shows you the output for only the last four steps.
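
   A minimal Python sketch of the approach Henry describes: the engine
   records every trace event, and a separate tool decides what to show. The
   TraceEvent record, its fields, and the last_steps helper are hypothetical
   names chosen for illustration; they do not describe the trace format of
   any XProc processor.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    step: str     # name of the step that emitted the event (hypothetical field)
    message: str  # free-form trace text (hypothetical field)


def last_steps(events, n):
    """Return the events emitted by the last n distinct steps, in original order.

    "Last" is judged by the order in which each step first appears in the trace.
    """
    if n <= 0:
        return []
    # dict.fromkeys keeps first-appearance order while dropping duplicates.
    order = list(dict.fromkeys(ev.step for ev in events))
    keep = set(order[-n:])
    return [ev for ev in events if ev.step in keep]
```

   The engine stays simple here: it always writes the full trace, and the
   selection logic lives entirely in the presentation tool.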

   Norm: Yeah. Fair point. My tracing is all ad hoc.

   Murray: So we could imagine an XProc pipeline that read the trace output
   and presented it in a nice way.
   ... I've heard the argument before for putting all the tracing outside the
   program. I've heard the same argument about documentation too.
   ... One of the things I've noticed as I'm gathering these requirements is
   a section called "Integration".
   ... A lot of these requirements in the areas of debugging and testing and
   error handling are related to integration. All of these things can be
   aided by leaving sign posts in your program. If you know that you're
   having a problem in a certain area of the program, then leaving the
   indicators in there and being able to flag the pipeline could be very
   helpful.

   <ht> Hmm -- I absolutely agree that documentation is an integral part of a
   program or pipeline

   Murray: You can run your pipeline 24 hours a day and diff the traces, look
   for differences, etc. This just seems useful from a Q/A audit perspective.
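
   One way to picture the trace diffing Murray mentions, as a rough Python
   sketch. It assumes each run's trace is available as a list of text lines;
   the trace_diff helper and the "run-1"/"run-2" labels are made up for the
   example.

```python
import difflib


def trace_diff(old_trace, new_trace):
    """Return unified-diff lines between two traces; an empty list means no change."""
    return list(
        difflib.unified_diff(
            old_trace,
            new_trace,
            fromfile="run-1",  # hypothetical label for the earlier run
            tofile="run-2",    # hypothetical label for the later run
            lineterm="",
        )
    )
```

   An empty result means the two runs produced identical traces, which is the
   case an audit job would expect most of the time.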

   Alex: My question is, can I write a pipeline that's normal and reasonably
   minimal and still debug the thing?
   ... Could I profile, debug, etc. without having to touch the pipeline?

   Norm: I think with an appropriate debugging environment you could.

   Vojtech: Yes, but some steps are in libraries that can have the same
   names, etc.

   Murray: I don't care what anybody does with respect to designing a
   debugger that can look into an XProc program and debug it. More power to
   them. But that's not what I want to discuss. We're talking about
   requirements for the language.

   Henry: I hear you, the way I hear this conversation going so far is that
   nobody has come up with any.

   Murray: No. Several people have made suggestions, but we keep coming back
   to "I want to do this from outside my pipeline".

   Henry: Putting things in the language requires that implementors support
   them. I think the argument that I would make isn't that my program is
   sacred, but rather are we sure enough of the value of in-language support
   that we want to require everyone to do the work that's necessary.
   ... It's the cost-benefit analysis that comes first.

   Murray: Here's a simple question: if a processor has the ability to turn
   on trace, then providing some markup that advises that processor that this
   is a good time to turn on trace, would be useful. And if the processor
   can't turn on trace, then it's harmless.
   ... I don't want to specify what comes out in the trace, though we might
   want to give some advice, but that's up to the processor.

   Alex: I guess the conundrum as I see it is that we don't have any
   debuggers yet. And we have very minimal tracing and debugging support.
   ... I suspect there are things we should do but I don't think I know what
   they are.

   Murray: Well, Norm said he output trace information...

   Alex: Yes, but that's very primitive compared to other languages. Do we
   have the right naming conventions, for example?

   Murray: We decided, early on, that there would be a "stderr" port. Could
   we not designate a port for trace output?
   ... I just want to look for some things that would make the language
   easier to debug.

   Vojtech: We already have p:log, but it's very primitive. Maybe we should
   just make p:log more flexible and useful; allowing it in options,
   variables, input ports, etc. Then with a processor switch, you could
   enable the log statements you wanted to trace.
   ... It could wind up in one location. Maybe we don't have to add anything
   new, just improve existing features? Maybe we could imagine a switch to
   magically insert p:log statements everywhere. The advantage of the log is
   that it doesn't change the sequencing of steps.

   Norm: We could do that. The only thing that occurs most obviously to me
   would be a standard message step.

   Vojtech: It's definitely useful, but it's tedious to add 10 of them.

   Norm: Yes, it's tricky, but is still perhaps useful enough to standardize.

   Vojtech: Maybe with a switch to disable the output.

   Henry: Yes, I think it might be worth looking at standardizing that.
   Maybe we could add classes so that you can enable them or disable them by
   name. It would be nice to be able to turn them off without having to edit
   them out.
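
   A small Python sketch of the idea of message classes that can be switched
   on or off by name without editing the messages out. The MessageSwitch
   class, its methods, and the class names used with it are hypothetical;
   nothing here reflects an agreed design for a message step.

```python
class MessageSwitch:
    """Collects messages whose class name is currently enabled."""

    def __init__(self, enabled=()):
        self.enabled = set(enabled)
        self.output = []

    def enable(self, cls):
        self.enabled.add(cls)

    def disable(self, cls):
        self.enabled.discard(cls)

    def message(self, cls, text):
        """Record the message if its class is enabled; report whether it was kept."""
        if cls not in self.enabled:
            return False
        self.output.append(f"[{cls}] {text}")
        return True
```

   The messages stay in place in the pipeline; only the set of enabled class
   names changes between a debugging run and a production run.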

   Alex: I'm looking at p:log. First a question: If I don't have an href or
   if I use the same href, what happens?

   Implementors mumble a bit

   Alex: It would be nice if there was some metadata on the output so that I
   could reconstruct what happened later. A notion of what port this was
   produced from, when it arrived, etc.
   ... Similarly, it might be nice to log inputs.

   Vojtech: Absolutely.

   Alex: It would be nice to be able to put assertions inside the p:log step.
   ... Is this XPath expression true?

   Vojtech: The ability to construct a message with an XPath expression would
   be useful.

   Alex: Those are the sorts of things that would be useful.
   ... You could have one big log file with all the data in it; then you
   could examine that output.
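
   A rough Python sketch of the kind of log record Alex describes: the
   document plus the step and port it came from, an arrival time, and an
   optional assertion. The log_document helper and the record keys are
   hypothetical. The assertion uses the limited XPath subset supported by
   Python's xml.etree.ElementTree, and treating "the path matches at least
   one element" as "true" is an assumption of this sketch, not XPath's
   boolean semantics.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def log_document(log, step, port, xml_text, assertion=None):
    """Append one record to the shared log and return it."""
    doc = ET.fromstring(xml_text)
    record = {
        "step": step,  # which step produced (or received) the document
        "port": port,  # which port it appeared on
        "time": datetime.now(timezone.utc).isoformat(),
        "document": xml_text,
    }
    if assertion is not None:
        # Assumption: the assertion passes when the path selects an element.
        record["assertion"] = assertion
        record["passed"] = doc.find(assertion) is not None
    log.append(record)
    return record
```

   Because every record lands in one list, the whole run can be examined
   afterwards, for example by filtering on records where "passed" is False.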

   Murray: So one of the things we could consider is whether every step would
   have a verbosity level and basically if you had high verbosity turned on,
   then that step would report some things when it started.
   ... We could rationally talk about what those conditions might be.
   ... Speaking of which, I've listed a lot of functions in the use cases and
   requirements document. It might be nice to have an exhaustive list.

   Norm: Where's the list?

   Murray: F.5.12

   Norm: That's a mixture so I'm confused.

   Murray: Yes, it's a mixture, but they return information about the current
   context or environment.
   ... All of this is useful information that you can use in debugging. Years
   ago, working in troff, I got some debugging built in. We had levels of
   verbosity and I could set the warning/error etc. messages. I could print
   messages at the beginnings of loops, I could turn trace on in the middle
   of a loop, etc.
   ... I found this useful at the time.

   Norm: Yes. I can see that.
   ... Of the things we've discussed today, I think the proposal to extend
   p:log so that it can contain messages or assertions and the ability to log
   inputs seems like the best combination of utility and low-hanging fruit.

   <scribe> ACTION: Norm to sketch out an extension to p:log with messages
   and assertions. [recorded in
   [16]http://www.w3.org/2012/04/26-xproc-minutes.html#action01]

  Clustering

   Murray: Whose baby is clustering?

   Norm: What do you mean by clustering?

   Murray: Good question. I found an input along the lines of "does XProc
   need clustering?"

   Norm: In the doc?

   Murray: Yes, F.3.3

   Henry: Is this group-by?

   Some discussion of where the requirement came from and what it means

   <Vojtech>
   [17]http://www.w3.org/wiki/index.php?title=Integration&diff=55046&oldid=55034

  Streaming and parallel processing

   Murray: Alex and I have noted some language along these lines in the first
   requirements and use cases document that didn't make it into the spec.
   ... But it's never clear what streaming and parallel processing mean in
   concrete terms.
   ... How have we impeded or assisted parallel processing?

   Henry: Parallel processing is a little easier. What I think we meant is to
   never constrain parallelization.

   Henry: Make no assumptions about evaluation order that aren't required by
   explicit connectivity.

   Henry: The way I used to say it was: it ought to be possible to implement
   an XProc processor by starting each step in a thread and waiting to see
   what happens. Someone has input, everyone else is blocked, and each step
   works as input arrives.
   ... For example, there's nothing today that says that the steps at the
   bottom of a pipeline have to run after the ones at the top.
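
   Henry's thread-per-step picture can be sketched in Python roughly as
   follows. This is a toy model under stated assumptions: each step is a
   plain function with one input and one output, steps are connected in a
   straight line, and the run_step and run_pipeline names and the DONE
   sentinel are invented for the example. It is not how any XProc processor
   is known to work.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a document sequence


def run_step(func, inbox, outbox):
    """Run one step: block until input arrives, process it, pass it on."""
    while True:
        item = inbox.get()  # the step is blocked here until it has input
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))


def run_pipeline(funcs, documents):
    """Start every step in its own thread, then feed documents to the first."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    # Start the last step first: start order does not matter, only connections do.
    for t in reversed(threads):
        t.start()
    for doc in documents:
        queues[0].put(doc)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

   The only ordering in this model comes from the queues that connect the
   steps, which mirrors the point that evaluation order should follow from
   explicit connectivity and nothing else.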

   Murray: for-each says the step must produce output in the right order.
   Does that have an impact on parallelism?

   Norm: On streaming more than parallel processing.
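
   To illustrate why ordered output need not rule out parallel execution,
   here is a Python sketch using the standard library's thread pool. The
   for_each_ordered name and the workers parameter are hypothetical. The
   iterations may run concurrently, but Executor.map returns results in
   input order, so a later result that finishes early has to be held back,
   which is where the streaming cost Norm alludes to would show up.

```python
from concurrent.futures import ThreadPoolExecutor


def for_each_ordered(subpipeline, documents, workers=4):
    """Apply subpipeline to each document in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the inputs, not of completion.
        return list(pool.map(subpipeline, documents))
```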

   Alex: It might be nice to add annotations to a pipeline to say what the
   streaming/parallelism expectations are.

   Murray: I was puzzled by a request to allow for-each in an unordered way.

   Henry: Yes, this connects up to unordered collections. Right now we have
   sequences, but if we had collections, then you could have a switch on
   p:for-each that said it was allowed to be unordered.
   ... Then the question is, what does a step that takes an unordered
   collection as input look like?

   <scribe> ACTION: Norm to put streaming/parallel processing on the agenda
   for two weeks [recorded in
   [18]http://www.w3.org/2012/04/26-xproc-minutes.html#action02]

   Norm: Adjourned

Summary of Action Items

   [NEW] ACTION: Norm to put streaming/parallel processing on the agenda for
   two weeks [recorded in
   [19]http://www.w3.org/2012/04/26-xproc-minutes.html#action02]
   [NEW] ACTION: Norm to sketch out an extension to p:log with messages and
   assertions. [recorded in
   [20]http://www.w3.org/2012/04/26-xproc-minutes.html#action01]

   [End of minutes]

   --------------------------------------------------------------------------

    Minutes formatted by David Booth's [21]scribe.perl version 1.136 ([22]CVS
    log)
    $Date: 2012/04/26 15:08:47 $

References

   1. http://www.w3.org/
   2. http://www.w3.org/XML/XProc/2012/04/26-agenda
   3. http://www.w3.org/2012/04/26-xproc-irc
   4. http://www.w3.org/XML/XProc/2012/04/26-minutes#agenda
   5. http://www.w3.org/XML/XProc/2012/04/26-minutes#item01
   6. http://www.w3.org/XML/XProc/2012/04/26-minutes#item02
   7. http://www.w3.org/XML/XProc/2012/04/26-minutes#item03
   8. http://www.w3.org/XML/XProc/2012/04/26-minutes#item04
   9. http://www.w3.org/XML/XProc/2012/04/26-minutes#item05
  10. http://www.w3.org/XML/XProc/2012/04/26-minutes#item06
  11. http://www.w3.org/XML/XProc/2012/04/26-minutes#item07
  12. http://www.w3.org/XML/XProc/2012/04/26-minutes#item08
  13. http://www.w3.org/XML/XProc/2012/04/26-minutes#ActionSummary
  14. http://www.w3.org/XML/XProc/2012/04/26-agenda
  15. http://www.w3.org/XML/XProc/2012/04/19-minutes
  16. http://www.w3.org/2012/04/26-xproc-minutes.html#action01
  17. http://www.w3.org/wiki/index.php?title=Integration&diff=55046&oldid=55034
  18. http://www.w3.org/2012/04/26-xproc-minutes.html#action02
  19. http://www.w3.org/2012/04/26-xproc-minutes.html#action02
  20. http://www.w3.org/2012/04/26-xproc-minutes.html#action01
  21. http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
  22. http://dev.w3.org/cvsweb/2002/scribe/
Received on Thursday, 26 April 2012 15:12:39 UTC