- From: Alex Milowski <alex@milowski.org>
- Date: Tue, 28 Feb 2006 08:03:12 -0800
- To: public-xml-processing-model-wg@w3.org
XPL Presentation (See presentation)

* Michael: (Clarification) The 'infosetref' attribute represents the
  binding and the names are internal to the component. The 'name'
  attribute is the formal parameter name.
* Eric: the p:input and p:output declare the names of the inputs and
  outputs that are used to invoke the process and handle the results.
* Norm: (Clarification) It is the pipeline processor that looks at the
  inputs and outputs?
* Eric: the inputs and outputs are evaluated in a lazy fashion, and it
  back-chains through the steps, which eventually leads to the input of
  the pipeline.
* Alex: (Clarification) How does back chaining work with conditionals?
* Eric: The outputs of conditionals need to have the same infoset name.
* Eric: XHTML example (use case 5.15: Content-Dependent
  Transformations)
  - one of the use cases.
  - one of the steps rewrites the QNames for presentation in IE
  - one of the steps deals with HTML serialization
  - the output for serialization uses an internal root element node for
    representation of text and binary (character encoded)
* Eric: Iteration example:
  - lets you iterate over a document via an XPath expression
  - the current() function gives you the current item being iterated
  - gives you the ability to process large XML documents
* Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP
  serializer)?
* Eric: Yes.
* Richard: Do they require their own namespaces?
* Eric: No, it isn't required, as it is contextual to the component.
  Having another namespace adds declarations to the document.

GUI Tool Sub-thread:

* Richard: Do you have a GUI tool?
* Eric: No.
* Richard: we should define the tool in terms of a graph
* Norm & Michael expressed concern with this, as they wouldn't want to
  require a GUI tool, and starting with a graph could ignore the XML
  representation.

Norm's SXPipe:

* http://norman.walsh.name/2004/06/20/sxpipe
* Stages are executed in order. Each stage is handed a DOM and returns
  a DOM.
* In the example, the skip attribute allows steps to be skipped. If it
  statically evaluates to true, the step isn't executed.
* Impl: two methods: init & run. Init is passed the element that
  represents the stages. 1700 lines of Java.

(Alex's presentation here)

Richard's presentation:

* I want to replace what we do today without a pipeline with an XML
  pipeline.
* lxgrep - produces a tree fragment (multiple root elements possible)
  via an XPath
* lxprintf - formats XPath matches as plain text
    -e element   For each element
* lxreplace - replaces elements/attributes
    -n           Renames an element
* lxsort - sorts elements by values identified by an XPath
* lxviewport - runs a unix command on everything that matches an
  element (like subtree in smallx, viewport in MT pipelines)
* lxtransduce - ??
* want to make these pipelines more declarative so people can use them
  without writing code.
* XSLT is also available

Rui Lupis: (see presentation)

* APP: Architecture for XML Processing
* Complex processing support for digital libraries - both developers
  and producers
* Always a need for some manual purposes.
* Tiers: a set of pipelines working on disjoint inputs
* Pipeline: acyclic digraph of processors
* Processor: defined by a URI that differentiates an interface vs
  implementation vs usage.
* Processing language:
  - Project: an RDF document
  - Pipeline: mapped to a linear sequence of components
  - Registry: An RDF document that registers components & their inputs
    and outputs
* Pros:
  * Separation of concerns lets you interchange components without
    touching the pipelines.
  * It's an implementation-neutral language
  * and others
* Cons:
  * No iteration/test
  * RDF based
  * Doesn't support generation of XSLT stylesheets
  * Doesn't support chunking
* Thoughts:
  * Good to have multiple levels of composition (not just xinclude)
  * Indirection is good for batch processing

Alex: The model is that you define a particular step in the registry
that is a binding, for example, of an XSLT transform to its
input+parameters to its output. A pipeline then points to that step and
the step can be re-used in other pipelines.

* If the registry changes, the pipeline doesn't have to change.

Infosets:

Murray:
* stdin & stdout
* then there are parameters
* then there is the notion of input & output
* then there is the notion of an infoset on the side
* then there is the notion of artifacts
  * e.g. on a server you might want to store things in a cache

Norm:
* storing on a filesystem can be abstracted to the idea that outputs
  have a URI and a processor can decide to write them out to disk if it
  wants. Whether that happens isn't a relevant problem.

Richard:
* It is quite likely an implementation will need to buffer things if
  you have a pipeline that isn't just a straight line.

Eric:
* In XPL everything is in scope

Richard:
* there is no guarantee that you read things at the same rate, so you
  have to buffer

Murray: There's still an output being buffered & cached. As an output
you produce foo.infoset and later you consume foo.infoset, then you
need to store that.

Eric: you could have an implementation that buffers things to memory or
alternatively to a disk cache if it is too big

Murray: Before today, I was thinking this was like a unix pipe. They
could be bringing in separate things, but there is still just a
pipeline. Most things talked about today don't seem like pipelines.

Richard: My stuff is a unix pipeline.... but that's "just an
implementation hack" that uses shell programming.

Eric: The reason you want to serialize is?
Richard: Because I have a bunch of programs that run on files. I want a
language that I can still compile to scripts that serialize to files.
There are other things that things like schema validation might do that
may not be able to be serialized.

MSM: It is possible to define a non-standard PSVI serialization

Eric: You can always do this by wrapping components that always
serialize

Norm:
* there are simple components where one document comes in and one goes
  out
* there are other ways to think about things like XSLT:
  - there is one input and an ancillary input (the stylesheet) and one
    output
  - but this isn't always fixed

Alex: Having a primary input is necessary for streaming
implementations.

Murray: In what case is the stylesheet the input?

Norm: I have a report that is coming out and the report is always the
same (the input document), but the XSLT is what is generated by the
pipeline.

MSM: Why is there emphasis on backward chaining?

Eric: (diagram on chart w/ parallel steps that start from the same
start and are aggregated at the end)
Back chaining is because a step can optionally decide not to get an
input.
It isn't that easy for a user to understand.
Specifying order is natural and is a problem.
Users do have problems with [controlling] order.
You have this problem with XSLT.

Richard: what drives things in XSLT is apply-templates--and that is not
backward chaining. Parallel paths are the 1% case.

Alex: There is a whole body of knowledge that deals with network flows
and we should be in compliance with those known concepts and
algorithms.

All: [to Alex] You're going to have to prove that you need stdin for
optimization.

--
--Alex Milowski
Received on Tuesday, 28 February 2006 15:48:52 UTC