- From: Alex Milowski <alex@milowski.org>
- Date: Tue, 28 Feb 2006 08:03:12 -0800
- To: public-xml-processing-model-wg@w3.org
XPL Presentation (See presentation)
* Michael: (Clarification)
The 'infosetref' attribute represents the binding, and the names
are internal to the component.
The 'name' attribute is the formal parameter name.
* Eric: the p:input and p:output declare the name of the inputs and
outputs that are used to invoke the process and handle the results
* Norm: (Clarification)
It is the pipeline processor that looks at the inputs and outputs?
* Eric: the inputs and outputs are evaluated in a lazy fashion and it
back-chains through the steps which eventually leads to the input of
the pipeline.
* Alex: (Clarification)
How does back chaining work with conditionals?
* Eric: The output of conditionals needs to have the same infoset name.
* Eric: XHTML example (use case 5.15: Content-Dependent
Transformations)
- one of the use cases.
- one of the steps rewrites the QNames for presentation in IE
- one of the steps deals with HTML serialization
- the output for serialization uses an internal root element node
for representation of text and binary (character encoded)
* Eric: Iteration example:
- lets you iterate over a document via an XPath expression
- the current() function gives you the current item being
iterated
- gives you the ability to process large XML documents
* Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP
serializer)?
* Eric: Yes.
* Richard: Do they require their own namespaces?
* Eric: No, it isn't required, as the vocabulary is contextual to the
component. Having another namespace adds declarations to the
document.
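The iteration example Eric described can be sketched roughly as follows. This is a minimal illustration in Python, not XPL syntax: the names iterate and title_of and the sample document are assumptions made for the sketch, and the loop variable only stands in for what the current() function returns. ElementTree supports a limited XPath subset, and this version parses the whole document up front, so it does not show the large-document benefit.

```python
import xml.etree.ElementTree as ET


def iterate(document, path, step):
    """Apply `step` to each element matched by `path`, one item at a time."""
    root = ET.fromstring(document)
    # ElementTree only understands a limited XPath subset; enough for a sketch.
    for current in root.findall(path):
        # `current` stands in for the item that current() would return.
        yield step(current)


def title_of(item):
    # Hypothetical per-item step: extract the text of the <title> child.
    return item.findtext("title")


DOC = (
    "<catalog>"
    "<book><title>A</title></book>"
    "<book><title>B</title></book>"
    "</catalog>"
)

titles = list(iterate(DOC, "./book", title_of))
print(titles)  # ['A', 'B']
```

Because iterate is a generator, each matched item is handed to the step only when the consumer asks for the next result.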
GUI Tool Sub-thread:
* Richard: Do you have a GUI tool?
* Eric: No.
* Richard: we should define the tool in terms of a graph
* Norm & Michael expressed concern with this, as they wouldn't
want to require a GUI tool, and starting with a graph could
ignore the XML representation.
Norm's SXPipe:
* http://norman.walsh.name/2004/06/20/sxpipe
* Stages are executed in order. It is handed a DOM and returns a
DOM.
* In the example, the skip attribute allows steps to be skipped. If it
statically evaluates to true, the step isn't executed.
* Impl: two methods: init & run. Init is passed the element that
represents the stages. 1700 lines of Java.
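The stage model described above (ordered stages, a document in and a document out, a skip attribute, and init/run methods) might look something like the sketch below. It is written in Python rather than Java and is not SXPipe's actual API: the Stage and AddAttribute classes, the STAGES table, and the pipeline vocabulary are all hypothetical. The skip test is simplified to a literal "true", and ElementTree elements stand in for DOM nodes.

```python
import xml.etree.ElementTree as ET


class Stage:
    """Hypothetical stage with the two methods the minutes mention."""

    def init(self, config):
        # `config` is the element describing this stage in the pipeline document.
        self.config = config

    def run(self, doc):
        # Default behaviour: hand the document back unchanged.
        return doc


class AddAttribute(Stage):
    """Illustrative stage: sets one attribute on the document root."""

    def run(self, doc):
        doc.set(self.config.get("name"), self.config.get("value"))
        return doc


# Hypothetical mapping from element names to stage implementations.
STAGES = {"add-attribute": AddAttribute}


def run_pipeline(pipeline_xml, doc):
    """Run each stage in document order, threading the document through."""
    for element in ET.fromstring(pipeline_xml):
        if element.get("skip") == "true":
            continue  # skipped stages are never instantiated or run
        stage = STAGES[element.tag]()
        stage.init(element)
        doc = stage.run(doc)
    return doc


PIPELINE = """
<pipeline>
  <add-attribute name="a" value="1"/>
  <add-attribute name="b" value="2" skip="true"/>
  <add-attribute name="c" value="3"/>
</pipeline>
"""

result = run_pipeline(PIPELINE, ET.fromstring("<doc/>"))
print(sorted(result.attrib.items()))  # [('a', '1'), ('c', '3')]
```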
(Alex's presentation here)
Richard's presentation:
* I want to replace what we do today without a pipeline with an XML
pipeline.
* lxgrep - produces a tree fragment (multiple root elements possible)
via an XPath
* lxprintf - formats XPath matches as plain text
-e element For each element
* lxreplace - replaces elements/attributes
-n Renames an element
* lxsort - sorts elements by values identified by an XPath
* lxviewport - runs a unix command on everything that matches an
element (like subtree in smallx, viewport in MT pipelines)
* lxtransduce - ??
* want to make these pipelines more declarative so people can use them
without writing code.
* XSLT is also available
Rui Lupis: (see presentation)
* APP: Architecture for XML Processing
* Complex processing support for digital libraries - both developers and
producers
* Always a need for some manual purposes.
* Tiers: a set of pipelines working on disjoint inputs
* Pipeline: acyclic digraph of processors
* Processor: defined by a URI that differentiates an interface vs
implementation vs usage.
* Processing language:
Project: an RDF document
Pipeline: mapped to a linear sequence of components
Registry: An RDF document that registers components & their inputs
and outputs
* Pros:
* Separation of concerns lets you interchange components without
touching the pipelines.
* It's an implementation-neutral language
* and others
* Cons:
* No iteration/test
* RDF based
* Doesn't support generation of XSLT stylesheets
* Doesn't support chunking
* Thoughts:
* Good to have multiple levels of composition (not just xinclude)
* Indirection is good for batch processing
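One plausible reading of "acyclic digraph of processors" being "mapped to a linear sequence of components" is a topological ordering of the graph. The sketch below shows that idea with Python's standard graphlib module (Python 3.9+). It is an assumption about how such a mapping could work, not a description of APP's algorithm, and the processor names are made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each processor maps to the processors it depends on.
PIPELINE = {
    "validate": {"parse"},
    "transform": {"validate"},
    "index": {"validate"},
    "publish": {"transform", "index"},
}

# static_order() yields every node after all of its dependencies and
# raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(PIPELINE).static_order())
print(order)
```

The relative order of "transform" and "index" is not fixed by the graph, so any linearization that keeps dependencies first is acceptable.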
Alex: The model is that you define a particular step in the registry
that is a binding, for example, of an XSLT transform to its
input+parameters to its output. A pipeline then points to that step
and the step can be re-used in other pipelines.
* If the registry changes, the pipeline doesn't have to change.
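The indirection Alex describes can be illustrated with a small sketch. APP's registry is an RDF document; plain Python dictionaries are used here only to show the shape of the idea. The step name "to-html", the stylesheet file names, and the resolve helper are all hypothetical.

```python
# Hypothetical registry: a step name bound to a processor, its inputs and parameters.
REGISTRY = {
    "to-html": {
        "processor": "xslt",
        "stylesheet": "html-v1.xsl",
        "parameters": {"lang": "en"},
    },
}

# Pipelines refer to steps by name only, so several pipelines can share a step.
PIPELINE_A = ["to-html"]
PIPELINE_B = ["to-html"]


def resolve(pipeline, registry):
    """Look up each named step at run time instead of copying its definition."""
    return [registry[name] for name in pipeline]


before = resolve(PIPELINE_A, REGISTRY)[0]["stylesheet"]

# Changing the registry entry changes what both pipelines run,
# while the pipeline definitions themselves stay untouched.
REGISTRY["to-html"] = {**REGISTRY["to-html"], "stylesheet": "html-v2.xsl"}

after = resolve(PIPELINE_A, REGISTRY)[0]["stylesheet"]
print(before, after)  # html-v1.xsl html-v2.xsl
```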
Infosets:
Murray:
* stdin & stdout
* then there is parameters
* then there is the notion of input & output
* then there is the notion of an infoset on the side
* then there is the notion of artifacts
* e.g. on a server you might want to store things in a cache
Norm:
* storing on a filesystem can be abstracted to the idea that outputs
have a URI and a processor can decide to write them out to disk
if they want. Whether that happens isn't a relevant problem.
Richard:
* It is quite likely an implementation will need to buffer things
if you have a pipeline that isn't just a straight line.
Eric
* In XPL everything is in scope
Richard:
* there is no guarantee that you read things at the same rate, so
you have to buffer
Murray:
There's still an output being buffered & cached. As an output
you produce foo.infoset and later you consume foo.infoset, then you
need to store that.
Eric: you could have an implementation that buffers things to memory or
alternatively to a disk cache if it is too big
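As a rough illustration of the memory-or-disk buffering Eric mentions, Python's standard tempfile.SpooledTemporaryFile keeps data in memory until it exceeds max_size and then moves it to a temporary file on disk. The buffer_output helper and the 1024-byte threshold are assumptions for this sketch, not part of any implementation discussed at the meeting.

```python
import tempfile


def buffer_output(chunks, max_in_memory=1024):
    """Collect a step's serialized output so a later step can consume it."""
    # Held in memory up to max_in_memory bytes, then rolled over to disk.
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


small = buffer_output([b"<doc/>"])  # stays below the threshold
large = buffer_output([b"<item/>" * 100] * 10)  # 7000 bytes, over the threshold

small_bytes = small.read()
large_bytes = large.read()
print(len(small_bytes), len(large_bytes))  # 6 7000
```

The consumer reads both buffers the same way, so whether the data went to disk stays an implementation detail.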
Murray:
Before today, I was thinking this was like a unix pipe.
They could be bringing in separate things, but there is still just a
pipeline.
Most things talked about today don't seem like pipelines.
Richard:
My stuff is a unix pipeline... but that's "just an implementation
hack" that uses shell programming.
Eric: The reason you want to serialize is?
Richard: Because I have a bunch of programs that run on files. I want
a language that I can still compile to scripts that serialize to
files.
Other steps, such as schema validation, might produce things that
cannot be serialized
MSM: It is possible to define a non-standard PSVI serialization
Eric: You can always do this by wrapping components that always
serialize
Norm:
* there are simple components where one document comes in and one
goes out
* there are other ways to think about things like XSLT:
- there is one input and an ancillary input (the stylesheet) and
one output
- but this isn't always fixed
Alex: Having a primary input is necessary for streaming
implementations.
Murray: In what case is the stylesheet the input?
Norm: I have a report that is coming out and the report is always the
same (the input document), but the XSLT is what is generated by
the pipeline.
MSM: Why is there emphasis on backward chaining?
Eric: (diagram on chart w/ parallel steps that start from the same
start and are aggregated at the end)
Back chaining is because a step can optionally decide not to get an
input. It isn't that easy for a user to understand.
Specifying order is natural and is a problem. Users do have
problems with [controlling] order. You have this problem with XSLT.
Richard: what drives things in XSLT is apply-templates--and that is
not backward chaining.
parallel paths are the 1% case
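The back-chaining idea discussed here (start from the requested output and pull inputs on demand, so a step that chooses not to read an input never triggers the steps behind it) can be sketched as below. This is a minimal demand-driven evaluator written for illustration; the Pipeline class and the step names are hypothetical and do not reflect how XPL is implemented.

```python
class Pipeline:
    """Demand-driven evaluation: a step runs only when its output is pulled."""

    def __init__(self):
        self.steps = {}      # name -> (input names, function)
        self.cache = {}      # name -> computed output
        self.executed = []   # names in the order they finished

    def step(self, name, inputs, func):
        self.steps[name] = (inputs, func)

    def pull(self, name):
        if name not in self.cache:
            inputs, func = self.steps[name]
            # Each input is a zero-argument reader; calling it back-chains upstream.
            readers = {i: (lambda i=i: self.pull(i)) for i in inputs}
            self.cache[name] = func(readers)
            self.executed.append(name)
        return self.cache[name]


p = Pipeline()
p.step("source", [], lambda r: "<doc/>")
p.step("left", ["source"], lambda r: r["source"]() + "L")
p.step("right", ["source"], lambda r: r["source"]() + "R")
# The aggregating step declares two inputs but only reads "left".
p.step("aggregate", ["left", "right"], lambda r: r["left"]())

result = p.pull("aggregate")
print(result, p.executed)  # <doc/>L ['source', 'left', 'aggregate']
```

The "right" branch never runs because nothing pulls it, and "source" runs once even though two branches declare it as an input.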
Alex: There is a whole body of knowledge that deals with network flows
and we should be in compliance with those known concepts and
algorithms.
All: [to Alex] You're going to have to prove that you need stdin for
optimization.
--
--Alex Milowski
Received on Tuesday, 28 February 2006 15:48:52 UTC