- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Fri, 04 Aug 2006 18:49:08 +0100
- To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
Minutes 2006-08-04: Friday morning

Present:
  Norm (chair)
  Murray (host)
  Jeni (scribe)
  Henry
  Mohamed
  Alex
  Richard

Norm: What do we want to say about pipelines, pipeline libraries, recursive pipelines etc. First: is it reasonable to have a pipeline inside another pipeline?

Henry: I would like to, for modularity. It's a choice to package up steps into a named pipeline.

Richard: You should be able to do that, but in other programming languages you have multiple functions, but usually do not put functions inside other functions. If you do have functions inside functions, it's usually to give the inner function access to information in the outer function. There's hiding the function from the outside environment...

Henry: Hiding isn't a big deal.

Richard: If we don't want to access names in the outer pipeline, then you don't need this.

Henry: I don't want the inner pipelines to access information from the outside one. In Java, if I'm a novice, I write classes inside other classes.

Jeni: But in Java, you have to create another file. In our language, you don't, so why would you want to embed it?

Murray: What's the difference whether it's embedded or not?

Henry: When I pass the file to another user, it's no longer obvious which pipeline should be run.

Murray: Not yet, but we could provide a mechanism to say which one is going to be run. But I was really asking Richard why he cared...

Richard: Because in other languages, there are other semantics associated with nesting a function inside another one, to do with accessing information in the outer function.

Henry: That's why it doesn't work for me. Suppose my common code used a pipeline parameter.

Murray: Why isn't it all in scope when the functions are at the same level? It doesn't matter if our pipeline language works differently from programming languages.

Norm (and others): Yes it does.

Richard: We also might, in the future, want to provide some semantics to nested pipelines.
Here's a Java example:

    class foo {
      int a;
      class bar {
        int b, c;
        ...
        a = b + c;
      }
    }

'a' is available in the inner class 'bar', but not from outside.

Murray: In the case the function is outside, you have to pass the arguments. If the function is inside, you don't have to pass the argument into the function: it's just there.

Henry: I want it to be a software engineering choice. If I have:

    <pipe>
      S1
      S2
      S3
      ...
    </pipe>

If I want to package S2 and S3 into a named pipeline. I'm happy to have named inputs and outputs, and to have it encapsulated, but I want to have parameters passed in automatically. I want parameters to be lexically scoped, but not ports.

Richard: A choose can use ports from outside: normally ports are lexically scoped as well.

Alex: What would be the problem of saying that you have to declare parameters: if you want to pass the parameter, then you should declare it.

Henry: I could live with that. But then it doesn't matter whether it's inside or outside. I have my mind on the simple user with a single pipeline element.

Richard: Nesting should correspond to scoping.

Alex: Nesting with encapsulation makes sense to me: the pipelines are only accessible in the parent pipeline.

Norm: It seems odd that it's only one level deep.

Henry: I agree with Alex.

Alex: But because it's a black box, this doesn't solve the pipeline library problem.

Henry: We have that as well.

Jeni: We need pipeline libraries, and they do what we need to do, so why make the language more complex by adding this ability?

Henry: You can't use that argument, because removing constructs from the language doesn't make it simpler to use the language.

Murray: Using nested pipelines makes absolute sense to me. Example:

    <pipeline>
      <pipeline name="a">
      </pipeline>
      <pipeline name="b">
        <pipeline name="c">
        </pipeline>
      </pipeline>
      <step>...</step>
    </pipeline>

Can you call 'a' from 'b'?

Henry: No.

Murray: Then I don't understand.
Norm: If 'b' can't call 'a', then my user who wants to modularise something that's common from 'a' and 'b' to 'd', and pulls it out, but can't call it, is completely baffled.

Murray: The step asks me to run 'b'. Surely I should be aware of 'a'. That's what makes sense to the naive user.

Henry: So named pipelines always get put into the pipeline library. You can run one of those by name.

Norm: So now 'a', 'b' and 'c' are all peers and all callable from each other.

Alex: The library changes as you go in: you add things to it. When you go inside 'b', 'c' is added to the pipeline scope.

Murray: Naive user. That 'c' is inside of 'b', and the only way I can run 'c' is by invoking 'b'. So I can run a pipeline that's inside of me, or outside of me, but no one else can run pipelines that are inside me.

Richard: I agree that we can do this, but if we do, we will have to decide a lot of things that are quite complicated, and we should leave it 'til version 2.0.

Murray: That's a good reason for not doing this.

Alex: Personally, pipeline libraries are useful, but let's leave *them* to version 2.0. Because import is complicated.

Henry: I assumed that you'd just specify all your pipeline libraries on the command line.

Norm: I think we need to have pipeline libraries. I think Richard's right that pipeline libraries with pipelines all at the same level is sufficient. We might later think it's too much work for naive users. We can always do that later.

Richard: I think we should do it later. We shouldn't pre-empt the semantics of nested pipelines, which we might add later.

Murray: We have to do the libraries, with some include mechanism. I like the nesting, but I understand Richard's argument that this is too much for us to take it on right now. I don't think we should make the decision now: I think we should include it in the document, say we're uncertain, and then later pull it, unless users come back saying that they really need it.

Alex: Don't we have group? Can't we use that?
Henry, Norm: It's not the same thing.

Murray: Can we call this procedure rather than pipeline?

Norm: Let's talk about that later.

Murray: How is this nesting thing not like groups?

Richard, Norm: Groups get executed when you come across them: they just provide some scope: you can't call them again.

Murray: Can we conflate them?

Norm: I don't like the idea of asking the public whether we should do something. All we'll ever get from the public is "yes, we should do it". We should give them the minimum, and get them to ask for more.

Alex: So can we talk about pipeline libraries?

Henry: <pipeline-library> contains zero or more <pipeline> elements. We're done.

Jeni: We need defaulting.

Henry: <pipeline-library> contains zero or more <pipeline> elements, and a default-pipeline attribute that points to one of them.

Norm: Let's get agreement on pipeline libraries.

Alex: We shouldn't have default-pipeline. We just supply the QName when we call the pipeline.

Norm: If you have to point to the library, then it's no cost to provide the name as well. To review: A pipeline library contains zero or more pipelines, all of which have names. (Zero-or-more or one-or-more...) I don't feel strongly about defaulting.

Richard: I want to just refer directly to the library, just like in C, you have a 'main'. A library can have a default pipeline in it, that gets executed if you get given the library.

Norm: Java has this functionality. It seems no effort, and has some use.

Mohamed: What about including other libraries?

Richard: We should use import rather than include. Include implies textual inclusion. With import, the pipeline library might be already compiled, and the only things that are available are some packaged information.

Alex: Can you import inside the pipeline library?

Richard: Yes, you have to import from the pipeline library.

Norm: I suggest we leave off default-pipeline attribute for now. The <import> has a source attribute that points to the imported library.
It can go in <pipeline-library> and in <pipeline>.

Murray: I think pipeline libraries should have a name for debugging purposes, so if I loaded it, debugging information would be raised.

Richard: I think it should have a name as well.

Norm: OK, optionally have a name.

Jeni: We shouldn't allow <import> within <pipeline>.

Richard: You might have a single <pipeline> element in a file; you should be able to import pipelines into it.

Jeni: No: if you need to reuse pipelines, you have to ramp up to having a pipeline library.

Mohamed: You should import pipeline by QName rather than URI.

Alex: I would be happy with an <import> that excluded the URI, and tell the implementation you need pipelines by name.

Richard: So do you expect a catalog mechanism so that I can get libraries by URI when I'm not connected to the 'net? Is this our problem?

Norm: This isn't our problem, just as it isn't XSLT's or Schema's problem: it's implementation-defined how the documents are retrieved given a URI.

Henry: If I have a pipeline and Richard says he has a library. I thought that I had to say on the command line where the library is, but everyone said that was crazy. So I need an import library statement that I can put in my pipeline.

Jeni: You add <pipeline-library> around it and add <import>

...much discussion about the requirement for naive users to add <import> in their standalone <pipeline> skipped...

Norm: I'm looking for a compromise. Suppose we go back to the GCC model: you supply the pipeline libraries at the command line.

Richard: I think pipelines are going to be little things that they want to run. They don't want to have to do this at the pipeline. I think we should allow <import> within <pipeline> when <pipeline> is a document element. But in a pipeline library, you have to put it at the top level.

Alex: So if I rip out a pipeline from the pipeline library and try to run it, then it would be invalid.
Plus if I put a pipeline into a library, I need to move the <import> into the top level of the pipeline library.

Murray: What was the logic behind not having the wrapper with a standalone pipeline and putting the <import> inside that wrapper?

Norm: Most users are going to have simple pipelines, and they're not going to want to write the wrapper.

Alex: If <pipeline> can have <import> inside it, then it should be able to do that within a pipeline library.

Jeni: Are the imported pipelines visible within the <pipeline> itself or in the entire library?

Alex: Only in the <pipeline> that contains the <import>.

Norm: Recap: We will have a <pipeline-library> element that can contain pipelines. It has an optional name. You can import pipelines from another pipeline library. A pipeline can also stand by itself, which can import other pipeline libraries. You can import a standalone pipeline.

Jeni: Circularity?

Norm: If you import a library that you've already imported, you don't worry: all the pipelines you import are available.

Murray: I should be able to have an import in a pipeline in a pipeline library, so I can cut and paste.

...

Norm: What about saying that a standalone pipeline can't be imported. We have a syntactic warp in allowing import within a pipeline in one place and not another; this is a way of getting around it.

Richard: To go back: if A imports B and C, then C shouldn't be able to access pipelines in B.

Alex: In XSLT, you can.

Richard: In C you can't.

Norm: In XSLT you can.

Richard: It means that there are libraries that will work in some contexts but not another.

Norm: We can say that if any pipeline library contains a step that references a pipeline that isn't imported then it's an error.

Richard: So names are globally scoped.

        A
       / \
      B   C
          |
          D

  A can see things in B and C and D.
  B can only see things in B.
  C can see things in C and D.
  D can only see things in D.

Richard: So everything in the libraries that you import gets automatically exported.
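
The visibility rule Richard describes for the A/B/C/D example can be read as reachability over the import graph. The following is only an illustrative sketch in Python, not anything the group wrote or agreed: the `IMPORTS` table and the `visible_libraries` function are hypothetical names, and libraries are modelled as plain strings rather than documents fetched by URI. The `seen` set reflects Norm's remark that importing a library you have already imported is harmless.

```python
from typing import Dict, List, Set

# Hypothetical import graph from the example: A imports B and C, C imports D.
IMPORTS: Dict[str, List[str]] = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}


def visible_libraries(start: str, imports: Dict[str, List[str]]) -> Set[str]:
    """Return the library itself plus everything it imports, recursively."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        lib = stack.pop()
        if lib in seen:
            continue  # already imported once; nothing more to do
        seen.add(lib)
        stack.extend(imports.get(lib, []))
    return seen


if __name__ == "__main__":
    for name in sorted(IMPORTS):
        print(name, "can see", sorted(visible_libraries(name, IMPORTS)))
```

Because each library is visited at most once, the same walk also terminates if the import graph contains a loop.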
What about circularity.

        A <-+
       / \  |
      B   C |
          | |
          D-+

Henry: Where you start is the top (A). You stop at D.

(agreement)

Norm: The name for the import statement is <import> with an attribute called 'source' (this is consistent with what we do with <input>).

Alex: In pipeline libraries, we also have to deal with declaring components.

Norm: Yes, we need to deal with extension components.

Alex: We should put it in the pipeline libraries.

DECISION: We have pipeline libraries with <pipeline-library> document elements, with an optional name attribute and containing multiple pipelines. We have standalone pipelines with <pipeline> document element. Both can have <import source="URI" />* as children of the document element. This points to either a pipeline library or a standalone pipeline. As well as the built-in components and implementation-defined components, a pipeline library or a standalone pipeline has in scope all the pipelines of all the pipeline libraries or standalone pipelines that it imports, recursively.

No consensus on a default pipeline to run within a pipeline library.

BREAK

Inputs and outputs.

Henry: This isn't a proposal for naming, it's an analysis that may help. A component is a named box with named things that data comes into and named things that data comes out of. We have the ability to replicate them, and use things to connect these boxes together. I propose declaring components with:

    <comp name="xslt">
      <inputs>
        <port name="doc" arity="1" />
        <port name="ss" arity="1" />
      </inputs>
      <outputs>
        <port name="result" arity="1" />
      </outputs>
    </comp>

and parameters go in here as well, but this discussion doesn't incorporate parameters.

We have something new now, which covers four language constructs: group, for-each/viewport, choose and when. These are all containers for steps, with their own paired in/out at the top and out/in at the bottom. Choose actually looks almost like this, but the things inside are containers as well.
    <step kind="xslt">
      <input name="doc" (source="p!x" | href="http://...") [select="..."] />
    </step>

This is similar to what we've talked about before, except that source->href and ref->source.

So how do we do the in/out and the out/in for the containers? We have a combination of <port> and <input>:

    <iface name="x" arity="..." (source="p!x" | href="http://...") [select="..."] />
    <oface name="y" arity="..." (@source | @href), @select? />

Richard: What about pipelines?

Henry: Pipelines are like components, in that they have some named ports at the top and the bottom. But we can't call them inputs and outputs. The value of the source attribute must always be the name of a Component ! the name of a port on a component or the name of a port on oface.

Richard: I think pipelines have all of these things. You need to say what inputs they have, just like for component definitions. And you need to define the inputs for within the pipeline, and you need to bind an input for use within the pipeline.

General agreement.

Richard: An input for a pipeline doesn't have a source.

Henry: It *could*.

Richard: But it doesn't *need* it. <iface> got its source from <input>.

Jeni makes the point that the out-facing ports may have different names from the in-facing ports within the container.

Henry combines them by making them siblings and writes up:

    <iface|oface>
      <input @name, (@source | @href), @select? />
      <port @name, @arity />
    </iface|oface>

Henry: I'd like to digest this for a while before we discuss names.

Richard objects to the naming of one thing <port> and another thing <input> since an input is a port. We decide to think on the naming for a while.

---

Core components
---------------

Norm: We've talked about various components like XInclude, validate, XSLT. What are the others?
List:

  XInclude
  XSLT[1|2]
  validate*
  xquery
  load
  save
  identity
  httprequest
  aggregate
  disaggregate
  subsequence
  escape (string to XML for RSS)
  unescape (XML to string for RSS)
  XPath[1|2]filter
  wrap
  wrap-sequence
  insert (attributes|elements|change values)
  ns-rename
  delete (subtrees|attributes)
  rename (attributes|elements)
  strip whitespace
  absolutize (absolutize selected URIs)
  prettyprint
  exec
  os-access (get directory/environment variable etc)
  sort (sorts elements)
  regex (destructures a string)
  bitbucket/sink
  doc-replace (replaces an input with another one)
  diff
  c14n
  encrypt
  decrypt
  sign
  verify
  label (adds IDs to all elements)
  line number
  push-tag (wrap selected elements with a wrapper)
  soap-exchange
  SPARQL
  manifest/packaging
  render XSL-FO/SVG/MathML
  tagsoup
  wikify
  sgml-in
  schema-check
  apply (pipeline)
  grddl (returns RDF from XML document)
  STX (streaming transformation)
  NVDL (namespace validation)
  uptranslate
  downtranslate
  forward-chain-RDF
  replicate
  load-escaping-entity-references
  save-disable-output-escaping

(During the course of generating the list)

Henry: I need two versions of load/save/identity, for different arities.

Richard: We've agreed that a sequence of one document is acceptable to a port with an arity of 1.

Henry: I think we should either declare arities and enforce them statically, but if we're not doing that, then we don't need two versions of load/save/identity.

Jeni: We need, for example, load as well as a href/source attribute, to allow the URI to be, for example, passed in as a parameter.

...

We have agreement that xml:base processing happens automatically, but we have to talk about what happens in terms of the base URI of outputs. We also need to talk about security at some point.

...

Alex: We should have modules of components that vendors may implement.

(general agreement)

...

Murray: What about entities?

Henry explains a case where the entities were escaped on load and unescaped on save.
We need to talk about character encodings in the pipeline: we need to provide a way of preserving a character encoding through the components.

Murray: I use entities for reuse: I don't want them expanded.

Norm: You have to use XInclude or other mechanism.

Richard: Nothing else in the XML stack does this.

Henry: I want a load-while-escaping-entities step.

Richard: We could have a component that turns the DTD into an XML document that can be passed through to a later component, that can then reconstruct the DTD for the entities.

We want to come back to preserving entities.

...

Henry: I'd like to talk about built-in parameters which have information from the XML declaration.

Richard: Encoding and version are in the Infoset already.

...

Alex: I want to have some general declarations on serialization parameters.

Henry: We should put those on the output port declaration, to give hints to the implementation.

...

What about core components? If no one objects, they're included...

  XInclude
  XSLT 1.0
  validate
  identity
  aggregate

Alex objects to load because he wants httprequest. Murray objects to all the rest. We decide to take a different tack.

BREAK FOR LUNCH
Received on Friday, 4 August 2006 17:49:25 UTC