- From: Alex Milowski <alex@milowski.org>
- Date: Sun, 14 May 2006 14:47:19 -0700
- To: public-xml-processing-model-wg@w3.org
While XPath 2.0 brings up issues around XML Schema, I think the problem of sets of schemata is a general issue for any choice of schema language and its use in XML pipelines. When I added schema processing steps to smallx, the first thing I needed was a way to define a set of namespace-to-resource mappings. You can't rely on xsi:schemaLocation and related attributes, and some people, myself included, think of those as huge hacks that we'd want to avoid.

I think the issue of typing comes down to three general problem areas:

P1. Defining a set of compatible schemata that represent the "known universe" of elements, attributes, and types. Overlapping sets need to be definable so that different kinds of validation can be performed for the same namespace names.

P2. Use of the sets defined in (P1) within a component.

P3. Inter-step typing and type comparison.

We can start with a simple assumption that there is some kind of infoset annotation (e.g. a PSVI) that can be passed between components and that holds the additional type information. This could manifest itself as an XPath 2.0 data model instance or some other infoset-based API. This somewhat addresses (P3).

The remaining problem with (P3) is: how are we assured that a particular type name maps to the same type definition? That is, if two different steps in the pipeline use schemata that use the same names but different types or type definitions, what happens to an XPath expression in the pipeline that uses that type?

I think there are a couple of simple notions that will help us here:

N1. Notional equivalence by type name.

N2. Previous steps that produce the annotated infoset are determined by the pipeline author.

There are two places where we need to be concerned about typing [1]:

* matching by type (i.e. "instance of" expressions)
* comparing simple typed values

In the former case, instance-of takes a QName value. This is where (N1) can help us.
We could say that, regardless of type identity, if the name is the same it is the same type.

In the latter case, we'll get a type error if the value from the infoset isn't annotated with the right simple type. As such, the user of the pipeline can:

* guarantee the correct schema is used by the previous step that produced the input to which the expression is applied.
* use simple type constructors in the XPath 2.0 expression.

I think if we accept both (N1) and (N2), we have a reasonable story around typing that says that we stop at type names. If you need to guarantee that everything uses the same type definitions, you do that by, well, using the same type definitions in every step that uses schemata.

What we gain is that we can have pipelines that have different definitions for the same target namespaces. That can be very useful when you know you want to loosen or tighten constraints before or after different steps in the pipeline.

In the end, you can use type selection if you make sure that you aren't mis-typing simple-typed values that you need to compare. I don't see that as onerous for the pipeline author.

The last remaining issue is around the sets of schemata that you might want to use within a pipeline. If you want to validate at different steps within the pipeline with different sets of schemata for the same namespaces, you need some way to control the namespace-name-to-resource mapping. Here I think we have two choices:

1. Revive the concept of a resource manager.
2. Create a specialized construct for namespace-name-to-resource mappings.

I think (2) could also be useful for components that work with things other than schemata (e.g. business rules/constraint languages).

When I added the validate step in smallx, I had a directed syntax for the step and so I just added the mapping for (2) into the step syntax (see [2]). I think we need a more general approach than this.

One idea is to just have a definition of a namespace-name-to-resource mapping available to any component.
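
To make the idea concrete, here is a rough sketch in Python of what such a mapping amounts to. Everything in it is hypothetical and only for illustration: the `ResourceMap` class, the map ids, the example namespace URI, and the schema file names are assumptions, not anything from smallx or a proposed syntax. It shows two overlapping sets for the same namespace name, plus a no-namespace entry.

```python
from typing import Dict, Optional


class ResourceMap:
    """Hypothetical namespace-name-to-resource mapping (illustration only)."""

    def __init__(self, map_id: str) -> None:
        self.map_id = map_id
        # The key None stands for the "no namespace" name.
        self._entries: Dict[Optional[str], str] = {}

    def add(self, namespace: Optional[str], href: str) -> None:
        """Associate a namespace name (or None) with a resource."""
        self._entries[namespace] = href

    def resolve(self, namespace: Optional[str]) -> Optional[str]:
        """Return the mapped resource, or None if the name is not mapped."""
        return self._entries.get(namespace)


# Assumed example namespace and schema file names.
ORDER_NS = "http://www.example.com/ns/order"

# Two overlapping sets: same namespace name, different schemata.
strict = ResourceMap("strict-set")
strict.add(ORDER_NS, "order-strict.xsd")
strict.add(None, "default.xsd")

loose = ResourceMap("loose-set")
loose.add(ORDER_NS, "order-loose.xsd")

# A validating step would be told which set to consult.
print(strict.resolve(ORDER_NS))
print(loose.resolve(ORDER_NS))
```

A step that tightens constraints would be handed the first set and a step that loosens them the second; which set a step uses is the pipeline author's decision.
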
Such a definition could be considered a static document resource that is just another input to the component and that resides within the pipeline. For example (ignoring the choice of namespace):

<resource-map id='schema-set-1'>
   <map uri='http://www.example.com/Vocabulary/MyStuff/2006/1/0' href='mystuff.xsd'/>
   <map uri='http://www.w3.org/1999/xhtml' href='xhtml.xsd'/>
   <map href='default.xsd'/>
</resource-map>

where the last element maps the no-namespace name to some resource.

While we could use OASIS XML Catalogs [3], those catalogs don't handle mapping the no-namespace name to a resource. If they could fix that, I'd be happy to use that XML over the above.

This definition would be embedded in the pipeline as some resource/document that a step could reference. It could even be a document external to the pipeline. Solving (P1) and (P2) amounts to telling the "schema validate" step which of these resource maps to use.

[1] http://www.w3.org/TR/xpath20/
[2] https://smallx.dev.java.net/pipeline-spec.html#section-d0e890
[3] http://www.oasis-open.org/committees/entity/spec-2001-08-06.html

--Alex Milowski
Received on Sunday, 14 May 2006 21:47:32 UTC