- From: Arthur Ryman <ryman@ca.ibm.com>
- Date: Mon, 29 May 2006 23:23:52 -0400
- To: "Jonathan Marsh" <jmarsh@microsoft.com>
- Cc: www-ws-desc@w3.org, www-ws-desc-request@w3.org
- Message-ID: <OF85F7E68A.E3388A82-ON8525717E.00066EE6-8525717E.0012AED9@ca.ibm.com>
Jonathan,

Yes, we should use consistent terminology. There are 3 kinds of components:

1) The root Description component, which is in a category by itself.
2) Top-level components: Interface, Binding, Service, Element Declaration, and Type Definition. These are contained in the Description component.
3) Nested components: everything else. These all have a {parent} property.

When talking about component equivalence, we are mainly interested in the Top-level components, since we can get name collisions when we combine documents. Name collisions are impossible within a document by virtue of the schema. A document would be invalid if it had name conflicts, and we wouldn't get past the XML infoset stage. However, when we combine two or more documents, we need to avoid name conflicts. It's OK to have two Top-level components with the same name in two different documents as long as they are equivalent. That way we avoid the conflict.

We need an efficient way to test for equivalence. We'd like to be able to decide if two components are equivalent just by examining their documents, not any other documents they might reference via import or include.

A property is a name-value pair. A component has a set of properties such that each property has a unique name.

Note that all components have some combination of property values that defines a key, i.e. in a valid component model instance, that combination of values uniquely identifies the component. For example, all the top-level components have QName keys, i.e. the {name} property. The spec does define this for every component, although it doesn't explicitly call them keys. I gave a table of these in my original note. The keys are used in the XML infoset to refer to components. The keys are also used in the component designators.

Two components are equivalent if and only if:

1) they have the same set of property names, and
2) for each of their property names, the corresponding values are equivalent.
So now we have boiled down the definition to that of value equivalence. There are two kinds of values:

1) Component values: single or optional components, sets or lists of components. In general we can regard these as collections of components by treating single components as singleton sets.
2) Non-component values: everything else. Let's call these scalar values. These are things like strings, tokens, URIs, etc. They are often XML simple types.

Two scalar values are equivalent if and only if they are equal.

That leaves the definition of equivalence for component values. Since these are collections, we first require that the collections are "isomorphic", i.e. that there is an invertible mapping from one to the other (which is just the natural correspondence for lists in the case of ordered collections). Further, we require that the mapping relates equivalent components, where we define equivalence as follows.

There are two kinds of components:

1) child components: these are nested components whose {parent} property is equal to the component that contains the property under consideration
2) non-child components: all other components

Two non-child component values are equivalent if their keys are equal. This means we don't have to inspect the contents of other documents. Within a document, one component references another via its key, e.g. its QName.

Two child component values are equivalent if they are equivalent as components. This is the recursive step. But we always recurse down the parent-child tree, so the definition is non-circular and it terminates.

I claim that this definition of equivalence agrees with the old definition when applied to the component model instance as a whole. However, the new definition is weaker, since it might state that a specific pair of components is equivalent while the old definition says they are inequivalent. This is because the new definition just compares keys in some cases.
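The definition above can be written down as a short Python sketch. This is only an illustration under stated assumptions, not text from the spec: the `Component` class, its `key`, `props` and `parent` fields, and the helper names are all hypothetical; the {parent} property is modeled as a separate field rather than as an entry in `props`; and sets of components are matched by key, which assumes keys are unique within each set (as they are in a valid component model instance).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality/hash, so components can live in sets
class Component:
    key: tuple                                  # hypothetical key, e.g. ("Interface", "ex:A")
    props: dict = field(default_factory=dict)   # property name -> value
    parent: Optional["Component"] = None        # models the {parent} property


def _is_component_value(v):
    """True for a single component or a list/set made only of components."""
    if isinstance(v, Component):
        return True
    return isinstance(v, (list, set, frozenset)) and all(
        isinstance(x, Component) for x in v)


def _pairs(va, vb):
    """Return the invertible mapping between two collections, or None."""
    if isinstance(va, Component) and isinstance(vb, Component):
        return [(va, vb)]
    if isinstance(va, list) and isinstance(vb, list):
        # Ordered collections: the natural position-by-position correspondence.
        return list(zip(va, vb)) if len(va) == len(vb) else None
    if isinstance(va, (set, frozenset)) and isinstance(vb, (set, frozenset)):
        ka = {c.key: c for c in va}
        kb = {c.key: c for c in vb}
        if len(ka) != len(va) or len(kb) != len(vb) or ka.keys() != kb.keys():
            return None
        return [(ka[k], kb[k]) for k in ka]
    return None  # e.g. a list compared with a set


def values_equivalent(owner_a, va, owner_b, vb):
    if not (_is_component_value(va) and _is_component_value(vb)):
        return va == vb  # scalar values: plain equality
    pairs = _pairs(va, vb)
    if pairs is None:
        return False
    for x, y in pairs:
        x_child = x.parent is owner_a
        y_child = y.parent is owner_b
        if x.key != y.key or x_child != y_child:
            return False
        # Child components: recurse. Non-child components: keys only.
        if x_child and not equivalent(x, y):
            return False
    return True


def equivalent(a, b):
    if a.props.keys() != b.props.keys():
        return False
    return all(values_equivalent(a, a.props[n], b, b.props[n]) for n in a.props)


def make_interface(qname, op_names, extends=()):
    """Build a toy Interface with child operations and non-child extended interfaces."""
    iface = Component(key=("Interface", qname))
    ops = {Component(key=("InterfaceOperation", n), props={"name": n}, parent=iface)
           for n in op_names}
    iface.props = {"name": qname,
                   "interface operations": ops,
                   "extended interfaces": set(extends)}
    return iface


# A extends B and A' extends B'; B and B' share a QName but differ in content.
b, b2 = make_interface("ex:B", ["ping"]), make_interface("ex:B", ["ping", "pong"])
a, a2 = make_interface("ex:A", ["run"], [b]), make_interface("ex:A", ["run"], [b2])
print(equivalent(a, a2))  # True: extended interfaces are compared by key only
print(equivalent(b, b2))  # False: the child operations differ
```

Because the recursion only follows components whose parent is the component being compared, it never leaves the enclosing top-level component, which is the property the definition is designed to have.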
For example, suppose in a component model instance we have 4 interfaces where A extends B and A' extends B'. Suppose all interfaces are in different documents and that document of A includes document of B, and document of A' includes document of B'. Suppose A and A' as interfaces have identical infosets. Suppose B and B' have the same QNames but differ in some other respect. The equivalence relations are as follows:

Pair | old | new
A and A' | not equivalent | equivalent
B and B' | not equivalent | not equivalent

Even though the new definition reports that A and A' are equivalent, the component model as a whole is invalid. Furthermore, by just flagging B and B' as inequivalent, we get a more useful error message, since by inspection A and A' look identical.

Arthur Ryman,
IBM Software Group, Rational Division
blog: http://ryman.eclipsedevelopersjournal.com/
phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-939-5063, text: 4169395063@fido.ca

"Jonathan Marsh" <jmarsh@microsoft.com>
Sent by: www-ws-desc-request@w3.org
05/23/2006 03:53 PM
To: <www-ws-desc@w3.org>
cc:
Subject: FW: Component Values Must Be Context Independent

[Random keystroke sent it too early, and then got interrupted by hours of telcons - completed below.]

From: Jonathan Marsh
Sent: Tuesday, May 23, 2006 10:07 AM
To: 'Arthur Ryman'
Cc: www-ws-desc@w3.org; www-ws-desc-request@w3.org
Subject: RE: Component Values Must Be Context Independent

OK, just trying to understand the terminology. The spec doesn't use the term "child component". It does use the term "nested components", which are non-top-level components (e.g. any component but Description, Interface, Binding, Service, Element Declaration, and Type Definition). 2.17 for instance talks about "references to other components" but doesn't really define the term, which you seem to be specializing into "child" and "non-child" components.
So, a child component property is

a) A property of component X,
b) with a value of a component, set of components, or list of components,
c) where each of the components in (b) has a parent property,
d) and where each of these parent properties has a value of component X.

Is that right?

The reason for trying to get clarity here is that we haven't distinguished between properties that contain components "by value" and those that contain components "by reference", which is IMO an implementation choice. We chose one strategy for serializing the graph as a tree for the interchange format, but that's just one choice.

Is the categorization below correct? Another observation: comparing the value of the "value" property involves an infoset comparison, about which I don't see anything in 2.17.

Categories: Child component property | Non-component property | Non-child component property

(properties missing from the table below)

interface operation | Binding Operation.{interface operation}
binding message references | Binding Operation.{binding message references}
binding fault reference | Binding Operation.{binding fault references}
Interface message references | Binding Message Reference.{interface message references}
Interface fault reference | Binding Fault Reference.{interface fault reference}

Property | Where Defined
address | Endpoint.{address}
binding | Endpoint.{binding}
binding faults | Binding.{binding faults}
binding operations | Binding.{binding operations}
bindings | Description.{bindings}
direction | Interface Fault Reference.{direction}, Interface Message Reference.{direction}
element declaration | Interface Fault.{element declaration}, Interface Message Reference.{element declaration}
element declarations | Description.{element declarations}
endpoints | Service.{endpoints}
extended interfaces | Interface.{extended interfaces}
features | .{features}, Binding.{features}, Binding Fault.{features}, Binding Fault Reference.{features}, Binding Message Reference.{features}, Binding Operation.{features}, Endpoint.{features}, Interface.{features}, Interface Fault.{features}, Interface Fault Reference.{features}, Interface Message Reference.{features}, Interface Operation.{features}, Service.{features}
interface | Binding.{interface}, Service.{interface}
interface fault | Binding Fault.{interface fault}, Interface Fault Reference.{interface fault}
interface fault references | Interface Operation.{interface fault references}
interface faults | Interface.{interface faults}
interface message references | Interface Operation.{interface message references}
interface operations | Interface.{interface operations}
interfaces | Description.{interfaces}
message content model | Interface Message Reference.{message content model}
message exchange pattern | Interface Operation.{message exchange pattern}
message label | Interface Fault Reference.{message label}, Interface Message Reference.{message label}
name | .{name}, Binding.{name}, Element Declaration.{name}, Endpoint.{name}, Interface.{name}, Interface Fault.{name}, Interface Operation.{name}, Service.{name}, Type Definition.{name}
parent | .{parent}, Binding Fault.{parent}, Binding Fault Reference.{parent}, Binding Message Reference.{parent}, Binding Operation.{parent}, Endpoint.{parent}, Feature.{parent}, Interface Fault.{parent}, Interface Fault Reference.{parent}, Interface Message Reference.{parent}, Interface Operation.{parent}, Property.{parent}
properties | .{properties}, Binding.{properties}, Binding Fault.{properties}, Binding Fault Reference.{properties}, Binding Message Reference.{properties}, Binding Operation.{properties}, Endpoint.{properties}, Interface.{properties}, Interface Fault.{properties}, Interface Fault Reference.{properties}, Interface Message Reference.{properties}, Interface Operation.{properties}, Service.{properties}
ref | Feature.{ref}, Property.{ref}
required | Feature.{required}
services | Description.{services}
style | Interface Operation.{style}
system | Element Declaration.{system}, Type Definition.{system}
type | Binding.{type}
type definitions | Description.{type definitions}
value | Property.{value}
value constraint | Property.{value constraint}

From: Arthur Ryman [mailto:ryman@ca.ibm.com]
Sent: Tuesday, May 23, 2006 8:57 AM
To: Jonathan Marsh
Cc: www-ws-desc@w3.org; www-ws-desc-request@w3.org
Subject: RE: Component Values Must Be Context Independent

Jonathan,

Yes, there is a {parent} property for each child component. The idea is that to compute equivalence, you look at the non-reference properties and the child components. You compare the reference properties by value, i.e. you don't traverse into the referenced component if it is not a child. This lets you compute equivalence based on the contents of the document that contains the enclosing top-level component, i.e. you don't have to look at other documents.

Arthur Ryman

"Jonathan Marsh" <jmarsh@microsoft.com>
Sent by: www-ws-desc-request@w3.org
05/22/2006 08:20 PM
To: Arthur Ryman/Toronto/IBM@IBMCA, <www-ws-desc@w3.org>
cc:
Subject: RE: Component Values Must Be Context Independent

Can you explain a bit more the difference between so-called "child components" and "non-child components"? I couldn't find these distinguished clearly in the spec. Do you just mean the parent property?

From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On Behalf Of Arthur Ryman
Sent: Thursday, April 20, 2006 4:20 PM
To: www-ws-desc@w3.org
Subject: Component Values Must Be Context Independent

Components can be brought into a component model instance through <import> and <include>. For scalability purposes, it is highly desirable for the value of a component to be independent of the context into which it was brought.
The use case is a development tool for SOA applications that needs to support hundreds or thousands of services. The tool needs to validate the service definitions. The requirement is that the time to do this be linear in the number of documents. We are currently experiencing performance problems validating large sets of WSDL 1.1 documents. We need to have a spec-compliant optimization for WSDL 2.0.

Ideally, a tool should be able to compute the components directly defined in a document without looking at any of the imports or includes. There are two problems now that prevent this:

1. In theory, we allow extensions that could alter the semantics of imported or included components. However, there is no requirement or use case for this flexibility, much less a realistic, compelling one. Note that this is actually a real problem in XML Schema, e.g. due to "features" such as chameleon includes and <redefine>, you need to know the context in which a document is included.

2. The current definition of component equivalence is recursive in the sense that to test if two components are equivalent, it is necessary to determine if all of the components they refer to are equivalent. In effect this means that you have to construct the entire component model instance in order to resolve the references to the other components.

Since WSDL documents typically include or import others, a collection of WSDL documents is likely to be moderately connected when viewed as a graph. Therefore, when you validate the collection, you end up processing a given document many times in general. You process it a number of times equal to the number of documents that refer to it directly or indirectly (+ 1). This is non-linear. The exact degree of non-linearity depends on how connected the graph is.

Consider a simple chain of n WSDL documents:

A1 includes A2 includes A3 includes ... An

Validating A1 requires reading n documents.
Validating A2 requires reading n-1 documents.
...
Validating An requires reading 1 document.
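The read counts for this chain can be checked with a small Python sketch. It is only an illustration: the function names are made up, and the cached variant assumes that the result computed for a document does not depend on which document included it.

```python
def reads_without_cache(n):
    """Total document reads when each of A1..An is validated from scratch."""
    total = 0
    for start in range(1, n + 1):
        # Validating A_start pulls in A_start, A_start+1, ..., A_n.
        total += n - start + 1
    return total


def reads_with_cache(n):
    """Total reads if each document's result is computed once and then reused."""
    seen = set()
    for start in range(1, n + 1):
        for doc in range(start, n + 1):
            if doc in seen:
                break  # the rest of the chain is already cached
            seen.add(doc)
    return len(seen)


for n in (4, 100):
    print(n, reads_without_cache(n), reads_with_cache(n))
```

For n = 4 the uncached total is 4 + 3 + 2 + 1 = 10 reads, against 4 reads when results are reused.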
Therefore validating the whole set of documents requires reading n + (n-1) + ... + 1 = n(n+1)/2 = O(n^2) documents, i.e. this is quadratic, not linear. On the other hand, if the meaning of each document is independent of how it is used, then a smart tool could cache the results and only read n documents.

The fix is as follows:

1. Add the following assertion: An extension MUST NOT affect the value of components that are added to the component model via <import> or <include>.

2. State the definition of component equivalence as follows. Two components are equivalent when:
A) All of their child components are equivalent.
B) All of their non-component properties are equal.
C) All of their non-child component properties refer to components that have the same keys (e.g. names).

The difference is that to test for equivalence, you only have to look at a component's value-based properties and child components. You don't have to traverse the component graph, which might take you into another document. You only have to compare referred-to components via their keys.

We then add a statement to each component explicitly stating what its key values are. This is straightforward. We already implicitly defined keys when stating uniqueness rules, e.g. each Interface component in a Description component must have a unique {name}. The key is usually the {name} property. For Features and Properties, it is the {ref} property. The complete list is:

1. ElementDeclaration: {name}
2. TypeDefinition: {name}
3. Interface: {name}
4. InterfaceFault: {name}
5. InterfaceOperation: {name}
6. InterfaceMessageReference: {message label}
7. InterfaceFaultReference: {interface fault}.{name}, {message label}
8. Binding: {name}
9. BindingFault: {interface fault}.{name}
10. BindingOperation: {interface operation}.{name}
11. BindingMessageReference: {interface message reference}.{message label}
12. BindingFaultReference: {interface fault reference}.{interface fault}.{name}, {interface fault reference}.{message label}
13. Service: {name}
14. Endpoint: {name}
15. Feature: {ref}
16. Property: {ref}

In general, any extension component that might be referred to needs to define a key value, since that is how the reference is represented in the XML serialization.

Arthur Ryman
Received on Tuesday, 30 May 2006 03:24:18 UTC