Re: FW: Component Values Must Be Context Independent

Jonathan,

Yes, we should use consistent terminology.

There are 3 kinds of components:

1) The root Description component, which is in a category by itself
2) Top-level components: Interface, Binding, Service, Element Declaration, 
and Type Definition - these are contained in the Description component
3) Nested components: everything else - these all have a {parent} 
property.

When talking about component equivalence, we are really mainly interested 
in the Top-level components since we can get name collisions when we 
combine documents. Name collisions are impossible within a document by 
virtue of the schema. A document would be invalid if it had name conflicts 
and we wouldn't get past the XML infoset stage.

However, when we combine two or more documents, we need to avoid name 
conflicts. It's OK to have two Top-level components with the same name in 
two different documents as long as they are equivalent. That way we avoid 
the conflict. We need an efficient way to test for equivalence. We'd like 
to be able to decide if two components are equivalent just by examining 
their documents, not any other documents they might reference via import 
or include.

A property is a name-value pair. A component has a set of properties such 
that each property has a unique name.

Note that all components have some combination of property values that 
defines a key, i.e. in a valid component model instance, that combination 
of values uniquely identifies the component. For example, all the 
top-level components have QName keys, i.e. the {name} property. The spec 
does define this for every component although it doesn't explicitly call 
them keys. I gave a table of these in my original note. The keys are used 
in the XML infoset to refer to components. The keys are used in the 
component designators.

Two components are equivalent if and only if:
1) they have the same set of property names
2) for each of their property names, the corresponding values are 
equivalent.

So now we have boiled down the definition to that of value equivalence.

There are two kinds of values:
1) Component values - single or optional components, sets or lists of 
components. In general we can regard these as collections of components by 
treating single components as singleton sets.
2) Non-component values - everything else. Let's call these scalar values. 
These are things like strings, tokens, URIs, etc. They are often XML 
simple types.

Two scalar values are equivalent if and only if they are equal.

That leaves the definition of equivalence for component values. Since 
these are collections, we first require that the collections are 
"isomorphic" i.e. that there is an invertible mapping from one to the 
other (which is just the natural correspondence for lists in the case of 
ordered collections). Further, we require that the mapping relates 
equivalent components where we define equivalence as follows:

There are two kinds of components:
1) child components - these are nested components whose {parent} property 
is equal to the component that contains the property under consideration
2) non-child components - all other components

Two non-child component values are equivalent if their keys are equal. 
This means we don't have to inspect the contents of other documents. 
Within a document, one component references another via its key, e.g. its 
QName.
Two child component values are equivalent if they are equivalent as 
components - this is the recursive step. But we always recurse down the 
parent-child tree so the definition is non-circular and it terminates.
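
The definition above can be sketched roughly as follows. This is only an 
illustration, not spec text: the Component class, its kind/key/props 
fields, and the make_interface helper are hypothetical names. It assumes 
that {parent} is held outside the property set, that each component 
carries a precomputed key, and that members of an unordered collection 
have unique keys, so they can be paired up by key.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # identity-based hashing, so components can live in sets
class Component:
    kind: str                              # e.g. "Interface" (hypothetical field)
    key: Any                               # e.g. the {name} QName
    parent: Optional["Component"] = None   # None for top-level components
    props: dict = field(default_factory=dict)


def _as_components(value):
    """Return the members of a component value as a list, or None for a scalar."""
    if isinstance(value, Component):
        return [value]  # a single component is treated as a singleton collection
    if isinstance(value, (list, tuple, set, frozenset)):
        items = list(value)
        if items and all(isinstance(v, Component) for v in items):
            if isinstance(value, (set, frozenset)):
                # Unordered: pair members by key (assumes keys are unique).
                items.sort(key=lambda c: repr((c.kind, c.key)))
            return items
    return None  # everything else, including empty collections, is a scalar here


def equivalent(a: Component, b: Component) -> bool:
    """Same property names, and equivalent values for each name."""
    if (a.kind, a.key) != (b.kind, b.key) or a.props.keys() != b.props.keys():
        return False
    return all(_values_equivalent(a, a.props[n], b, b.props[n]) for n in a.props)


def _values_equivalent(owner_a, va, owner_b, vb) -> bool:
    ca, cb = _as_components(va), _as_components(vb)
    if ca is None or cb is None:
        return ca is None and cb is None and va == vb  # scalars: plain equality
    if len(ca) != len(cb):
        return False
    for x, y in zip(ca, cb):
        x_child, y_child = x.parent is owner_a, y.parent is owner_b
        if x_child != y_child:
            return False
        if x_child:
            if not equivalent(x, y):   # recursive step, down the tree only
                return False
        elif (x.kind, x.key) != (y.kind, y.key):
            return False               # references: compare keys, do not traverse
    return True


def make_interface(name, op_names, extends=()):
    """Hypothetical helper that builds a small Interface for experiments."""
    iface = Component("Interface", name)
    iface.props["name"] = name
    iface.props["extended interfaces"] = frozenset(extends)
    iface.props["interface operations"] = frozenset(
        Component("InterfaceOperation", op, parent=iface, props={"name": op})
        for op in op_names
    )
    return iface
```

Because the recursion only follows values whose parent is the owning 
component, it never leaves the document that defines the top-level 
component being compared.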

I claim that this definition of equivalence agrees with the old definition 
when applied to the component model instance as a whole.

However, the new definition is weaker since it might state that a specific 
pair of components are equivalent while the old definition says they are 
inequivalent. This is because the new definition just compares keys in 
some cases.

For example suppose in a component model instance we have 4 interfaces 
where A extends B and A' extends B'. Suppose all interfaces are in 
different documents and that document of A includes document of B, and 
document of A' includes document of B'. Suppose A and A' as interfaces 
have identical infosets. Suppose B and B' have the same QNames but differ 
in some other respect. The equivalence relations are as follows:



Pair     | Old definition | New definition
A and A' | not equivalent | equivalent
B and B' | not equivalent | not equivalent

Even though the new definition reports that A and A' are equivalent, the 
component model as a whole is invalid. Furthermore, by just flagging B and 
B' as inequivalent, we get a more useful error message, since by 
inspection A and A' look identical.
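
The four-interface example can be replayed with a small sketch. It is a 
deliberate simplification and the names (make, old_equivalent, 
new_equivalent, the ex: QNames and operation names) are made up for 
illustration: interfaces are plain dicts, operations are reduced to a 
tuple of names, and "extends" holds the referenced interfaces directly.

```python
def make(name, operations, extends=()):
    """Build a toy interface; 'extends' holds the referenced interfaces."""
    return {"name": name, "operations": tuple(operations), "extends": tuple(extends)}


def old_equivalent(x, y):
    """Old definition: recurse into every referenced interface."""
    return (
        x["name"] == y["name"]
        and x["operations"] == y["operations"]
        and len(x["extends"]) == len(y["extends"])
        and all(old_equivalent(p, q) for p, q in zip(x["extends"], y["extends"]))
    )


def new_equivalent(x, y):
    """New definition: compare referenced interfaces by key ({name}) only."""
    return (
        x["name"] == y["name"]
        and x["operations"] == y["operations"]
        and [p["name"] for p in x["extends"]] == [q["name"] for q in y["extends"]]
    )


# B and B' share a QName but differ in content; A and A' look identical.
B = make("ex:B", ["op1"])
B_prime = make("ex:B", ["op1", "op2"])
A = make("ex:A", ["op"], extends=[B])
A_prime = make("ex:A", ["op"], extends=[B_prime])

for label, x, y in [("A and A'", A, A_prime), ("B and B'", B, B_prime)]:
    print(label, "| old:", old_equivalent(x, y), "| new:", new_equivalent(x, y))
```

Only the B and B' pair is reported as inequivalent under the new 
definition, which points at the documents that actually conflict.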

Arthur Ryman,
IBM Software Group, Rational Division

blog: http://ryman.eclipsedevelopersjournal.com/
phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-939-5063, text: 4169395063@fido.ca



"Jonathan Marsh" <jmarsh@microsoft.com> 
Sent by: www-ws-desc-request@w3.org
05/23/2006 03:53 PM

To: <www-ws-desc@w3.org>
Subject: FW: Component Values Must Be Context Independent

[Random keystroke sent it too early, and then got interrupted by hours of 
telcons - completed below.]
 

From: Jonathan Marsh 
Sent: Tuesday, May 23, 2006 10:07 AM
To: 'Arthur Ryman'
Cc: www-ws-desc@w3.org; www-ws-desc-request@w3.org
Subject: RE: Component Values Must Be Context Independent
 
OK, just trying to understand the terminology.  The spec doesn't use the 
term "child component".  It does use the term "nested components", which 
are non-top-level components (i.e. any component but Description, 
Interface, Binding, Service, Element Declaration, and Type Definition). 
2.17 for instance talks about "references to other components" but doesn't 
really define the term, which you seem to be specializing into "child" and 
"non-child" components.
 
So, a child component property is
a) a property of component X,
b) with a value of a component, set of components, or list of components,
c) where each of the components in (b) has a parent property,
d) and where each of these parent properties has a value of component X.
Is that right?
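
Conditions (a) to (d) can be written as a small classifier. This is a 
sketch under assumptions, not anything from the spec: the Component class 
and classify_property are hypothetical, {parent} is modelled as a plain 
attribute, and an empty collection is reported as "non-component" because 
there is nothing to inspect without type information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing, so components can live in sets
class Component:
    kind: str
    parent: Optional["Component"] = None
    props: dict = field(default_factory=dict)


def classify_property(owner: Component, name: str) -> str:
    """Classify owner.{name} as child component, non-child component, or non-component."""
    value = owner.props[name]                                # (a) a property of X
    if isinstance(value, Component):                         # (b) a single component...
        members = [value]
    elif isinstance(value, (list, tuple, set, frozenset)):   # ...or a set or list
        members = list(value)
    else:
        members = []
    if not members or not all(isinstance(m, Component) for m in members):
        return "non-component"
    # (c) and (d): every member has a {parent}, and that parent is X itself.
    if all(m.parent is owner for m in members):
        return "child component"
    return "non-child component"


iface = Component("Interface", props={"name": "ex:I"})
op = Component("InterfaceOperation", parent=iface)
iface.props["interface operations"] = {op}
binding = Component("Binding", props={"interface": iface})

print(classify_property(iface, "interface operations"))
print(classify_property(binding, "interface"))
print(classify_property(iface, "name"))
```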
 
The reason for trying to get clarity here is that we haven't distinguished 
between properties that contain components "by value" and those that 
contain components "by reference", which is IMO an implementation choice. 
We chose one strategy for serializing the graph as a tree for the 
interchange format but that's just one choice.
 
Is the categorization below correct?
 
Another observation: comparing the value of the "value" property involves 
an infoset comparison, about which I don't see anything in 2.17.
 
 
Child component property
Non-component property
Non-child component property
 
(properties missing from the table below)
interface operation | Binding Operation.{interface operation}
binding message references | Binding Operation.{binding message references}
binding fault references | Binding Operation.{binding fault references}
interface message references | Binding Message Reference.{interface message references}
interface fault reference | Binding Fault Reference.{interface fault reference}
 

Property | Where Defined
address | Endpoint.{address}
binding | Endpoint.{binding}
binding faults | Binding.{binding faults}
binding operations | Binding.{binding operations}
bindings | Description.{bindings}
direction | Interface Fault Reference.{direction},
    Interface Message Reference.{direction}
element declaration | Interface Fault.{element declaration},
    Interface Message Reference.{element declaration}
element declarations | Description.{element declarations}
endpoints | Service.{endpoints}
extended interfaces | Interface.{extended interfaces}
features | .{features}, Binding.{features}, Binding Fault.{features},
    Binding Fault Reference.{features}, Binding Message Reference.{features},
    Binding Operation.{features}, Endpoint.{features}, Interface.{features},
    Interface Fault.{features}, Interface Fault Reference.{features},
    Interface Message Reference.{features}, Interface Operation.{features},
    Service.{features}
interface | Binding.{interface}, Service.{interface}
interface fault | Binding Fault.{interface fault},
    Interface Fault Reference.{interface fault}
interface fault references | Interface Operation.{interface fault references}
interface faults | Interface.{interface faults}
interface message references | Interface Operation.{interface message references}
interface operations | Interface.{interface operations}
interfaces | Description.{interfaces}
message content model | Interface Message Reference.{message content model}
message exchange pattern | Interface Operation.{message exchange pattern}
message label | Interface Fault Reference.{message label},
    Interface Message Reference.{message label}
name | .{name}, Binding.{name}, Element Declaration.{name}, Endpoint.{name},
    Interface.{name}, Interface Fault.{name}, Interface Operation.{name},
    Service.{name}, Type Definition.{name}
parent | .{parent}, Binding Fault.{parent}, Binding Fault Reference.{parent},
    Binding Message Reference.{parent}, Binding Operation.{parent},
    Endpoint.{parent}, Feature.{parent}, Interface Fault.{parent},
    Interface Fault Reference.{parent}, Interface Message Reference.{parent},
    Interface Operation.{parent}, Property.{parent}
properties | .{properties}, Binding.{properties}, Binding Fault.{properties},
    Binding Fault Reference.{properties},
    Binding Message Reference.{properties}, Binding Operation.{properties},
    Endpoint.{properties}, Interface.{properties},
    Interface Fault.{properties}, Interface Fault Reference.{properties},
    Interface Message Reference.{properties},
    Interface Operation.{properties}, Service.{properties}
ref | Feature.{ref}, Property.{ref}
required | Feature.{required}
services | Description.{services}
style | Interface Operation.{style}
system | Element Declaration.{system}, Type Definition.{system}
type | Binding.{type}
type definitions | Description.{type definitions}
value | Property.{value}
value constraint | Property.{value constraint}
 
 

From: Arthur Ryman [mailto:ryman@ca.ibm.com] 
Sent: Tuesday, May 23, 2006 8:57 AM
To: Jonathan Marsh
Cc: www-ws-desc@w3.org; www-ws-desc-request@w3.org
Subject: RE: Component Values Must Be Context Independent
 

Jonathan, 

Yes, there is a {parent} property for each child component. The idea is 
that to compute equivalence, you look at the non-reference properties and 
the child components. You compare the reference properties by value, i.e. 
don't traverse into the referenced component if it is not a child. This 
lets you compute equivalence based on the contents of the document that 
contains the enclosing top-level component, i.e. you don't have to look at 
other documents. 

Arthur Ryman,
IBM Software Group, Rational Division


"Jonathan Marsh" <jmarsh@microsoft.com> 
Sent by: www-ws-desc-request@w3.org 
05/22/2006 08:20 PM 


To: Arthur Ryman/Toronto/IBM@IBMCA, <www-ws-desc@w3.org>
Subject: RE: Component Values Must Be Context Independent

Can you explain a bit more the difference between so-called "child 
components" and "non-child components"?  I couldn't find these 
distinguished clearly in the spec.  Do you just mean the parent property? 
  
 


From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On 
Behalf Of Arthur Ryman
Sent: Thursday, April 20, 2006 4:20 PM
To: www-ws-desc@w3.org
Subject: Component Values Must Be Context Independent 
  

Components can be brought into a component model instance through <import> 
and <include>. For scalability purposes, it is highly desirable for the 
value of a component to be independent of the context into which it was 
brought. 

The use case is a development tool for SOA applications that needs to 
support hundreds or thousands of services. The tool needs to validate the 
service definitions. The requirement is that the time to do this be 
linear. We are currently experiencing performance problems validating 
large sets of WSDL 1.1 documents. We need to have a spec-compliant 
optimization for WSDL 2.0. 

Ideally, a tool should be able to compute the components directly defined 
in a document without looking at any of the imports or includes. There are 
two problems now that prevent this: 

1. In theory, we allow extensions that could alter the semantics of 
imported or included components. However, there is no requirement or use 
case for this flexibility, much less a realistic, compelling one. Note 
that this is actually a real problem in XML Schema, e.g. due to "features" 
such as chameleon includes, and <redefine>, you need to know the context in 
which a document is included. 

2. The current definition of component equivalence is recursive in the 
sense that to test if two components are equivalent, it is necessary to 
determine if all of the components they refer to are equivalent. In effect 
this means that you have to construct the entire component model instance 
in order to resolve the references to the other components. 

Since WSDL documents typically include or import others, a collection of 
WSDL documents is likely to be moderately connected when viewed as a 
graph. Therefore, when you validate the collection, you end up processing 
a given document many times in general. You process it a number of times 
equal to the number of documents that refer to it directly or indirectly 
(+ 1). This is non-linear. The exact degree of non-linearity depends on 
how connected the graph is. Consider a simple chain of n WSDL documents. 

A1 includes A2 includes A3 includes ... An 

Validating A1 requires reading n documents. 
Validating A2 requires reading n-1 documents. 
... 
Validating An requires reading 1 document. 

Therefore validating the whole set of documents requires reading n + 
(n-1) + ... + 1 = n(n+1)/2 = O(n^2) documents, i.e. this is quadratic, 
not linear. 

On the other hand, if the meaning of each document is independent of how 
it is used then a smart tool could cache the results and only read n 
documents. 
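
The read counts for the chain can be checked with a short sketch. It only 
counts document reads and does no validation; the function names and the 
dict-of-lists include graph are assumptions made for illustration.

```python
def chain(n):
    """Build the include graph A1 -> A2 -> ... -> An as a dict of lists."""
    names = [f"A{i}" for i in range(1, n + 1)]
    includes = {names[i]: [names[i + 1]] for i in range(n - 1)}
    return names, includes


def reads_without_cache(docs, includes):
    """Validate each document on its own, re-reading everything it reaches."""
    reads = 0
    for start in docs:
        seen, stack = set(), [start]
        while stack:
            doc = stack.pop()
            if doc not in seen:
                seen.add(doc)
                reads += 1
                stack.extend(includes.get(doc, ()))
    return reads


def reads_with_cache(docs, includes):
    """Reuse results across documents, so each document is read only once."""
    parsed, reads = set(), 0
    for start in docs:
        stack = [start]
        while stack:
            doc = stack.pop()
            if doc not in parsed:
                parsed.add(doc)
                reads += 1
                stack.extend(includes.get(doc, ()))
    return reads


docs, includes = chain(5)
print(reads_without_cache(docs, includes))  # n(n+1)/2 reads for the chain
print(reads_with_cache(docs, includes))     # n reads
```

Caching is only safe when a document's components do not depend on the 
context that includes it, which is the point of the proposal.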

The fix is as follows: 

1. Add the following assertion. An extension MUST NOT affect the value of 
components that are added to the component model via <import> or 
<include>. 
2. State the definition of component equivalence as follows. Two 
components are equivalent when: 
       A) All of their child components are equivalent. 
       B) All of their non-component properties are equal. 
       C) All of their non-child component properties refer to components 
that have the same keys (e.g. names). 
The difference is that to test for equivalence, you only have to look at a 
component's value-based properties and child components. You don't have to 
traverse the component graph, which might take you into another document. 
You only have to compare referred to components via their keys. 

We then add a statement to each component explicitly stating what its key 
values are. This is straightforward. We already implicitly defined keys 
when stating uniqueness rules, i.e. each Interface component in a 
Description component must have a unique {name}. The key is usually the 
{name} property. For Features and Properties, it is the {ref} property. 
The complete list is: 

1. ElementDeclaration: {name} 

2. TypeDefinition: {name} 

3. Interface: {name} 

4. InterfaceFault: {name} 

5. InterfaceOperation: {name} 

6. InterfaceMessageReference: {message label} 

7. InterfaceFaultReference: {interface fault}.{name}, {message label} 

8. Binding: {name} 

9. BindingFault: {interface fault}.{name} 

10. BindingOperation: {interface operation}.{name} 

11. BindingMessageReference: {interface message reference}.{message label} 


12. BindingFaultReference: {interface fault reference}.{interface 
fault}.{name}, {interface fault reference}.{message label} 

13. Service: {name} 

14. Endpoint: {name} 

15. Feature: {ref} 

16. Property: {ref} 

In general, any extension component that might be referred to needs to 
define a key value, since that is how the reference is represented in the 
XML serialization. 
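
The key list can be captured as a lookup table of property paths. This is 
a sketch only: KEY_PATHS and key_of are hypothetical names, components are 
modelled as nested dicts keyed by property name, and property names are 
written with spaces (e.g. "interface fault") as an assumed normalization.

```python
# Each key is a tuple of parts; each part is a path of property names to follow.
KEY_PATHS = {
    "ElementDeclaration": [("name",)],
    "TypeDefinition": [("name",)],
    "Interface": [("name",)],
    "InterfaceFault": [("name",)],
    "InterfaceOperation": [("name",)],
    "InterfaceMessageReference": [("message label",)],
    "InterfaceFaultReference": [("interface fault", "name"), ("message label",)],
    "Binding": [("name",)],
    "BindingFault": [("interface fault", "name")],
    "BindingOperation": [("interface operation", "name")],
    "BindingMessageReference": [("interface message reference", "message label")],
    "BindingFaultReference": [
        ("interface fault reference", "interface fault", "name"),
        ("interface fault reference", "message label"),
    ],
    "Service": [("name",)],
    "Endpoint": [("name",)],
    "Feature": [("ref",)],
    "Property": [("ref",)],
}


def key_of(kind, props):
    """Return the key of a component as a tuple, one entry per key part."""
    key = []
    for path in KEY_PATHS[kind]:
        value = props
        for step in path:   # follow the property path, e.g. {interface fault}.{name}
            value = value[step]
        key.append(value)
    return tuple(key)


fault_ref = {"interface fault": {"name": "ex:Fault"}, "message label": "In"}
print(key_of("InterfaceFaultReference", fault_ref))
```

An extension component that can be referred to would add its own entry to 
such a table.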

Arthur Ryman,
IBM Software Group, Rational Division


Received on Tuesday, 30 May 2006 03:24:18 UTC