Component Values Must Be Context Independent from Arthur Ryman on 2006-04-20 (www-ws-desc@w3.org from April 2006)

From: Arthur Ryman <ryman@ca.ibm.com>
Date: Thu, 20 Apr 2006 19:19:39 -0400
To: www-ws-desc@w3.org
Message-ID: <OFC9846C22.7BD53AC0-ON85257156.0053886F-85257156.008022E7@ca.ibm.com>
Components can be brought into a component model instance through <import> 
and <include>. For scalability purposes, it is highly desirable for the 
value of a component to be independent of the context into which it was 
brought.

The use case is a development tool for SOA applications that needs to 
support hundreds or thousands of services. The tool needs to validate the 
service definitions. The requirement is that the time to do this be 
linear in the number of documents. We are currently experiencing 
performance problems validating large sets of WSDL 1.1 documents. We need 
to have a spec-compliant optimization for WSDL 2.0.

Ideally, a tool should be able to compute the components directly defined 
in a document without looking at any of the imports or includes. There are 
two problems now that prevent this:

1. In theory, we allow extensions that could alter the semantics of 
imported or included components. However, there is no requirement or use 
case for this flexibility, much less a realistic, compelling one. Note 
that this is actually a real problem in XML Schema: because of "features" 
such as chameleon includes and <redefine>, you need to know the context in 
which a document is included.

2. The current definition of component equivalence is recursive in the 
sense that to test if two components are equivalent, it is necessary to 
determine if all of the components they refer to are equivalent. In effect 
this means that you have to construct the entire component model instance 
in order to resolve the references to the other components.

Since WSDL documents typically include or import others, a collection of 
WSDL documents is likely to be moderately connected when viewed as a 
graph. Therefore, when you validate the collection, you end up processing 
a given document many times in general. You process it a number of times 
equal to the number of documents that refer to it directly or indirectly 
(+ 1). This is non-linear. The exact degree of non-linearity depends on 
how connected the graph is. Consider a simple chain of n WSDL documents.

A1 includes A2 includes A3 includes ... An

Validating A1 requires reading n documents.
Validating A2 requires reading n-1 documents.
...
Validating An requires reading 1 document.

Therefore validating the whole set of documents requires reading n + 
(n-1) + ... + 1 = n(n+1)/2 = O(n^2) documents, i.e. this is quadratic, not 
linear.

On the other hand, if the meaning of each document is independent of how 
it is used, then a smart tool could cache the results and only read n 
documents.
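
As a rough illustration of the counting argument above, here is a small 
Python sketch that counts document reads for the chain A1 ... An, with and 
without a cache. The graph layout and the function names (make_chain, 
reads_without_cache, reads_with_cache) are made up for this example and 
are not part of any WSDL tool; the "parsing" step is only a placeholder.

```python
def make_chain(n):
    """Hypothetical include graph: A1 includes A2 includes ... An."""
    return {f"A{i}": ([f"A{i + 1}"] if i < n else []) for i in range(1, n + 1)}


def reads_without_cache(graph):
    """Validate every document by re-reading everything it reaches."""
    reads = 0
    for root in graph:
        seen, stack = set(), [root]
        while stack:
            doc = stack.pop()
            if doc in seen:
                continue
            seen.add(doc)
            reads += 1  # the document is read again for this root
            stack.extend(graph[doc])
    return reads


def reads_with_cache(graph):
    """Read each document once and reuse its components afterwards."""
    cache = {}
    reads = 0
    for root in graph:
        stack = [root]
        while stack:
            doc = stack.pop()
            if doc in cache:
                continue  # already computed; assumed valid in any context
            reads += 1
            cache[doc] = f"components of {doc}"  # stand-in for real parsing
            stack.extend(graph[doc])
    return reads


chain = make_chain(10)
print(reads_without_cache(chain))  # 55 = 10 * 11 / 2
print(reads_with_cache(chain))     # 10
```

The cached version is only correct if a document's components do not 
depend on who includes or imports it, which is the property being 
requested here.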

The fix is as follows:

1. Add the following assertion. An extension MUST NOT affect the value of 
components that are added to the component model via <import> or 
<include>.
2. State the definition of component equivalence as follows. Two 
components are equivalent when:
        A) All of their child components are equivalent.
        B) All of their non-component properties are equal.
        C) All of their non-child component properties refer to components 
that have the same keys (e.g. names).
The difference is that to test for equivalence, you only have to look at a 
component's value-based properties and child components. You don't have to 
traverse the component graph, which might take you into another document. 
You only have to compare referred-to components via their keys.
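
To make rules A) to C) concrete, here is a minimal Python sketch of the 
proposed equivalence test. The Component class, its field names, and the 
sample property values are assumptions invented for illustration; they are 
not taken from the WSDL 2.0 component model. The sketch also assumes that 
sibling components of the same kind have unique keys, as the uniqueness 
rules require.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str                                      # e.g. "Interface"
    key: str                                       # e.g. the {name} value
    values: dict = field(default_factory=dict)     # non-component properties
    children: list = field(default_factory=list)   # child components
    refs: dict = field(default_factory=dict)       # property -> (kind, key) of target


def equivalent(a, b):
    """Compare two components without resolving references to other components."""
    # The key is itself a property value, so it must match as well.
    if (a.kind, a.key) != (b.kind, b.key):
        return False
    # (B) all non-component properties are equal
    if a.values != b.values:
        return False
    # (C) non-child component properties are compared by key only
    if a.refs != b.refs:
        return False
    # (A) all child components are equivalent, matched up by kind and key
    if len(a.children) != len(b.children):
        return False
    theirs = {(c.kind, c.key): c for c in b.children}
    for child in a.children:
        other = theirs.get((child.kind, child.key))
        if other is None or not equivalent(child, other):
            return False
    return True


def make_operation(element_key):
    """Build a sample operation whose message refers to an element by key."""
    message = Component(
        "InterfaceMessageReference", "In",
        values={"direction": "in"},
        refs={"element declaration": ("ElementDeclaration", element_key)},
    )
    return Component("InterfaceOperation", "tns:getQuote", children=[message])


print(equivalent(make_operation("ns:request"), make_operation("ns:request")))  # True
print(equivalent(make_operation("ns:request"), make_operation("ns:other")))    # False
```

Note that the referenced ElementDeclaration is never loaded: only its key 
is compared, so the test stays within the document that defines the 
component.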

We then add a statement to each component explicitly stating what its key 
values are. This is straightforward. We already implicitly defined keys 
when stating uniqueness rules, e.g. each Interface component in a 
Description component must have a unique {name}. The key is usually the 
{name} property. For Features and Properties, it is the {ref} property. 
The complete list is:

1. ElementDeclaration: {name}

2. TypeDefinition: {name}

3. Interface: {name}

4. InterfaceFault: {name}

5. InterfaceOperation: {name}

6. InterfaceMessageReference: {message label}

7. InterfaceFaultReference: {interface fault}.{name}, {message label}

8. Binding: {name}

9. BindingFault: {interface fault}.{name}

10. BindingOperation: {interface operation}.{name}

11. BindingMessageReference: {interface message reference}.{message label}

12. BindingFaultReference: {interface fault reference}.{interface 
fault}.{name}, {interface fault reference}.{message label}

13. Service: {name}

14. Endpoint: {name}

15. Feature: {ref}

16. Property: {ref}

In general, any extension component that might be referred to needs to 
define a key value, since that is how the reference is represented in the 
XML serialization.
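
For what it is worth, the key list above could be held by a tool as a 
simple lookup table. The following Python sketch is one possible encoding; 
KEY_PATHS and key_of are hypothetical names, components are modelled as 
plain nested dictionaries purely for brevity, and property names are 
written with spaces throughout.

```python
# Each entry lists the property paths that make up the key of that
# component kind; a path with several steps follows a reference first.
KEY_PATHS = {
    "ElementDeclaration": [("name",)],
    "TypeDefinition": [("name",)],
    "Interface": [("name",)],
    "InterfaceFault": [("name",)],
    "InterfaceOperation": [("name",)],
    "InterfaceMessageReference": [("message label",)],
    "InterfaceFaultReference": [("interface fault", "name"), ("message label",)],
    "Binding": [("name",)],
    "BindingFault": [("interface fault", "name")],
    "BindingOperation": [("interface operation", "name")],
    "BindingMessageReference": [("interface message reference", "message label")],
    "BindingFaultReference": [
        ("interface fault reference", "interface fault", "name"),
        ("interface fault reference", "message label"),
    ],
    "Service": [("name",)],
    "Endpoint": [("name",)],
    "Feature": [("ref",)],
    "Property": [("ref",)],
}


def key_of(kind, props):
    """Return (kind, key values) for a component given as nested dicts."""
    parts = []
    for path in KEY_PATHS[kind]:
        value = props
        for step in path:
            value = value[step]  # follow the property, one step at a time
        parts.append(value)
    return (kind, tuple(parts))


binding_fault = {"interface fault": {"name": "tns:InvalidData"}}
print(key_of("BindingFault", binding_fault))
# ('BindingFault', ('tns:InvalidData',))
```

An extension component that can be referred to would add its own entry to 
such a table.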

Arthur Ryman,
IBM Software Group, Rational Division

blog: http://ryman.eclipsedevelopersjournal.com/
phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-939-5063, text: 4169395063@fido.ca
Received on Thursday, 20 April 2006 23:19:48 UTC