RE: Component Values Must Be Context Independent from Jonathan Marsh on 2006-05-23 (www-ws-desc@w3.org from May 2006)

From: Jonathan Marsh <jmarsh@microsoft.com>
Date: Mon, 22 May 2006 17:20:29 -0700
To: "Arthur Ryman" <ryman@ca.ibm.com>, <www-ws-desc@w3.org>
Message-ID: <37D0366A39A9044286B2783EB4C3C4E802AE1E4B@RED-MSG-10.redmond.corp.microsoft.com>
Can you explain a bit more the difference between so-called "child
components" and "non-child components"?  I couldn't find these
distinguished clearly in the spec.  Do you just mean the parent
property?

 

________________________________

From: www-ws-desc-request@w3.org [mailto:www-ws-desc-request@w3.org] On
Behalf Of Arthur Ryman
Sent: Thursday, April 20, 2006 4:20 PM
To: www-ws-desc@w3.org
Subject: Component Values Must Be Context Independent

 


Components can be brought into a component model instance through
<import> and <include>. For scalability purposes, it is highly desirable
for the value of a component to be independent of the context in which
it was brought in.

The use case is a development tool for SOA applications that needs to
support hundreds or thousands of services. The tool needs to validate
the service definitions. The requirement is that the time to do this be
linear. We are currently experiencing performance problems validating
large sets of WSDL 1.1 documents. We need to have a spec-compliant
optimization for WSDL 2.0. 

Ideally, a tool should be able to compute the components directly
defined in a document without looking at any of the imports or includes.
There are two problems now that prevent this: 

1. In theory, we allow extensions that could alter the semantics of
imported or included components. However, there is no requirement or use
case for this flexibility, much less a realistic, compelling one. Note
that this is actually a real problem in XML Schema: due to "features"
such as chameleon includes and <redefine>, you need to know the context
in which a document is included.

2. The current definition of component equivalence is recursive in the
sense that to test if two components are equivalent, it is necessary to
determine if all of the components they refer to are equivalent. In
effect this means that you have to construct the entire component model
instance in order to resolve the references to the other components. 

Since WSDL documents typically include or import others, a collection of
WSDL documents is likely to be moderately connected when viewed as a
graph. Therefore, when you validate the collection, you end up
processing a given document many times in general. You process it a
number of times equal to the number of documents that refer to it
directly or indirectly (+ 1). This is non-linear. The exact degree of
non-linearity depends on how connected the graph is. Consider a simple
chain of n WSDL documents. 

A1 includes A2 includes A3 includes ... An 

Validating A1 requires reading n documents. 
Validating A2 requires reading n-1 documents. 
... 
Validating An requires reading 1 document. 

Therefore validating the whole set of documents requires reading n +
(n-1) + ... + 1 = n(n+1)/2 = O(n^2) documents, i.e. this is quadratic,
not linear.


On the other hand, if the meaning of each document is independent of how
it is used then a smart tool could cache the results and only read n
documents. 
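
To make the arithmetic concrete, here is a minimal sketch in Python that
counts document reads for the chain above, with and without a cache. The
function names and the dictionary representation of the include graph are
assumptions made for illustration only; they are not part of any WSDL
tool or API.

```python
def build_chain(n):
    """Hypothetical include graph for the chain A1 -> A2 -> ... -> An."""
    return {f"A{i}": ([f"A{i + 1}"] if i < n else []) for i in range(1, n + 1)}


def reads_without_cache(graph):
    """Validate every document on its own, re-reading all it reaches."""
    reads = 0
    for root in graph:
        seen = set()          # forgotten after each root is validated
        stack = [root]
        while stack:
            doc = stack.pop()
            if doc in seen:
                continue
            seen.add(doc)
            reads += 1
            stack.extend(graph[doc])
    return reads


def reads_with_cache(graph):
    """Validate every document, reusing results for documents already read."""
    cached = set()            # shared across all roots
    reads = 0
    for root in graph:
        stack = [root]
        while stack:
            doc = stack.pop()
            if doc in cached:
                continue
            cached.add(doc)
            reads += 1
            stack.extend(graph[doc])
    return reads


chain = build_chain(10)
print(reads_without_cache(chain))  # n(n+1)/2 reads for n = 10
print(reads_with_cache(chain))     # n reads for n = 10
```

The cached version is only valid if a document's components mean the
same thing no matter which document included it, which is exactly the
context independence being asked for here.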

The fix is as follows: 

1. Add the following assertion. An extension MUST NOT affect the value
of components that are added to the component model via <import> or
<include>. 
2. State the definition of component equivalence as follows. Two
components are equivalent when: 
        A) All of their child components are equivalent. 
        B) All of their non-component properties are equal. 
        C) All of their non-child component properties refer to
components that have the same keys (e.g. names). 
The difference is that to test for equivalence, you only have to look at
a component's value-based properties and child components. You don't
have to traverse the component graph, which might take you into another
document. You only have to compare referred-to components by their
keys.
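
As a rough illustration of rules A) to C), the following Python sketch
compares two components without following references into other
documents. The Component class, its field names, and the sample Binding
are hypothetical and chosen only for this example; the sketch also
assumes that sibling components of the same kind have unique keys.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str                                     # e.g. "Binding"
    key: str                                      # e.g. the {name} property
    values: dict = field(default_factory=dict)    # non-component properties
    ref_keys: dict = field(default_factory=dict)  # property -> key of referred-to component
    children: list = field(default_factory=list)  # child components


def equivalent(a, b):
    """Equivalence that never dereferences a non-child component."""
    if a.kind != b.kind:
        return False
    if a.values != b.values:        # rule B: value-based properties are equal
        return False
    if a.ref_keys != b.ref_keys:    # rule C: compare references by key only
        return False
    # Rule A: pair up child components by (kind, key) and recurse.
    a_kids = {(c.kind, c.key): c for c in a.children}
    b_kids = {(c.kind, c.key): c for c in b.children}
    if a_kids.keys() != b_kids.keys():
        return False
    return all(equivalent(a_kids[k], b_kids[k]) for k in a_kids)


def make_binding(interface_key):
    """Build a sample Binding that refers to an Interface by key."""
    op = Component(
        kind="BindingOperation",
        key="getQuote",
        ref_keys={"interface operation": "getQuote"},
    )
    return Component(
        kind="Binding",
        key="QuoteBinding",
        values={"name": "QuoteBinding"},
        ref_keys={"interface": interface_key},
        children=[op],
    )


print(equivalent(make_binding("StockQuote"), make_binding("StockQuote")))
print(equivalent(make_binding("StockQuote"), make_binding("OtherQuote")))
```

Note that the Interface itself is never loaded: the comparison of the
two Bindings stops at the key stored in ref_keys, so it can be done using
only the document that defines the Binding.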

We then add a statement to each component explicitly stating what its
key values are. This is straightforward. We already implicitly defined
keys when stating uniqueness rules, i.e. each Interface component in a
Description component must have a unique {name}. The key is usually the
{name} property. For Features and Properties, it is the {ref} property.
The complete list is: 

1. ElementDeclaration: {name} 

2. TypeDefinition: {name} 

3. Interface: {name} 

4. InterfaceFault: {name} 

5. InterfaceOperation: {name} 

6. InterfaceMessageReference: {message label} 

7. InterfaceFaultReference: {interface fault}.{name}, {message label}

8. Binding: {name} 

9. BindingFault: {interface fault}.{name}

10. BindingOperation: {interface operation}.{name}

11. BindingMessageReference: {interface message reference}.{message
label} 

12. BindingFaultReference: {interface fault reference}.{interface
fault}.{name}, {interface fault reference}.{message label} 

13. Service: {name}

14. Endpoint: {name} 

15. Feature: {ref} 

16. Property: {ref} 

In general, any extension component that might be referred to needs to
define a key value, since that is how the reference is represented in
the XML serialization. 

Arthur Ryman,
IBM Software Group, Rational Division

blog: http://ryman.eclipsedevelopersjournal.com/
phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-939-5063, text: 4169395063@fido.ca
Received on Tuesday, 23 May 2006 00:20:50 UTC