Re: WSDL Import/Include Locations

Yaron,

IMHO, I don't think adding failover to import/include is the right way to 
go. Here are my reasons:

1. WSDL is primarily used at development time so if the network is down, 
you just try again in a few minutes
2. For import/include, the root document is probably at the same host as 
the imported and included documents, so if you can reach the root 
document, then you can reach the others
3. If the network is really bad, then just fixing import/include doesn't 
help. XSD import/include would fail too, XInclude would fail, UDDI access 
would fail, and invoking the Web service would fail
4. The problem of network reliability is a broader issue and has other 
solutions such as DNS, edge servers (Akamai), clusters, and grids

Arthur Ryman,
Rational Desktop Tools Development

phone: +1-905-413-3077, TL 969-3077
assistant: +1-905-413-2411, TL 969-2411
fax: +1-905-413-4920, TL 969-4920
mobile: +1-416-969-5063
intranet: http://w3.torolab.ibm.com/DEAB/

Yaron wrote on 03/05/2004 02:27:57 PM:

> 
> As a member of this group I work hard to make productive contributions. 
> I spend a lot of time preparing what I'm going to say. I realize that 
> this can still result in me saying silly things but at least I try hard 
> to prevent it.
> 
> So if I'm now saying something silly I apologize. I've thought this one 
> through as best as I'm able and I just don't understand the objections.
> 
> If someone defines a WSDL which imports/includes other information then, 
> generally speaking, the WSDL can't be run unless that other information 
> is available. However, endpoints on the Internet, along with the Internet 
> itself, are unreliable.
> 
> One of the most common ways to deal with unreliability is to make 
> information available at distinct locations. The advantage of making 
> data available at distinct locations is that not only does it provide 
> reliability in the face of endpoint failures but it also provides 
> reliability in the face of network failures.
> 
> If one has a list of possible locations then there are two distinct 
> strategies one can use in retrieving the data. One can walk through the 
> list serially or in parallel.
> 
> Generally speaking, serial retrieval is preferred. Not only does it reduce 
> the load on the network, but it also distributes the load (assuming a random 
> starting point in the list) amongst the various endpoints. This latter 
> quality is especially nice as many organizations do not have the 
> resources to run multiple endpoints, so if they want to distribute their 
> WSDL they will need to ask other people for a favor. The beauty of a 
> serial-based system is that the more people are willing to carry a copy 
> of the document, the less load each volunteer has to deal with. In other 
> words, serial processing of alternate locations encourages people to 
> volunteer to replicate each other's content. This is pure goodness.
> 
> Parallel processing also has its place. In cases where performance is 
> super critical, it can be reasonable to download redundant data in 
> parallel in order to reduce the worst-case performance scenario. But 
> parallel processing of location lists should be used sparingly. It puts 
> a much heavier load on the net, and it actively discourages volunteers 
> from replicating data, since each volunteer will have to bear the full 
> load of all requests regardless of how many volunteers there are. This 
> is the opposite of serial processing, where the larger the set of 
> volunteers, the less load each volunteer has to carry.
> 
> If I understand what is being proposed below, then in the case of 
> importing documents it would be legal to import the same content 
> multiple times, which is a good thing. But if the only way to get 
> robust behavior is to include multiple redundant imports, then we are 
> forcing everyone, everywhere to use parallel loading. This means we are 
> actively discouraging people from replicating each other's data. That 
> just doesn't seem like a good idea.
> 
> Allowing import/include to have more than one URI and specifying that 
> the list is to be processed serially with a random start location is not 
> a new design. It is well understood, well proven, and widely used. Why is 
> there so much resistance to something so obvious?
> 
> I apologize if I'm being blockheaded, but I really don't understand the 
> source of contention.
> 
>    Thank you for your time and patience,
> 
>       Yaron
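
The serially processed list with a random starting point that Yaron describes can be sketched as follows. This is a minimal illustration, not part of any WSDL specification or tooling: the function name and the injectable `fetch` callable are hypothetical, chosen so the retrieval policy itself is visible.

```python
import random

def fetch_first_available(locations, fetch, rng=random):
    """Serial failover over a list of alternate URIs.

    Starts at a random index so that, across many clients, load is
    spread evenly over all mirrors, then walks the list one URI at a
    time until a fetch succeeds. `fetch` is any callable that returns
    the document bytes or raises OSError on failure (hypothetical
    stand-in for an actual HTTP retrieval).
    """
    n = len(locations)
    start = rng.randrange(n)
    last_err = None
    for i in range(n):
        url = locations[(start + i) % n]
        try:
            return fetch(url)
        except OSError as e:
            last_err = e  # endpoint or network failure: try the next mirror
    raise OSError(f"all {n} locations failed") from last_err
```

Note how the random starting point gives the load-spreading property Yaron values: each mirror sees roughly 1/n of first attempts, and adding a volunteer mirror lowers every other volunteer's share. Parallel racing, by contrast, would hit every mirror on every request.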

Received on Monday, 8 March 2004 12:52:25 UTC