Re: WSDL Import/Include Locations from Yaron Y. Goland on 2004-03-18 (www-ws-desc@w3.org from March 2004)

From: Yaron Y. Goland <ygoland@bea.com>
Date: Wed, 17 Mar 2004 19:54:20 -0800
To: Martin Gudgin <mgudgin@microsoft.com>
Cc: paul.downey@bt.com, ryman@ca.ibm.com, www-ws-desc@w3.org
Message-ID: <40591D6C.4070508@bea.com>
There are two goals in this area I think WSDL needs to achieve:

Goal #1 - Location Redundancy
Goal #2 - Load Balancing

It is clear now to me from the comments on the list that Goal #1 can be 
achieved today by WSDL as is. But I am still concerned that Goal #2 
cannot be reasonably achieved.

Let's take the example from my previous letter and expand it a little:

Imagine I have a WSDL namespace FOO and two files, FileA and FileB, that 
both define components in namespace FOO. The two files do not define any 
common components and the two files do not include each other. In other 
words, each file is completely standalone. For redundancy and load 
balancing, both files have been made available at multiple locations. The 
result is that the imports for a WSDL for namespace BAR will be:
<import namespace="f:oo" location="http://E.com/fileA"/>
<import namespace="f:oo" location="http://E2.com/fileA"/>
<import namespace="f:oo" location="http://E3.com/fileA"/>
<import namespace="f:oo" location="http://E.com/fileB"/>
<import namespace="f:oo" location="http://E2.com/fileB"/>
<import namespace="f:oo" location="http://E3.com/fileB"/>

Let's assume an extremely sophisticated WSDL processor that executes 
import statements only on demand. On the first access to an element or 
attribute from the FOO namespace the WSDL processor will go to its 
import statements and try to import http://E.com/fileA since it is the 
first import statement.

PERFORMANCE PROBLEM #1 - Because the WSDL is published and is static, 
whichever server is listed first on the import list is going to get a 
disproportionate share of the file access traffic. This robs 
implementers of the most basic benefit of load distribution.
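
A minimal sketch of the in-order behaviour described above, assuming 
every mirror is reachable. The helper names (first_reachable, simulate) 
and the is_up callback are hypothetical stand-ins for a real processor 
and a real HTTP fetch; only the URLs come from the import list.

```python
from collections import Counter

# Mirror list copied from the fileA import statements, in document order.
FILE_A_MIRRORS = [
    "http://E.com/fileA",
    "http://E2.com/fileA",
    "http://E3.com/fileA",
]


def first_reachable(locations, is_up):
    """Return the first location, in document order, that responds."""
    for url in locations:
        if is_up(url):
            return url
    return None


def simulate(clients, locations, is_up):
    """Count which location each of `clients` processors ends up fetching."""
    hits = Counter()
    for _ in range(clients):
        url = first_reachable(locations, is_up)
        if url is not None:
            hits[url] += 1
    return hits


# Every processor reads the same static list, so with all servers up
# the first entry absorbs all of the traffic.
hits = simulate(1000, FILE_A_MIRRORS, is_up=lambda url: True)
print(hits)
```

The later mirrors only see traffic when the earlier ones fail, which is 
redundancy but not load distribution.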

Assuming that http://E.com/fileA contained the necessary elements, the 
WSDL processor will not process any additional import statements.

At some point later the WSDL processor needs an element or attribute 
from the FOO namespace that was not provided by the import of fileA. So 
the WSDL engine goes to the next import statement.

PERFORMANCE PROBLEM #2 - Because the WSDL engine isn't told that 
http://E2.com/fileA and http://E3.com/fileA are actually just copies of 
http://E.com/fileA, it will be necessary for the WSDL engine to download 
both the E2 and E3 copies. This means that E2 and E3 are likely to be 
hit every time the WSDL's imports are resolved. In other words, there is 
no load distribution.
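
To make the cost concrete, here is a rough sketch of on-demand, in-order 
resolution over the six imports. The component names (componentA, 
componentB) and the resolve helper are assumptions made for 
illustration; a dictionary assignment stands in for the HTTP download.

```python
# Hypothetical contents of the two files; they share no components.
FILE_A = {"componentA"}
FILE_B = {"componentB"}

# The six import locations in document order, paired with what each serves.
IMPORTS = [
    ("http://E.com/fileA", FILE_A),
    ("http://E2.com/fileA", FILE_A),
    ("http://E3.com/fileA", FILE_A),
    ("http://E.com/fileB", FILE_B),
    ("http://E2.com/fileB", FILE_B),
    ("http://E3.com/fileB", FILE_B),
]


def resolve(component, imports, cache):
    """Download imports in order until `component` is found.

    Returns the list of URLs downloaded during this call. `cache` maps
    already-downloaded URLs to their contents.
    """
    if any(component in contents for contents in cache.values()):
        return []
    downloaded = []
    for url, contents in imports:
        if url in cache:
            continue
        cache[url] = contents  # stands in for an HTTP GET
        downloaded.append(url)
        if component in contents:
            return downloaded
    raise LookupError(component)


cache = {}
first = resolve("componentA", IMPORTS, cache)
second = resolve("componentB", IMPORTS, cache)
print(first)
print(second)
```

Under these assumptions the second lookup downloads the E2 and E3 copies 
of fileA before it reaches fileB, because nothing in the import list 
says they are duplicates.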

Having gone through E2 and E3 the WSDL engine will eventually reach 
http://E.com/fileB.

PERFORMANCE PROBLEM #3 - This is really just a repeat of performance 
problem #1. http://E.com/fileB is being hit disproportionately hard 
because it is unlucky enough to be listed first.

One can easily solve these load balancing problems by adopting a 
slightly different import/include syntax:

<import namespace="f:oo" location="http://example.com/fileB
                                    http://example2.com/fileB
                                    http://example3.com/fileB"/>

with an instruction to randomly choose a starting point in the list.

With this simple change we would get both redundancy and load balancing.
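
One way a processor might implement the random starting point is 
sketched below. This is only an illustration of the proposal, not text 
from any WSDL draft: the function names, the is_up callback, and the 
choice to wrap around the list after the starting point are all 
assumptions.

```python
import random
from collections import Counter

# Whitespace-separated location list, as in the proposed syntax.
LOCATION = """http://example.com/fileB
              http://example2.com/fileB
              http://example3.com/fileB"""


def candidate_order(location_attr, rng=random):
    """Rotate the location list so it begins at a randomly chosen entry."""
    urls = location_attr.split()
    if not urls:
        return []
    start = rng.randrange(len(urls))
    return urls[start:] + urls[:start]


def pick_location(location_attr, is_up, rng=random):
    """Try locations from a random starting point; return the first that responds."""
    for url in candidate_order(location_attr, rng):
        if is_up(url):
            return url
    return None


# With every mirror up, the requests spread roughly evenly across the list.
counts = Counter(pick_location(LOCATION, lambda url: True) for _ in range(3000))
print(counts)
```

Because all entries in one location list are declared to be copies of 
the same file, the processor needs only one successful download per 
import, and a failed mirror simply falls through to the next entry.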

	Thanks,

		Yaron

Martin Gudgin wrote:

> Yaron,
> 
> I think it's OK for a WSDL processor to process imports
> opportunistically. That is, not resolve any imports until it sees a
> reference to a component in a given namespace, then process imports in
> order until it finds that component. And so on.
> 
> Gudge
> 
>  > -----Original Message-----
>  > From: Yaron Y. Goland [mailto:ygoland@bea.com]
>  > Sent: 12 March 2004 19:52
>  > To: Martin Gudgin
>  > Cc: paul.downey@bt.com; ryman@ca.ibm.com; www-ws-desc@w3.org
>  > Subject: Re: WSDL Import/Include Locations
>  >
>  > Although I now believe I understand the distinction I believe
>  > this is something subtle enough to deserve a specific call
>  > out in part 1.
>  >
>  > Section 4.2 currently contains the sentence "Directly imported
>  > means that component importation is not transitive;
>  > components imported by one of the imported documents are not
>  > available to the original importing document unless they are
>  > imported directly by that document."
>  >
>  > I would propose inserting an additional sentence after this
>  > one that reads "Note, however, that if a directly imported
>  > document includes another document then the components in the
>  > included document are available to the original importing document."
>  >
>  > This then brings up another scenario that I'm not even sure is legal.
>  > Imagine I have a WSDL namespace FOO and two files, FileA and
>  > FileB that both define components in namespace FOO. The two
>  > files do not define any common components and the two files
>  > do not include each other. In other words, each file is
>  > completely stand alone. In that case if a WSDL for namespace
>  > BAR should have:
>  > <import namespace="f:oo" location="http://example.com/fileA"/>
>  > <import namespace="f:oo" location="http://example.com/fileB"/>
>  >
>  > And if the WSDL should optimize by only successfully
>  > downloading one of the two links then components needed by
>  > WSDL BAR would not be downloaded.
>  >
>  > This scenario presumes however that it is legal to have two
>  > completely independent files defining non-overlapping
>  > components in the same namespace that do not reference each
>  > other. Is that legal?
>  >
>  >       Thanks,
>  >
>  >               Yaron
>  >
>  > Martin Gudgin wrote:
>  >
>  > > If A includes B then when C imports A all the constructs in A and B
>  > > are visible to C. The text you refer to in Section 4.2 is about
>  > > components that A *imports*, not components that A *includes*.
>  > >
>  > > Gudge
>  > >
>  > >  > -----Original Message-----
>  > >  > From: Yaron Y. Goland [mailto:ygoland@bea.com]
>  > >  > Sent: 12 March 2004 00:28
>  > >  > To: Martin Gudgin
>  > >  > Cc: paul.downey@bt.com; ryman@ca.ibm.com; www-ws-desc@w3.org
>  > >  > Subject: Re: WSDL Import/Include Locations
>  > >  >
>  > >  > Let's say that I have a WSDL namespace FOO that is defined by
>  > >  > two different files, A & B. File A includes File B.
>  > >  >
>  > >  > I now want to define a WSDL namespace BAR and I want to use
>  > >  > components that are defined in namespace FOO. One could
>  > >  > imagine an import statement of the form:
>  > >  > <import namespace="f:oo" location="http://example.com/fileA"/>
>  > >  >
>  > >  > However it turns out that one of the components I want to
>  > >  > directly reference is defined in file B. Per section 4.2 in
>  > >  > part 1 I can't directly reference any components in file B
>  > >  > unless I import it. So now the import section will have:
>  > >  > <import namespace="f:oo" location="http://example.com/fileA"/>
>  > >  > <import namespace="f:oo" location="http://example.com/fileB"/>
>  > >  >
>  > >  > Gudge, if I understood your previous point you suggested that
>  > >  > if one had multiple imports for the same namespace then one
>  > >  > was free to assume that they all point to the same infoset
>  > >  > content and so one only needed to download one of the links,
>  > >  > thus neatly solving my concerns about redundancy and performance.
>  > >  >
>  > >  > But in this case that assumption doesn't appear safe. If the
>  > >  > WSDL system only imported one of the two links from namespace
>  > >  > FOO then the WSDL definition would miss referenced components
>  > >  > and fail.
>  > >  >
>  > >  > Did I miss something?
>  > >  >
>  > >  >       Thanks,
>  > >  >               Yaron
>  > >  >
>  > >  > paul.downey@bt.com wrote:
>  > >  >
>  > >  > > Yaron wrote:
>  > >  > >
>  > >  > >  > One of the most common ways to deal with unreliability is to
>  > >  > >  > make information available at distinct locations. The
>  > >  > >  > advantage of making data available at distinct locations is
>  > >  > >  > that not only does it provide reliability in the face of
>  > >  > >  > endpoint failures but it also provides reliability in the
>  > >  > >  > face of network failures.
>  > >  > >
>  > >  > >
>  > >  > > Actually something that has always bothered me about the /list/
>  > >  > > of locations in WSDL and Schema: if a list is essential for
>  > >  > > resilience then why don't other languages such as HTML have a
>  > >  > > list of URIs in <a href='...' etc ?
>  > >  > >
>  > >  > > My most common use for the location /list/ has been to put a
>  > >  > > relative URI followed by an absolute URI so that the same
>  > >  > > schema may be used stand alone on my laptop before being
>  > >  > > deployed on a Web server elsewhere.
>  > >  > >
>  > >  > > Also I have a competing use-case against multiple locations:
>  > >  > >
>  > >  > > Before signing-off a Web service we test and validate the
>  > >  > > service using a captured WSDL saved in our configuration
>  > >  > > management system. This captured WSDL must be stand-alone
>  > >  > > since the WSDL published by the service may later change.
>  > >  > > It's also not acceptable to store a serialisation of the
>  > >  > > infoset since this is different to the actual published
>  > >  > > documents, invalidating any regression testing.
>  > >  > >
>  > >  > > So we need to not only store the root document but any other
>  > >  > > documents referenced and in a way that emulates how a consumer
>  > >  > > of the actual service would have worked.
>  > >  > >
>  > >  > > Here the redundancy list only exacerbates this difficulty
>  > >  > > since it is possible for two test runs to be presented with
>  > >  > > two different sets of documents.
>  > >  > >
>  > >  > > So a single URI in the location and having some environmental
>  > >  > > change to subvert the processor (a 'PATH' variable) would be
>  > >  > > preferable here rather than having to edit the WSDL document
>  > >  > > and invalidate our regression tests.
>  > >  > >
>  > >  > > Just another reason why concentrating on what is a valid WSDL
>  > >  > > document rather than the behaviour of a WSDL processor is very
>  > >  > > useful.
>  > >  > >
>  > >  > > Paul
>  > >  > >
>  > >  > > --
>  > >  > > Paul Sumner Downey
>  > >  > > Web Services Integration
>  > >  > > BT Exact
>  > >  > >
>  > >  > >
>  > >  >
>  > >
>  >
>
Received on Wednesday, 17 March 2004 22:54:27 UTC