Re: Schematron schema for SOAP 1.1 Envelopes from Rick JELLIFFE on 2000-09-21 (xml-dist-app@w3.org from September 2000)

From: Rick JELLIFFE <ricko@geotempo.com>
Date: Thu, 21 Sep 2000 22:55:45 +0800
To: xml-dist-app@w3.org
Message-ID: <39CA2171.A3559D71@geotempo.com>
Henrik Frystyk Nielsen wrote:
>  David Orchard wrote:
> > It's careless to make an assumption - namespaces are URIs for
> > the purpose of fetching schemas - and then claim it as fact.  
> > It has never been the intent
> > that applications can do a GET on the namespace URI to fetch a schema.
> 
> I didn't claim that you are guaranteed a schema - I said might - just as
> well as you might get HTML back when you go to some website. This is 
> what the NS spec states - you might or you might not. The same thing 
> goes with schemaLocation - you are not guaranteed a schema - that's just
> life. It certainly does not state that "it has never been the 
> intent...".

I followed the debate on XML Namespaces with great interest on the XML IG
at the time.  Henrik is trying to rewrite history: the statement in the
spec that it is "not a goal that" the URI reference "be directly usable
for retrieval of a schema" clearly states that it is not the model or
expectation in XML Namespaces that applications will do a GET on the
namespace URI to fetch a schema.   (Perhaps Henrik can point to the part
of the Namespaces spec that we have missed, where it says that URIs are
used so we might retrieve something using them.)

There are many reasons why this should be so.  The proponents of
namespaces=schema have never tried to answer the reasons. Instead we get
this treat-namespaces-as-lucky-dip-then-everything-will-be-fine guff. 
The XML Schema WG have, after long consideration, tried to make a
workable approach with the schemaLocation attribute. 

The issue comes down to the purpose of namespaces. The XML Namespaces
spec makes it very clear in the opening paragraph of its motivation: "a
single XML document may contain elements and attributes ... that are
defined for and used by multiple software modules. ...if such a markup
vocabulary exists which is well-understood and for which there is useful
software available, it is better to re-use this markup than reinvent
it."

So the key here is to maximize how much "well-understood" markup
vocabularies can be re-used.  Re-use does not require or imply
fine-grained schema consistency: quite the reverse.  The vocabulary
is well understood and should be usable in different schemas even if
they impose additional criteria.  Identifying schema with namespace
reduces re-use: it encourages duplicate names for things that are the
same.

For example, take the HTML element p. The HTML content model for p
does not allow my element rick:dog.  But I want to have
 <rick:pet><html:p>hello <rick:dog>Rover</rick:dog></html:p></rick:pet>

where I disallow any contents of html:p apart from PCDATA and rick:dog
elements.  
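
As a rough sketch of that restriction (not a schema, just an
application-level check), the following Python accepts html:p when its
only element children are rick:dog and text.  The rick namespace URI is
a made-up placeholder, the html prefix is assumed to be bound to the
XHTML namespace, and the function name is mine.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/1999/xhtml"   # assumed binding for the html: prefix
RICK_NS = "http://example.org/rick"        # hypothetical namespace for rick:

DOC = (
    f'<rick:pet xmlns:rick="{RICK_NS}" xmlns:html="{HTML_NS}">'
    "<html:p>hello <rick:dog>Rover</rick:dog></html:p>"
    "</rick:pet>"
)


def p_violations(root):
    """Return the tags of element children of html:p that are not rick:dog.

    Text (PCDATA) is always allowed, so only element children are checked.
    """
    allowed = f"{{{RICK_NS}}}dog"
    bad = []
    for p in root.iter(f"{{{HTML_NS}}}p"):
        for child in p:
            if child.tag != allowed:
                bad.append(child.tag)
    return bad


root = ET.fromstring(DOC)
violations = p_violations(root)
```

The element keeps the name html:p throughout; only the local rule about
what may appear inside it differs from the HTML DTDs.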

This is still an html:p element: I give it the well-understood name
html:p because there is useful software available that can use it. It
has a very different content model than the content model in any of the
html DTDs, because content models do not adequately express the actual
semantics of the element: they miss out a step.  A paragraph can contain
allowed text and inlined and embedded objects; in typical HTML these
include many well-known elements; however, restricting away all of them
in a particular case is no reason for a different namespace name.  

Having to change the namespace name defeats the purpose of allowing
reuse. Instead of having robust processing of well-understood names, we
get software that has to understand zillions of names: every version
change or content model change to add something that the particular
schema language was incapable of modeling (or to try to redress some
constraint introduced as an artifact of the limitations of the schema
language) would require a new namespace.  

The idea that somehow our nice XML Schema software can trace through the
type derivation hierarchy and eventually come to well-understood
underlying names (and then figure out whether the derived type is
compatible or not) is bogus. First, because of performance/download
issues. Second, because it is far more complex for a namespace-using
system to have to trace through XML Schemas than if the namespace
signified general semantics and the schemaLocation indicated the
particular schema applicable directly. 

Third, because only the application itself knows which information items
are essential to its operation and must be preserved.  The schema that an
application is built to may be much simpler or more complex than the
schema that the data has; as we have no way of matching data schemas to
application schemas, the "be generous in what you accept" rule is wise. 
A system which immediately barfs when unimportant schema violations
occur is fragile.

So what would it be better for the namespace URI reference to ultimately
locate?  Either a semantic schema or, better, a directory of related
resources discoverable by some conventions.  Namespace=schema blocks the
use of the namespace URI for more systematic and extensible purposes.

This issue could be defused if SOAP provided some convention to prevent
this blocking.  For example, it could say that the query "?request=schema"
should be appended to the namespace URI reference when attempting to
dereference it to get a schema.  This prevents hogging of the URL by
structural schemas, allows other queries by other specs which want other
resources based on dereferencing the namespace URI, and the query will be
ignored by servers which just have a file (at least, the servers I
quickly tried ignored it).
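
To make the suggestion concrete, here is a minimal Python sketch of how
a client might form such a request URL.  The function name and the
handling of a namespace URI that already carries a query are my own
assumptions; nothing here is defined by SOAP or by the Namespaces spec.

```python
from urllib.parse import urlsplit, urlunsplit


def schema_request_url(namespace_uri: str) -> str:
    """Append the proposed request=schema query to a namespace URI.

    If the URI already has a query, the new parameter is joined with '&'
    (an assumption; the proposal only covers the plain case).
    """
    parts = urlsplit(namespace_uri)
    query = f"{parts.query}&request=schema" if parts.query else "request=schema"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


# The SOAP 1.1 envelope namespace, used here only as sample input.
soap_url = schema_request_url("http://schemas.xmlsoap.org/soap/envelope/")
```

Another spec wanting a different resource would use a different query
value against the same namespace URI, which is the point of the
convention.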



Rick Jelliffe
(Not speaking for employer)
Received on Thursday, 21 September 2000 10:41:21 UTC