[Bug 4437] Error for xsi: schema location to appear too late from bugzilla@wiggum.w3.org on 2007-04-06 (www-xml-schema-comments@w3.org from April to June 2007)

From: <bugzilla@wiggum.w3.org>
Date: Fri, 06 Apr 2007 13:44:32 +0000
To: www-xml-schema-comments@w3.org
CC:
Message-Id: <E1HZojw-00072O-2b@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4437





------- Comment #1 from noah_mendelsohn@us.ibm.com  2007-04-06 13:44 -------
> It's not clear what value this rule brings, but it
> does prohibit certain usage of schema (where
> different parts of a document uses different
> portions of components from the same namespace)
> and certain schema processing modes (where schemas
> are pre-built and cached and xsi: schema location
> attributes are ignored).

As I recall, the reasoning is along these lines:

We want to facilitate the construction of streaming processors.  If we allow
schemaLocations after first use of the namespace, then a streaming processor
is forced either to backtrack and undo any results that may have been
affected by newly acquired components, or to be deficient as seen from the
outside, in that there are legal schema/instance pairs that it cannot handle.
In order to avoid having streaming processors appear to be deficient in such a
way, we restrict use of the schemaLocation mechanisms to patterns that do not
subvert streaming.

Note that support of such late arriving schemaLocations might be complex even
for more traditional processors.  Imagine, for example, a schemaLocation on the
last element of a document, with a definition for a namespace used early.  To
support this, the processor would effectively have to do a prepass on the whole
document to look for such schemaLocations, then assemble a schema, then
validate.  Of course, this is just another way of saying that even processors
that are willing to buffer an entire input document may find it convenient to
do some things in a streaming mode, so maybe it's not a completely separate
argument. 
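
To make the streaming point concrete, here is a minimal sketch in Python of how
a one-pass processor could detect a late schemaLocation without buffering or
backtracking.  It is an illustration only, not taken from any actual schema
processor: the function names are hypothetical, and it assumes that a hint on
an element is read before that element itself counts as a use of the namespace.

```python
import io
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def namespace_of(qname):
    """Return the namespace URI of an ElementTree '{uri}local' name, or ''."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else ""


def late_schema_locations(xml_text):
    """Yield (namespace, location) hints that arrive after the namespace was used.

    Hypothetical helper: a single forward pass in which nothing is revisited.
    """
    seen = set()  # namespaces already used by an earlier element or attribute
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("start",)):
        # Assumption: hints on an element are read before the element's own
        # name counts as a use, so the usual root-element placement passes.
        tokens = elem.get(XSI_SCHEMA_LOCATION, "").split()
        for ns, loc in zip(tokens[0::2], tokens[1::2]):
            if ns in seen:
                yield ns, loc
        seen.add(namespace_of(elem.tag))
        seen.update(namespace_of(name) for name in elem.attrib)
```

A document whose last element carries a hint for a namespace already used by
earlier elements would be reported by this check at the moment the hint is
read.  Accepting such a document instead would require either revisiting the
earlier elements or making a prepass over the whole input.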

I can see that this rule is not the only sensible choice to make, but it was a
decision made back in the Schema 1.0 days with (I think) pretty full awareness
of the tradeoffs.  I'm not aware of any new information that would merit
reconsidering it now, nor do I see why this is prioritized high.  Furthermore, even if the
decision is suboptimal, changing it would raise questions for those who want to
adapt existing 1.0 processors as the basis for their 1.1 implementations.  They
would have to revisit their current code that throws errors, decide what to do,
perhaps change documentation, etc.  I'm not convinced we made the perfect
decision in 1.0, but I feel fairly strongly that we should not change it now.

Noah

Received on Friday, 6 April 2007 13:55:52 UTC