LC-117: Locating Schema resources

Dear Curt, 

The W3C XML Schema Working Group has spent the last several months 
working through the comments received from the public on the last-call 
draft of the XML Schema specification.  We thank you for the comments 
you made on our specification during our last-call comment period, and 
want to make sure you know that all comments received during the 
last-call comment period have been recorded in our last-call issues 
list (http://www.w3.org/2000/05/12-xmlschema-lcissues). 

Among other issues, you raised several concerns relating to the 
mechanisms provided for identifying schema documents and 
for associating them with namespaces to be validated.
For tracking purposes, the workgroup has identified these
collectively as last call issue LC-117.

Your primary suggestion was that we adopt public and 
system identifiers to identify schemas to be used in 
validation.  The workgroup gave serious consideration 
to these mechanisms, both before and after receiving 
your note.  We decided that, on balance, our existing 
mechanisms meet the needs of a broader range of 
important applications.  We have therefore decided 
to retain our design as proposed.

In your note, you raise some specific concerns on 
which I can provide you more information.  Specifically, 
you express the concerns that:

>>  there is not a mechanism to identify a schema 
>>  resource to be used to validate an XML 1.0 
>>  (pre-namespace) compatible document. 

In fact, as described in section 6.3.2 of the 
structures specification, the noNamespaceSchemaLocation 
attribute is provided for exactly this purpose.
Also, processors can be written to allow
invoking applications to specify the schema
document to be used, to deal with situations
in which you want the receiving application
rather than the instance author to control the
schema to be used for validation.
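
As a minimal sketch of both options, the following Python reads the
noNamespaceSchemaLocation hint from an instance and lets the invoking
application override it.  The instance namespace URI shown is the one
from the final Recommendation, and the element names, URLs, and the
function name no_namespace_hint are illustrative assumptions, not part
of the specification.

```python
import xml.etree.ElementTree as ET

# Assumption: instance-namespace URI as published in the final Recommendation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance with no namespace on its own elements.
INSTANCE = """<memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="http://example.org/schemas/memo.xsd">
  <to>Curt</to>
</memo>"""


def no_namespace_hint(xml_text, override=None):
    """Return the schema document to use for unqualified elements.

    A location supplied by the invoking application (override) wins
    over the hint written by the instance author.
    """
    if override is not None:
        return override
    root = ET.fromstring(xml_text)
    # ElementTree exposes qualified attributes as "{namespace-uri}local-name".
    return root.get("{%s}noNamespaceSchemaLocation" % XSI)


print(no_namespace_hint(INSTANCE))
print(no_namespace_hint(INSTANCE, override="file:///local/memo.xsd"))
```

The first call returns the author's hint; the second shows the
receiving application taking control.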

>> a single schema resource may contain many 
>> distinct (possibly tens if not hundreds) 
>> namespaces through inclusions. I believe the 
>> typical usage would be to have a single schema 
>> resource that would contain definitions for 
>> all the expected namespaces and then, occassionally, 
>> one or more additional schema resources for 
>> unanticipated namespaces. Having to enumerate 
>> all the namespaces that appear in a mega 
>> resource would get very long and prone to error. 

There is no need to explicitly import (since you are 
talking about multiple namespaces, I presume you meant 
import rather than include) namespace B into the schema 
for namespace A, except in the case where constructions 
from B are explicitly and directly used in creating 
declarations for A.  If B in turn uses constructions 
from a namespace C, then the schema for B must import C, 
but in general the schema for A need not.  Similarly, 
if the schema for A had a content model with a wildcard, 
then one might validate a document using both namespace A 
and other namespaces with no explicit imports at all.
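
To illustrate, here is a small sketch in Python that lists the
namespaces a schema document imports directly.  The schema namespace
URI is that of the final Recommendation, and the example.org
namespaces and the helper direct_imports are hypothetical.  Note that
the schema for A names only B, even though B itself depends on C.

```python
import xml.etree.ElementTree as ET

# Assumption: schema-namespace URI as published in the final Recommendation.
XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema for namespace A: it uses constructions from B directly.
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/A">
  <xs:import namespace="http://example.org/B"/>
</xs:schema>"""

# Hypothetical schema for namespace B: it uses constructions from C directly.
SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/B">
  <xs:import namespace="http://example.org/C"/>
</xs:schema>"""


def direct_imports(schema_text):
    """Return the namespaces named by top-level import elements."""
    root = ET.fromstring(schema_text)
    return [imp.get("namespace") for imp in root.findall("{%s}import" % XSD)]


print(direct_imports(SCHEMA_A))
print(direct_imports(SCHEMA_B))
```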

>> there is not a conflict resolution mechanism when a 
>> namespace has multiple schema locations are declared 
>> either implicitly (through an import within a schema) 
>> or explicitly through a schemaLocation attribute. 

The specification makes clear that schemaLocations are 
hints, both on imports and in instance documents.  Processors 
are therefore free to use either of the hints, or to ignore 
both. 
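
A processor that wants to notice such conflicts could gather the hints
first and decide afterwards.  The sketch below, in Python, collects
schemaLocation pairs from every element of an instance and reports
namespaces with more than one distinct location.  The namespace URI is
the one from the final Recommendation; the instance, the URLs, and the
function names are assumptions made for illustration, and what to do
about a conflict is left to the processor.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: instance-namespace URI as published in the final Recommendation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance carrying two different hints for the same namespace.
INSTANCE = """<a:order xmlns:a="http://example.org/A"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/A http://example.org/a-v1.xsd">
  <a:item xsi:schemaLocation="http://example.org/A http://example.org/a-v2.xsd"/>
</a:order>"""


def collect_hints(xml_text):
    """Map each namespace to the distinct locations hinted for it."""
    hints = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        # The attribute value is a whitespace-separated list of
        # (namespace, location) pairs.
        tokens = elem.get("{%s}schemaLocation" % XSI, "").split()
        for namespace, location in zip(tokens[0::2], tokens[1::2]):
            if location not in hints[namespace]:
                hints[namespace].append(location)
    return dict(hints)


def conflicts(hints):
    """Return only the namespaces that have more than one hinted location."""
    return {ns: locs for ns, locs in hints.items() if len(locs) > 1}


print(conflicts(collect_hints(INSTANCE)))
```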

In fact, the use of attributes for schemaLocations is in 
some sense an accommodation to the lack of extensibility 
of XML itself.  In principle, it would be attractive to have 
some extensible document-level hook for providing structured 
metadata pertinent to the document as a whole.  Were such a 
mechanism available, then we could use it to establish 
document-wide associations between namespaces and schema 
documents.  Lacking that capability, and since it 
is a goal for us to be able to validate individual 
elements (and their children), it is desirable to 
allow schema locations to be signaled at any point in 
a document tree.  We thus adopted the schemaLocation attribute, 
but our decision to use an attribute raises the admittedly 
undesirable possibility that conflicting schema locations 
will be encountered for the same namespace.  I can assure 
you that this choice and many others relating to location of
schema documents were debated with great care and over an 
extended period of time.

>> there is not a one-to-one correspondence between 
>> namespaces and schemas. For example, the XHTML 
>> namespace has three distinct DTD's associated 
>> with it which are distinguished using public 
>> identifiers. There may also be successive versions 
>> of schemas for the same namespace. 

Right.  That is exactly why we provide the flexibility 
that we do.  Here are some of the ways you can use our 
design to pick the particular schema you want for the 
XHTML namespace.  If you want the instance document 
author to have control over the expected "flavor" of 
XHTML, then he or she should include a schemaLocation 
attribute identifying the actual XHTML schema document 
to be used.  There may, however, be other situations 
in which the receiving application needs to assert the 
flavor of XHTML (or especially of some expected purchase 
order format) that it accepts.  For example, a particular 
HTML editor may be written to accept the transitional 
XHTML variant.  For reasons of exactly this sort, we 
allow the application and processor together to determine 
the schema to be used for any particular namespace. 
Such an application can validate and edit an XHTML document 
that conforms to the transitional subset, whether or not 
the instance uses schemaLocation at all, and perhaps also 
in a situation where the document claims to be strict, but 
the application also accepts the enhanced variant. 

In any case, the receiving application is in control, 
and can certainly decline to process documents with 
schemaLocation values that it considers unacceptable 
(i.e., situations in which the document author has 
explicitly signaled use of a variant that is not 
acceptable).  Also, keep in mind that, especially 
for certain high-performance e-commerce applications, 
it is not acceptable from a performance or security 
point of view to force the receiving application (of 
the purchase order for example) to even attempt a 
connection back to arbitrary web sites that may have 
been identified in schemaLocation attributes in an 
instance.  That is among the reasons that schemaLocation 
is, formally, a hint that can be ignored.
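
One way a receiving application might exercise that control is
sketched below in Python.  The application keeps its own table of
schema documents, treats any schemaLocation value only as a claim to
be checked, and never opens a connection to the hinted address.  The
table contents, the example.org URLs, and the function choose_schema
are all hypothetical.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical local table: the schema this application validates against.
LOCAL_SCHEMAS = {XHTML_NS: "schemas/xhtml-transitional.xsd"}

# Hypothetical list of hinted locations the application is willing to see.
ACCEPTED_HINTS = {
    XHTML_NS: {
        "http://example.org/xhtml-transitional.xsd",
        "http://example.org/xhtml-strict.xsd",
    },
}


def choose_schema(namespace, hint=None):
    """Pick the schema for a namespace under application control.

    The hint is compared against an allow-list but never dereferenced,
    so no network connection is attempted on behalf of the instance.
    """
    if namespace not in LOCAL_SCHEMAS:
        raise LookupError("no schema configured for " + namespace)
    if hint is not None and hint not in ACCEPTED_HINTS.get(namespace, set()):
        raise ValueError("unacceptable schemaLocation: " + hint)
    return LOCAL_SCHEMAS[namespace]


# No hint at all: the application's own choice is used.
print(choose_schema(XHTML_NS))
# An acceptable hint: still validated against the local schema.
print(choose_schema(XHTML_NS, hint="http://example.org/xhtml-strict.xsd"))
```

A document claiming to be strict is accepted here and validated
against the transitional schema, while a hint outside the allow-list
causes the document to be declined.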


I hope the above is useful in explaining our approaches 
to the several important issues that you raise, and also 
in giving you a sense of some of the broader issues that 
we considered in arriving at the design we chose.

It would be helpful to us to know whether you are satisfied with the 
decision taken by the WG on this issue, or wish your dissent from the 
WG's decision to be recorded for consideration by the Director of 
the W3C. 

Thank you very much for your interest in our work. 

Sincerely, 
Noah Mendelsohn

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

Received on Friday, 22 September 2000 00:06:46 UTC