Comments on Section 6.3 (Locating Schema resources) from Curt Arnold on 2000-05-11 (www-xml-schema-comments@w3.org from April to June 2000)

From: Curt Arnold <carnold@houston.rr.com>
Date: Thu, 11 May 2000 01:19:19 -0500
To: <www-xml-schema-comments@w3.org>
Cc: <xml-dev@xml.org>
Message-ID: <004101bfbb10$d765b120$0d44a018@houston.rr.com>
Section 6.3.2: Point 4

Giving schemaLocation declarations document scope, so that they may appear
anywhere in the document, of course seriously complicates event-based
validation.  It would seem reasonable to require that a schemaLocation appear
before the corresponding schema information is needed.

The specification would also need to say what happens when inconsistent
schemaLocations appear for the same namespace.  I would assume that the
first would take precedence.
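To make the streaming problem concrete, here is a minimal sketch using Python's xml.sax. It assumes the final Recommendation's XSI namespace URI (the drafts used earlier ones) and that xsi:schemaLocation holds whitespace-separated (namespace, location) pairs. It applies the first-wins rule suggested above and flags namespaces whose elements arrive before any location has been declared for them, which is exactly what a one-pass validator cannot recover from. The names SchemaLocationCollector and collect are illustrative only.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Assumption: the XSI namespace URI of the final XML Schema Recommendation;
# the drafts under discussion used earlier URIs.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


class SchemaLocationCollector(ContentHandler):
    """Reads xsi:schemaLocation as elements stream past. The first location
    declared for a namespace wins; later, different ones are recorded as
    conflicts. Namespaces whose elements arrive before any location is known
    are recorded in too_late, since a streaming validator cannot go back."""

    def __init__(self):
        super().__init__()
        self.locations = {}  # namespace URI -> first declared location
        self.conflicts = []  # (namespace URI, ignored location)
        self.too_late = []   # namespaces needed before being located

    def startElementNS(self, name, qname, attrs):
        value = attrs.get((XSI, "schemaLocation"))
        if value:
            tokens = value.split()
            # Whitespace-separated list of (namespace, location) pairs.
            for ns, loc in zip(tokens[0::2], tokens[1::2]):
                if ns not in self.locations:
                    self.locations[ns] = loc
                elif self.locations[ns] != loc:
                    self.conflicts.append((ns, loc))
        ns = name[0]
        if ns is not None and ns not in self.locations and ns not in self.too_late:
            self.too_late.append(ns)


def collect(xml_bytes):
    handler = SchemaLocationCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver (uri, local) names
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


doc = b"""<r:root xmlns:r="urn:example:r" xmlns:a="urn:example:a"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:example:r r.xsd">
  <a:item/>
  <a:item xsi:schemaLocation="urn:example:a a1.xsd"/>
  <r:x xsi:schemaLocation="urn:example:r r2.xsd"/>
</r:root>"""

h = collect(doc)
print(h.locations)  # first location per namespace
print(h.conflicts)  # later, inconsistent declarations
print(h.too_late)   # namespaces used before their location was declared
```

In this example the first a:item is reported as too late because its location only appears on the following sibling, and r2.xsd is ignored in favour of the earlier r.xsd.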

==================

In general, I think the mechanism for locating schema resources has several
serious deficiencies.

First, there is not a one-to-one correspondence between namespaces and
schemas.  For example, the XHTML namespace has three distinct DTDs
associated with it, which are distinguished by their public identifiers.
There may also be successive versions of schemas for the same namespace.

Second, a single schema resource may, through inclusions, contain
definitions for many distinct namespaces (possibly tens if not hundreds).  I
believe the typical usage would be to have a single schema resource that
contains definitions for all the expected namespaces and then, occasionally,
one or more additional schema resources for unanticipated namespaces.  Having
to enumerate all the namespaces that appear in such a mega-resource would be
tedious and error-prone.

Third, there is no conflict resolution mechanism when multiple schema
locations are declared for a namespace, either implicitly (through an import
within a schema) or explicitly (through a schemaLocation attribute).

Fourth, there is not a mechanism to identify a schema resource to be used to
validate an XML 1.0 (pre-namespace) compatible document.

The best approach would seem to be to use public identifiers (fortunately
enjoying a rebirth of interest on xml-dev) to explicitly identify a specific
schema resource, instead of relying on an ambiguous combination of namespace
and schemaLocation to decide whether a particular cached version of a schema
is appropriate.

What I would suggest is that:

1. the schema element have a targetPublicIdentifier attribute in addition to
targetNamespace.
2. xsi:schemaPublic be a list of public identifiers.
3. xsi:schemaSystem be a list of URIs.
4. xsi:defaultNamespace identify the namespace to be used for unqualified
elements (see note).
5. a publicIdentifier simple type be added to the built-in datatypes.
6. a processing instruction be allowed as an alternative mechanism for
specifying schema resources for XML 1.0 compatible documents.  Since these
would not involve multiple namespaces or resources, I'd recommend something
like the following to appear before the document element:

<?xsi:schema defaultNamespace PUBLIC pubid sysid ?>
<?xsi:schema defaultNamespace SYSTEM sysid ?>
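A processor would need to split the data of such a PI into its parts. The following is a minimal Python sketch of that step, assuming (as in a DOCTYPE declaration) that values containing spaces, such as most public identifiers, are quoted. The function name parse_schema_pi and the example identifiers are hypothetical.

```python
import shlex


def parse_schema_pi(data):
    """Split the data of the proposed <?xsi:schema ...?> PI into its parts.

    Accepted shapes (from the proposal):
        defaultNamespace PUBLIC pubid sysid
        defaultNamespace SYSTEM sysid
    Assumption: values containing spaces (as public identifiers usually do)
    are quoted, as in a DOCTYPE declaration.
    """
    tokens = shlex.split(data)
    if len(tokens) == 4 and tokens[1] == "PUBLIC":
        namespace, _, pubid, sysid = tokens
    elif len(tokens) == 3 and tokens[1] == "SYSTEM":
        namespace, _, sysid = tokens
        pubid = None  # SYSTEM form carries no public identifier
    else:
        raise ValueError(f"malformed xsi:schema PI data: {data!r}")
    return {"defaultNamespace": namespace, "public": pubid, "system": sysid}


print(parse_schema_pi('urn:example:widgets PUBLIC "-//EXAMPLE//SCHEMA Widgets 1.0//EN" widgets.xsd'))
print(parse_schema_pi('urn:example:widgets SYSTEM "http://example.com/widgets.xsd"'))
```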

When xsi:schemaPublic and xsi:schemaSystem appear on the same element, there
must be a one-to-one correspondence between their entries, so that if the
second public identifier cannot be resolved, the second URI can be used to
retrieve the resource.  I'm assuming that there can be an acceptable
mechanism for representing a null public identifier and a null URI.
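The pairing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of any specification: the catalog is a plain dict standing in for whatever public-identifier catalog a processor uses, the "-" token for a null entry is hypothetical, and the lists are passed already split, since how to delimit public identifiers that themselves contain spaces within one attribute value is left open here.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical token standing in for a null public identifier or null URI;
# the proposal leaves the actual representation open.
NULL = "-"


def resolve_schemas(public_ids: Sequence[str], system_ids: Sequence[str],
                    catalog: Dict[str, str]) -> List[Optional[str]]:
    """Pair the i-th xsi:schemaPublic entry with the i-th xsi:schemaSystem
    entry. Try the public identifier against a local catalog first and fall
    back to the URI; None means neither half was usable."""
    if len(public_ids) != len(system_ids):
        raise ValueError("xsi:schemaPublic and xsi:schemaSystem must correspond one to one")
    resolved: List[Optional[str]] = []
    for pubid, uri in zip(public_ids, system_ids):
        if pubid != NULL and pubid in catalog:
            resolved.append(catalog[pubid])  # public identifier resolved locally
        elif uri != NULL:
            resolved.append(uri)             # fall back to the paired URI
        else:
            resolved.append(None)
    return resolved


catalog = {"-//EXAMPLE//SCHEMA Core 1.0//EN": "file:///cache/core.xsd"}
print(resolve_schemas(
    ["-//EXAMPLE//SCHEMA Core 1.0//EN", "-//EXAMPLE//SCHEMA Extra 1.0//EN", NULL],
    ["http://example.com/core.xsd", "http://example.com/extra.xsd", NULL],
    catalog,
))
```

Here the first entry resolves through the catalog, the second falls back to its URI, and the third has neither half.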

When schema information appears for one namespace in multiple schema
resources, the first appearance would be used for validation.

Note on defaultNamespace:
A previous comment
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000AprJun/0124.html)
discussed Section 3.4 on Undeclared target namespaces.  The current
treatment would require one schema for XHTML, for example, for documents
where the namespace was declared and another schema for XHTML where the
namespace was not declared.  To me, it seems that the creation of a schema
implies that you are defining elements that are in a conceptual namespace
and that the additional burden of picking out a URI for this namespace is
minimal.  If an instance document doesn't want to use an xmlns attribute,
that is a usage issue that could be addressed by an xsi:defaultNamespace
attribute (or schema PI) providing a weaker binding of unqualified names
to a namespace, one that is used only for schema validation.
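What "a weaker binding used only for schema validation" could mean in practice is sketched below with Python's xml.sax. The xsi:defaultNamespace attribute is the proposal's, not an existing one; reading it only from the document element and assuming the final Recommendation's XSI namespace URI are choices made for this sketch. Unqualified element names are looked up under the default namespace when finding declarations, while the names the parser reports are left untouched.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Assumption: the XSI namespace URI of the final XML Schema Recommendation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


class DefaultNamespaceLookup(ContentHandler):
    """Records, for each element, its parsed name and the (namespace, local)
    key a validator would use to find its declaration. The hypothetical
    xsi:defaultNamespace attribute is read from the document element only
    (an assumption of this sketch) and applied to unqualified names for
    schema lookup; the parsed names themselves are left unchanged."""

    def __init__(self):
        super().__init__()
        self.default_ns = None
        self.records = []  # (parsed name, schema lookup key)

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if not self.records:  # first element is the document element
            self.default_ns = attrs.get((XSI, "defaultNamespace"))
        lookup_ns = uri if uri is not None else self.default_ns
        self.records.append((name, (lookup_ns, local)))


def lookup_keys(xml_bytes):
    handler = DefaultNamespaceLookup()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver (uri, local) names
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.records


doc = b"""<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:defaultNamespace="http://www.w3.org/1999/xhtml">
  <body><p xmlns="urn:example:other"/></body>
</html>"""

for parsed, key in lookup_keys(doc):
    print(parsed, "->", key)
```

The html and body elements stay unqualified as far as the parser is concerned but are validated as XHTML, while the explicitly namespaced p keeps its own namespace.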
Received on Thursday, 11 May 2000 02:19:41 UTC