Re: LC-117: Locating Schema resources

This is in response to
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JulSep/0209.html
on XML Schema last call issue LC-117
(http://www.w3.org/2000/05/12-xmlschema-lcissues#locating-schema-resources).

Noah Mendelsohn wrote:

> >>  there is not a mechanism to identify a schema
> >>  resource to be used to validate an XML 1.0
> >>  (pre-namespace) compatible document.
>
> In fact, as described in section 6.3.2 of the
> structures specification, the noNamespaceSchemaLocation
> attribute is provided for exactly this purpose.

Using xsi:noNamespaceSchemaLocation to specify a schema location hint would
cause the document to be invalid according to a pre-existing DTD.  The only
way that I could see that a document could declare that it adheres to a
specific schema while retaining validity to an existing DTD would be through
use of a processing instruction.
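
As a rough illustration of why a processing instruction sidesteps the DTD
problem: it can be read without touching any element or attribute that the
DTD constrains.  The sketch below assumes the hypothetical "xsi:assert"
target and "schemaIdentifier" pseudo-attribute suggested in recommendation 4
of this message; "instance.dtd" is likewise a made-up name.  It scans the
text with a regular expression rather than a real XML parser, so it would
also match inside comments or CDATA sections.

```python
import re

# Hypothetical PI target taken from recommendation 4; not part of any spec.
ASSERT_PI = re.compile(r"<\?xsi:assert\s+(.*?)\?>", re.DOTALL)
PSEUDO_ATTR = re.compile(r"([\w:.-]+)\s*=\s*(?:\"([^\"]*)\"|'([^']*)')")


def parse_pseudo_attributes(data):
    """Parse name="value" pairs out of processing-instruction data."""
    result = {}
    for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(data):
        result[name] = double_quoted or single_quoted
    return result


def find_assertions(text):
    """Return one dict of pseudo-attributes per xsi:assert PI in the text."""
    return [parse_pseudo_attributes(data) for data in ASSERT_PI.findall(text)]


document = """<?xml version="1.0"?>
<!DOCTYPE instance SYSTEM "instance.dtd">
<?xsi:assert schemaIdentifier="http://www.example.com/schema1.xsd" ?>
<instance/>"""

assertions = find_assertions(document)
print(assertions)
```

The document element carries no extra attributes, so it stays valid against
its existing DTD.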
>
> There is no need to explicitly import (since you are
> talking about multiple namespaces, I presume you meant
> import rather than include) namespace B into the schema
> for namespace A, except in the case where constructions
> from B are explicitly and directly used in creating
> declarations for A.

I did mean import (sorry) and I was precisely talking about instances where
constructions from multiple namespaces are used in creating a schema
resource that combines elements from multiple namespaces.

For example, suppose I have an "http://www.example.com/namespace/automobile"
namespace whose schema imports distinct namespaces for major subsystems
("..drivetrain", "..engine", "..interior", "..frame", "..suspension",
"..tires", "..transport", "..finance", etc) plus generic namespaces like
XHTML, and uses elements from these distinct namespaces to describe the
overall picture of a car.  Each individual namespace may be owned by a
different organization (either an internal division or WG or an external
organization) and may be revised independently.

When I assume version "1.0" of the schema for "../automobile", I specify
through my <xsd:import>'s specific hints that indicate that version "1.0" of
this namespace uses elements from version "1.15" of "drivetrain", "1.7" of
"engine", "1.3" of "interior", etc.

<xsd:schema targetNamespace="http://www.example.com/namespace/automobile">
    <xsd:import namespace="http://www.example.com/namespace/drivetrain"
                schemaLocation="http://www.example.com/schemas/drivetrain/115"/>
    <xsd:import.../>
    <xsd:import.../>
    <xsd:element name="automobile">
        <xsd:complexType>
            <xsd:element ref="drivetrain:transmission"/>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

For me to use schemaLocation to specify that this schema resource should be
used to validate its 15 or so imported namespaces, I have to do something
like:

<automobile xmlns="http://www.example.com/namespace/automobile"
            xmlns:drivetrain="http://www.example.com/namespace/drivetrain"
            xmlns:engine="http://www.example.com/namespace/engine"
            xsi:schemaLocation="http://www.example.com/namespace/automobile
                                http://www.example.com/schema/automobile10.xsd
                                http://www.example.com/namespace/drivetrain
                                http://www.example.com/schema/automobile10.xsd
                                http://www.example.com/namespace/engine
                                http://www.example.com/schema/automobile10.xsd
                                ....">

In general such a schemaLocation would be so prone to error (omitting or
mistyping one or more namespaces) that it would typically be omitted.
However, at that point, you have effectively lost the ability of the document
to assert (though it could always be lying) that it is valid against a
specific schema resource.
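
To make the failure mode concrete, here is a minimal sketch of a check that
splits a schemaLocation value into namespace/location pairs and reports
declared namespaces that no pair covers.  The function names are
illustrative only, and one namespace in the sample value is deliberately
misspelled.

```python
def schema_location_pairs(value):
    """Split a schemaLocation value into (namespace, location) pairs."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation needs an even number of URIs")
    return list(zip(tokens[0::2], tokens[1::2]))


def uncovered_namespaces(declared, value):
    """Return the declared namespaces that have no schemaLocation entry."""
    covered = {namespace for namespace, _ in schema_location_pairs(value)}
    return sorted(set(declared) - covered)


declared = [
    "http://www.example.com/namespace/automobile",
    "http://www.example.com/namespace/drivetrain",
    "http://www.example.com/namespace/engine",
]

# "drivertrain" below is a deliberate typo in the hint.
hint = """
http://www.example.com/namespace/automobile
http://www.example.com/schema/automobile10.xsd
http://www.example.com/namespace/drivertrain
http://www.example.com/schema/automobile10.xsd
http://www.example.com/namespace/engine
http://www.example.com/schema/automobile10.xsd
"""

missing = uncovered_namespaces(declared, hint)
print(missing)
```

A processor that treats the attribute purely as a hint would have no reason
to flag the misspelled entry; the drivetrain namespace would simply be left
without a hint.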

>The specification makes clear that schemaLocations are
>hints, on imports and in instance documents.  Processors
>are therefore free to use either of the hints, or to ignore
>both.

The tension comes from a dual nature of schemaLocation.  It tries to act
both as a retrieval location for a resource (a file:// URL that an XML
editor validates during development) and an assertion of conformance to a
known universally identified schema resource (a URN or http:// URL that
would rarely be dereferenced) that a server would typically ignore before
validating with a schema resource of its choosing.

Definitely, asserting arbitrary conformance while verifying against
expectations is an extremely important scenario, one that required dirty
parser tricks to accomplish with DTD's.  However, most users' expectation
when they hear "schemaLocation" would be DTD-like behavior.  Renaming
"schemaLocation" to "schemaIdentifier" might clarify expectations.  That
might minimize the need to say, "it is only a hint".  It would be an
identifier whose optional, last-ditch resolution mechanism is dereferencing.
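
A minimal sketch of that resolution order, assuming a processor that keeps
its own catalog of known schemas: the identifier is looked up locally first
and dereferenced only if nothing else matches.  The names resolve_schema,
catalog and fetch are hypothetical, and the schema text is a placeholder.

```python
def resolve_schema(identifier, catalog, fetch=None):
    """Resolve a schema identifier to schema text.

    catalog: local mapping from identifier to schema text.
    fetch:   optional callable that dereferences a URI; it is consulted
             only as a last resort, when the catalog has no entry.
    """
    if identifier in catalog:
        return catalog[identifier]
    if fetch is not None:
        return fetch(identifier)
    raise LookupError("no schema known for %r" % identifier)


# Placeholder catalog entry; a server would load its trusted schemas here.
catalog = {"http://www.example.com/schema1.xsd": "<trusted schema text>"}

fetched = []


def recording_fetch(uri):
    """Stand-in for a network fetch that records what was requested."""
    fetched.append(uri)
    return "<fetched schema text>"


known = resolve_schema("http://www.example.com/schema1.xsd", catalog, recording_fetch)
unknown = resolve_schema("http://www.example.com/schema2.xsd", catalog, recording_fetch)
print(known, unknown, fetched)
```

A server would typically pass no fetch callable at all, so an unknown
identifier is rejected rather than retrieved.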

In my original message I had suggested the use of system identifiers;
however, I understand now that use of URI's is preferred for compatibility
with RDF among other things.

Based on your message, I am going to state that I'm not satisfied with the
proposed resolution (basically to leave schema resource location
substantially unchanged with the exception of LC-116), in that there are
substantial use cases (XML 1.0 compatible documents and documents where a
large number of namespaces are governed by a small set of schema resources)
where the suggested mechanism of asserting schema conformance is
unsatisfactory to the point that alternative solutions will be used.

I would recommend (in order of significance):

1. Renaming xsi:schemaLocation to xsi:schemaIdentifier (or maybe
xsi:assert).  Tools that want to check the assertion may (but are not
required to) dereference the URI if they cannot resolve the identifier by
another mechanism.  Primer and other text should suggest that
xsi:schemaIdentifier should be a URN or at least an absolute http: URL.
2. Recasting "locating schema resources" sections to "asserting schema
conformance" or similar wording.
3. (Optional) Add back xsi:schemaLocation (or call it something different,
say xsi:map) but as pairs of schema identifiers (not namespaces) and
retrieval URL's that can be used in resolving schema identifiers.  For
example (ignoring my issues with using NS-Schema pairs for the moment)

<instance xmlns="http://www.example.com/ns1"
          xsi:schemaIdentifier="http://www.example.com/ns1
                                http://www.example.com/schema1.xsd
                                http://www.example.com/ns2
                                http://www.example.com/schema1.xsd"
          xsi:schemaLocation="http://www.example.com/schema1.xsd
                              file://c|/myschemas/schema1.xsd">
...
</instance>

"xsi:schemaLocation" attributes would be expected to be vanishingly rare in
production systems and would be ignored by default, but could be used during
development.

4. (Optional) Define a processing instruction that XML 1.0 compatible
documents can use to assert conformance to a schema without compromising
their validity with an existing DTD.  Something like ([] indicate optional
sections):

<?xsi:assert [ns="http://www.example.com/ns1"]
schemaIdentifier="http://www.example.com/schema1.xsd"
[schemaLocation="file://c|/myschemas/schema1.xsd"] ?>

This would be interpreted as asserting that the document conforms to the
schema identified as http://.../schema1.xsd, which has some probability of
being found at c:/myschemas/schema1.xsd, when http://www.example.com/ns1 is
used as the namespace for unqualified element names.  (ns="" might be a
little over the top.)  An XSLT transform could use the info in the
processing instruction to convert the document to a namespaced XML document
with good, old xsi:schemaIdentifier attributes.

5. Provide a mechanism to declare that a schema identifier covers multiple
namespaces.  This could be as simple as allowing the odd-position entries in
xsi:schemaIdentifier to be either a namespace URI or "##any" or "##other".
Our automotive example could then be:

<automobile xmlns="http://www.example.com/namespace/automobile"
            xmlns:drivetrain="http://www.example.com/namespace/drivetrain"
            xmlns:engine="http://www.example.com/namespace/engine"
            xsi:schemaIdentifier="##any
                                  http://www.example.com/schema/automobile10.xsd">
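
One way a processor might interpret such a value, as a sketch only: an
explicit namespace entry wins, and "##any" serves as the fallback for every
other namespace.  That precedence rule is an assumption made here, not
something stated above, and "##other" is left out because its meaning in
this context is not defined.  The function name and the drivetrain115.xsd
location are hypothetical.

```python
def schema_for(namespace, identifier_value):
    """Pick the schema identifier that governs a namespace.

    identifier_value holds whitespace-separated pairs whose first member
    is a namespace URI or the wildcard "##any".
    """
    tokens = identifier_value.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of entries")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    if namespace in pairs:
        # Assumed precedence: an explicit entry overrides the wildcard.
        return pairs[namespace]
    return pairs.get("##any")


wildcard_only = "##any http://www.example.com/schema/automobile10.xsd"
with_override = (
    "##any http://www.example.com/schema/automobile10.xsd "
    "http://www.example.com/namespace/drivetrain "
    "http://www.example.com/schema/drivetrain115.xsd"  # hypothetical location
)

drivetrain = "http://www.example.com/namespace/drivetrain"
engine = "http://www.example.com/namespace/engine"

print(schema_for(drivetrain, wildcard_only))
print(schema_for(drivetrain, with_override))
print(schema_for(engine, with_override))
```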

Received on Friday, 22 September 2000 02:54:45 UTC