Re: Versioning of XML Schema and namespaces from Eliot Kimber on 2005-05-10 (xmlschema-dev@w3.org from May 2005)

From: Eliot Kimber <ekimber@innodata-isogen.com>
Date: Tue, 10 May 2005 09:26:51 -0500
To: John.Hockaday@ga.gov.au
CC: xmlschema-dev@w3.org
Message-ID: <4280C4AB.7000305@innodata-isogen.com>
John.Hockaday@ga.gov.au wrote:
> Hi all,
> 
> I am now more confused.  Should an XML document instance have a namespace?  I
> thought Eliot indicated that I should be using schemaLocation?  What should
> an XML document instance that tells you what XSD to validate it against look
> like?  I guess it must start with an XML declaration:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> then a nameSpace or schemaLocation to tell everyone what XSD it should comply
> to:
> then the root element:
> <root>
>  <all the other elements, attributes and values that suit the XSD />
> </root>
> 
> Does it have to identify every XSD that it uses?  Even those that are
> imported by the child XSD that is used for this XML document instance?
> Doesn't the "import" element in the child cater for that?  If I need to
> include a namespace statement for every single XSD that it references,
> that will take up many lines.

I think you're making it more complicated than it is. To answer your 
practical question: documents only need to point to the top-level 
schemas for the namespaces they use--they do not need to also point to 
schemas imported by the top-level schemas.

For example, if I define schema a1 for namespace A, and schema a1 
imports elements from schema b1 for namespace B, and my document is in 
namespace A, I only need to point to schema a1, not a1 and b1, because 
schema a1 already points to schema b1.
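
As a rough illustration of the a1/b1 case, the sketch below uses Python's standard xml.etree.ElementTree to read a hypothetical schema a1 and list what it imports. The namespace URIs (urn:example:A, urn:example:B), the file name b1.xsd, and the helper schema_imports are all invented for the example. The point is only that the reference to b1 lives in schema a1 itself, so an instance document in namespace A has no need to repeat it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema a1 for namespace A; it imports namespace B from b1.xsd.
A1_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="urn:example:B"
           targetNamespace="urn:example:A"
           elementFormDefault="qualified">
  <xs:import namespace="urn:example:B" schemaLocation="b1.xsd"/>
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b:item"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def schema_imports(xsd_text):
    """Return (namespace, schemaLocation) for each top-level xs:import."""
    schema = ET.fromstring(xsd_text)
    return [(imp.get("namespace"), imp.get("schemaLocation"))
            for imp in schema.findall(f"{{{XS}}}import")]

imports = schema_imports(A1_XSD)
print(imports)
```

A schema-aware processor that loads a1 can follow this import on its own, which is why the document only has to identify a1.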

One possible point of confusion between namespace/schema world and the 
DTD/no-namespace world is that with namespaces and schemas you can say 
*two* things, but with DTDs you can only say *one* thing. With 
namespaces and schemas you can say both:

- What abstract "application" the document applies to. This is indicated 
by the namespace. There is no analog of this in DTD-land. People *tried* 
to use public identifiers in this way, but that's not what they were 
intended to mean. This declaration can be *reliably* used by processors 
to associate documents with both validation rules (schemas) and business 
rules (e.g., XSLTs, Java classes, import/export rules for content 
management systems, etc.).

- What specific schema instance you want to be used with a document, 
i.e., schemaLocation=. This is *exactly the same* as a DOCTYPE 
declaration, in that it establishes a link from the document to its 
governing syntactic constraint rules (the DTD declarations, the XSD 
schema). The only difference between schemas and DTDs is that with 
schemas this is a *logical* link that processors are free to ignore but 
with DTDs it is a *syntactic* link that validating processors are 
obligated to resolve and process.
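
To make the two statements concrete, here is a minimal sketch that reads both from one instance document using only Python's standard library. The document, the namespace URI urn:example:A, the file name a1.xsd, and the helper describe are assumptions made up for illustration. The namespace comes from the root element's qualified name; the schema hint comes from the xsi:schemaLocation attribute, which holds whitespace-separated pairs of namespace URI and location.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document in namespace A that hints at schema a1 only.
DOC = """\
<root xmlns="urn:example:A"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:example:A a1.xsd">
  <child/>
</root>
"""

def describe(doc_text):
    """Return (root namespace, {namespace: schema location hint})."""
    root = ET.fromstring(doc_text)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = None
    # Split the attribute into tokens and pair them up: URI, location, ...
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    hints = dict(zip(tokens[0::2], tokens[1::2]))
    return namespace, hints

namespace, hints = describe(DOC)
print(namespace)
print(hints)
```

The first value says which application the document belongs to; the second is only a hint about where a schema for it might be found.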

With DTDs, all you can say is what syntactic constraint rules to use. 
This tells you nothing *reliable* about what application the document 
participates in because there is nothing here that has the same 
reliability as a namespace. People have used PUBLIC IDs *as if* they 
were the equivalent of namespace declarations, but they are not and cannot, 
in the general case, be reliably used as such.

In my opinion, all documents should be associated with at least one 
namespace, otherwise you have no reliable way, in the general case, to 
know what abstract application they are part of.

Whether or not to also use schemaLocation= to directly bind documents to 
schemas is an implementation question that depends on the needs of the 
business process in which the documents participate.

If you control the documents and their processors, it's probably 
convenient to use schemaLocation= because it requires the least setup 
and maintenance. If you don't control the documents or you don't control 
the processor, then as Mike points out, schemaLocation= is pointless 
*if* you have a business requirement to ensure that documents are not 
only valid against their associated schemas but valid against specific 
schemas that you specify. In that case schemaLocation= cannot be trusted 
and you have to do the namespace-to-schema mapping in the processor.
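
One way the namespace-to-schema mapping in the processor might look is sketched below, again with Python's standard library. The catalog contents, the path /opt/schemas/a1.xsd, the URL in the sample document, and the function governing_schema are all hypothetical. The sketch only selects the schema; Python's standard library does not validate against XSD, so the returned path would be handed to whatever validator you actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog maintained by the processor's owner:
# namespace URI -> the schema the business process requires.
TRUSTED_SCHEMAS = {
    "urn:example:A": "/opt/schemas/a1.xsd",
}

def governing_schema(doc_text, catalog=TRUSTED_SCHEMAS):
    """Pick the schema from the root element's namespace alone."""
    root = ET.fromstring(doc_text)
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = None
    if namespace not in catalog:
        raise ValueError(f"no trusted schema for namespace {namespace!r}")
    # Any xsi:schemaLocation in the document is deliberately never read.
    return catalog[namespace]

# The document claims a different schema; the processor does not care.
UNTRUSTED_DOC = """\
<root xmlns="urn:example:A"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:example:A http://elsewhere.example/lax.xsd"/>
"""

chosen = governing_schema(UNTRUSTED_DOC)
print(chosen)
```

Because the lookup key is the namespace, a document cannot steer the processor to a laxer schema by changing its schemaLocation= value.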

But if you either trust the document creators or you only care that the 
document is valid, but not which schema it's valid against, then 
schemaLocation= is fine.

But at the end of the day what really matters is the namespace--that's 
the one thing about the document you can grab hold of and trust, 
assuming the namespace is associated with some well-defined, well-known 
application (e.g., something published in a public standard or something 
controlled and maintained by an enterprise).

Cheers,

Eliot

-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8155

ekimber@innodata-isogen.com
www.innodata-isogen.com
Received on Tuesday, 10 May 2005 14:25:55 UTC