RE: Versioning of XML Schema and namespaces

Hi All,

First of all, thanks to everyone for the quick and informative responses.
Secondly, sorry for getting my terminology wrong.  I should have used
"schemaLocation" instead of "namespace".  Thanks to Eliot for sorting this out
for me.  ;--)

Eliot mentions:

> But, a processor could use other heuristics to decide which version to
> use, for example, looking in the document for other clues or using some
> outside information, such as metadata held in a document management
> system.
> 

The problem is that the XML document instances will be ISO 19139 metadata.
They are the only clues that I will have.  ISO 19139 will be extensible and I
expect that agencies will extend the XSDs to suit their own needs.  I need to
know that there is a difference and therefore I can track down the true
values.

I found that the first draft of ISO 19139 was not valid.  It hopefully will
be valid in the final release.  The other problem is that I don't have
control over how ISO 19139 identifies its schemalocation.  Can I trust that
an International Standard will have versions in its schema locations?  I
don't think so if it isn't valid in the first place.  What happens when there
are changes to the XML Schema?  ;--)

Paul writes:

> Depending on circumstances, it can be very dangerous to 
> trust the sender/author to tell you what DTD/schema to use 
> to validate against.  After all, one of the main reasons 
> to perform validation is because you don't trust the 
> sender/author...so why would you trust them when they 
> tell you what DTD/schema to use?

I have found that many people don't understand how to use XML properly.  They
have tools that generate XML and trust that the output is right.  When I try to
validate that XML I find that it is not valid, and the users don't understand
why.  They have relied on the vendor's statements, which may not be 100% true.
For example, Xerces interprets the W3C XML Schema specification literally
whereas XML Spy seems to have a more lax interpretation.  Many people have
said that their XML validated using XML Spy, but when I use Xerces it isn't
valid.  No-one seems to be telling the vendors that their products are not
right.

I also find that some W3C XML Schemas are not valid according to Xerces.  If
the XSD is not valid then how can I expect to validate the XML documents
which use that XSD?  Hopefully I won't have to use these problematic XSDs but
the problem may still occur.

When an organisation extends the ISO 19139 metadata XSD I need to check
whether they have got it right.  But I don't know about the new XSDs unless
they are referenced in the XML document instances which I download each
quarter.  I can then locate and validate the XSDs that are referenced in the
schemaLocation of the XML documents, and then validate the XML document
instances themselves.  I then report my findings to the managers of those
metadata entries.  It's not that I can't trust the owners of the documents.
It's that they don't know the right way to do things, so I tell them when they
are wrong so that they can fix the problem.
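
The quarterly check for newly referenced XSDs could be sketched roughly as
follows.  This is only an illustration in Python using the standard library.
The function names schema_locations and unseen_locations are made up for this
example, and it assumes the locations are given in xsi:schemaLocation
attributes (xsi:noNamespaceSchemaLocation is ignored).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return (namespace, location) pairs from every xsi:schemaLocation attribute."""
    pairs = []
    for elem in ET.fromstring(xml_text).iter():
        tokens = elem.get("{%s}schemaLocation" % XSI, "").split()
        # The attribute value is a whitespace-separated list of
        # namespace/location pairs.
        pairs.extend(zip(tokens[0::2], tokens[1::2]))
    return pairs


def unseen_locations(xml_texts, known):
    """Return schema locations used by the documents that are not in `known`."""
    found = set()
    for text in xml_texts:
        found.update(location for _namespace, location in schema_locations(text))
    return sorted(found - set(known))
```

Anything returned by unseen_locations is an XSD that has not been checked
before, so it is a candidate for validation before the instances that use it.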

I have also found that they may have relative URIs for the schema locations
in their XML document instances.  This makes the problem hard to solve,
because a relative location can only be resolved if I know where the document
originally lived.  I have to contact the owner of the document and tell them
to use absolute schema locations so that I can validate their XML document or
track down a new XSD.
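
A minimal sketch of how relative schema locations might be flagged, again in
Python with the standard library only.  The helper names is_absolute and
resolve are hypothetical, and the example.org URIs in the note below are
placeholders, not real schema locations.

```python
from urllib.parse import urljoin, urlparse


def is_absolute(location):
    """True when the schema location carries its own scheme, e.g. http://..."""
    return bool(urlparse(location).scheme)


def resolve(location, document_uri):
    """Resolve a schema location against the URI the document was fetched from."""
    return urljoin(document_uri, location)
```

For example, resolve("../schemas/ext.xsd", "http://example.org/records/a.xml")
gives "http://example.org/schemas/ext.xsd", but only because the original URI
of the document is known.  Without it, a relative location cannot be followed.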

I have found that if the XML documents are not valid then searches on those
documents will not return appropriate results.  For example, the dates in the
XML document instances are supposed to be ISO 8601 compliant yet some
content still looks like "27-Jan-2005" or some other non-ISO 8601 format.
The indexes of these fields don't check the format, but when a search for
dates between 2005 and 2006 is sent to these indexes, the document
containing a date of "27-Jan-2005" is not returned.  Hence the search tool is
not effective because of the XML content.  This is why I have to validate the
XML documents that are made available via our metadata search tool.  I can't
trust that the XML documents or the XSDs are correct unless I validate them
myself.
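
The date problem can be illustrated with a small check.  This is a sketch
under a simplifying assumption: it accepts only the extended calendar form
YYYY-MM-DD, whereas ISO 8601 itself allows other representations.  The
function name is_iso8601_date is made up for this example.

```python
import re
from datetime import date

# Assumption: only the extended calendar form YYYY-MM-DD is accepted here.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_iso8601_date(text):
    """True for a real calendar date written as YYYY-MM-DD."""
    match = _ISO_DATE.fullmatch(text.strip())
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2005-02-30
    except ValueError:
        return False
    return True
```

With this check "2005-01-27" passes and "27-Jan-2005" fails, which is the kind
of content that drops out of a date range search.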

I have no control over some of the schemaLocation definitions, for example
those for xlink, GML, ISO 19139 and any extensions of the latter.

Eliot writes:

> It's important to remember that there is not (and never was) any particular
> magic to PUBLIC identifiers--they are just magic strings that require
> indirection to be resolved. In that respect they are indistinguishable
> from URIs that also require indirection to be resolved to real resources.

I know that Public Identifiers are not magic but for some reason it seems
that Public Identifiers are more likely to be genuinely unique for different
versions of XSDs than URIs and therefore I would like to see a similar
solution.
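
As a rough illustration of the kind of solution I mean, a versioned schema
location can be treated like a Public Identifier and mapped to a local copy
through a catalog.  The sketch below uses a plain Python dictionary as the
catalog.  The URI is the one from Hans's listing, but the local path and the
names CATALOG and local_copy are hypothetical.

```python
# Catalog keyed by versioned schema location, in the spirit of a PUBLIC
# identifier catalog.  The local path is a made-up example.
CATALOG = {
    "http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd":
        "schemas/iso15926-7/datamodel-2005-1.xsd",
}


def local_copy(location, catalog=CATALOG):
    """Return the local file for a schema location, or fail loudly."""
    try:
        return catalog[location]
    except KeyError:
        # An unknown location usually means a new schema or a new version.
        raise LookupError("no catalog entry for %s" % location) from None
```

This only helps if the version really is part of the location, since a
changed XSD published at an unchanged URI would still map to the old copy.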

Thanks again for the prompt replies and the helpful discussions.


John
 
> -----Original Message-----
> From: Hans Teijgeler [mailto:hans.teijgeler@quicknet.nl] 
> Sent: Thursday, 5 May 2005 8:53 AM
> To: 'Hans Teijgeler'; 'Eliot Kimber'; Hockaday John; 
> xmlschema-dev@w3.org
> Cc: ":www-xml-schema-comments"@w3.org
> Subject: RE: Versioning of XML Schema and namespaces
> 
> 
> I don't know what software is used for this forum. My neatly arranged
> listings got garbled. Still readable, I hope.
> Hans
> 
> -----Original Message-----
> From: xmlschema-dev-request@w3.org 
> [mailto:xmlschema-dev-request@w3.org] On
> Behalf Of Hans Teijgeler
> Sent: donderdag 5 mei 2005 0:25
> To: 'Eliot Kimber'; John.Hockaday@ga.gov.au; xmlschema-dev@w3.org
> Subject: RE: Versioning of XML Schema and namespaces
> 
> 
> Hi John,
> 
> My two cents of wisdom. In ISO 15926-7 we have defined nine interlinked XML
> Schemas, the top one, for the ISO 15926-2 data model, starting with:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
> 	xmlns:xs="http://www.w3.org/2001/XMLSchema"
> 	xmlns="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 	targetNamespace="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 	elementFormDefault="qualified"
> 	attributeFormDefault="unqualified">
> 	...
> 
> This is imported in the next level schema, for Templates:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
> 	xmlns="http://www.tc184-sc4.org/iso15926-7/templates/2005-1.xsd"
> 	xmlns:p2="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 	xmlns:xs="http://www.w3.org/2001/XMLSchema"
> 	targetNamespace="http://www.tc184-sc4.org/iso15926-7/templates/2005-1.xsd"
> 	elementFormDefault="qualified"
> 	attributeFormDefault="unqualified">
> 	<xs:import
> 		namespace="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 		schemaLocation="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"/>
> 	...
> 
> and this is imported again in the next level, for Object Information Models:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
> 	xmlns="http://www.tc184-sc4.org/iso15926-7/oim/2005-1.xsd"
> 	xmlns:p2="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 	xmlns:templ="http://www.tc184-sc4.org/iso15926-7/templates/2005-1.xsd"
> 	xmlns:xs="http://www.w3.org/2001/XMLSchema"
> 	targetNamespace="http://www.tc184-sc4.org/iso15926-7/oim/2005-1.xsd"
> 	elementFormDefault="qualified"
> 	attributeFormDefault="unqualified">
> 	<xs:import
> 		namespace="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"
> 		schemaLocation="http://www.tc184-sc4.org/iso15926-7/datamodel/2005-1.xsd"/>
> 	<xs:import
> 		namespace="http://www.tc184-sc4.org/iso15926-7/templates/2005-1.xsd"
> 		schemaLocation="http://www.tc184-sc4.org/iso15926-7/templates/2005-1.xsd"/>
> 
> etc, etc.
> 
> The targetNamespace contains (here) the suffix 2005-1, meaning that it is
> the first version of the year 2005. The schemas are (in due time) located on
> a publicly accessible web server.
> 
> I hope this helps somewhat.
> 
> Regards,
> Hans
>  
> 
> -----Original Message-----
> From: xmlschema-dev-request@w3.org 
> [mailto:xmlschema-dev-request@w3.org] On
> Behalf Of Eliot Kimber
> Sent: woensdag 4 mei 2005 17:11
> To: John.Hockaday@ga.gov.au; xmlschema-dev@w3.org
> Cc: ":www-xml-schema-comments"@w3.org
> Subject: Re: Versioning of XML Schema and namespaces
> 
> 
> John.Hockaday@ga.gov.au wrote:
> > I expect that document instances using W3C XML Schemas will use a
> > "namespace" declaration to identify which XML Schema should be used to
> > validate that document instance.  The problem that I see with the
> > namespace is that a URI is the unique identifier.  There is no PUBLIC
> > identifier.  As we have all probably experienced with old bookmarks, the
> > content at URLs changes a lot.
> > If an XML Schema's version is not part of the URI and a new version of
> > that XML Schema is made then it is likely that this will *not* be
> > reflected in the URI and hence the namespace.
> 
> I think you're confusing two similar but distinct functions here:
> namespaces and schema locations. DTD declarations with external identifiers
> are equivalent to schemaLocation specifications, not namespace
> declarations--SGML (and XML without namespaces) had no equivalent of
> namespaces [except for HyTime architectures].
> 
> For DTDs, which are a syntactic part of the document that references them,
> the PUBLIC identifier is nothing more than an alias for the external DTD
> subset's storage location (i.e., its filename). Thus it is completely
> appropriate that it include a version identifier since if the external
> declaration subset is changed it's a new object and should be identified as
> such.
> 
> By contrast, a document's namespace *does not* directly identify a schema.
> It identifies (or rather, can be exclusively associated with) an
> (abstract) "application" that might have any number of schemas associated
> with it. That is, for a given application, with a single associated
> namespace, there might be different schemas *at the same time*, reflecting
> different profiles or uses of the application, or there might be different
> schema versions over time reflecting changes over time to the details of the
> namespace. But the namespace itself is unchanged because the namespace
> identifies the application independent of its various implementations over
> time. [For example, the XSLT namespace is invariant across versions of the
> XSLT spec because, as an abstract application, XSLT is XSLT regardless of
> the currently-defined details of it.]
> 
> In DTD-based documents that use external declaration subsets you always have
> to have an external identifier for the subset, so you always had something
> you could resolve or use in a catalog.
> 
> For non-DTD-based documents, there are two possible cases (assuming
> namespaced documents--the no-namespace case is degenerate and not worth
> considering because it allows no good general solution):
> 
> 1. The document uses the schemaLocation= hint to say which specific schema
> it wants you to use.
> 
> 2. The document specifies only a namespace.
> 
> In the first case, the schema location can either be a local, relative URI or
> it can be an absolute URI. In the case of the absolute URI, the URI
> functions essentially as a PUBLIC ID does: it essentially demands local
> mapping to a local resource via some sort of catalog method (for the simple
> practical reason that most processing environments aren't always net
> connected or because the schema is not in fact served on a
> publicly-available server). If the absolute URI includes some sort of
> version value, then you have *exactly the same* functionality and semantics
> as with PUBLIC IDs for external DTD subsets.
> 
> In the second case, the implication is that the system must determine which
> version of the schema to use, which typically would be done using a catalog
> and probably implies that in most cases you want the latest or more general
> version of the schema. But, a processor could use other heuristics to decide
> which version to use, for example, looking in the document for other clues
> or using some outside information, such as metadata held in a document
> management system.
> 
> Thus, I think the appropriate approach in your case is to require the use of
> schemaLocation= with absolute URIs that include version information--that
> gives you the same control you had before.
> 
> It's important to remember that there is not (and never was) any particular
> magic to PUBLIC identifiers--they are just magic strings that require
> indirection to be resolved. In that respect they are indistinguishable from
> URIs that also require indirection to be resolved to real resources.
> 
> Cheers,
> 
> Eliot
> 
> --
> W. Eliot Kimber
> Professional Services
> Innodata Isogen
> 9390 Research Blvd, #410
> Austin, TX 78759
> (512) 372-8155
> 
> ekimber@innodata-isogen.com
> www.innodata-isogen.com
> 
> 
> 
> 

Received on Friday, 6 May 2005 04:35:21 UTC