FW: Best Practices for Establishing Namespace Name from Simon Cox on 2009-09-02 (xmlschema-dev@w3.org from September 2009)

From: Simon Cox <simon.cox@jrc.ec.europa.eu>
Date: Wed, 2 Sep 2009 20:08:59 +0200
To: <xmlschema-dev@w3.org>
Message-ID: <CEE7205EF0534E09B5D83E19B390CC8E@H07.jrc.it>
 
Also forwarded to the list as this is likely to be of general interest. 

-----Original Message-----
From: Simon Cox [mailto:simon.cox@jrc.ec.europa.eu] 
Sent: Wednesday, 2 September 2009 20:06
To: 'noah_mendelsohn@us.ibm.com'
Cc: 'Andrew Welch'; 'ekimber'; 'G. Ken Holman'; 'Henry S. Thompson'; 'Tsao,
Scott'
Subject: RE: Best Practices for Establishing Namespace Name

Not in general, but sometimes, and often enough to matter for certain
use-cases. 
A review of processing engines (including those built in to enterprise tools
like Oracle) a few years ago led us to the conclusion that the real tools
used in real organizations had diverse behaviours. 
OGC is in the business of publishing schemas for widespread use by many
organizations, used to build loosely coupled systems where it isn't feasible
to enforce any particular one of the caching and processing models allowed
by the spec, so we had to assume the worst case. 
That requirement inexorably led to the conclusion that new namespaces were
required every time. 

This won't apply in every use case, but if you are publishing schemas that
you expect lots of people to use, and you have limited control over them, it
is a scenario that must be considered. 

However, there is a nuance to this: where the new schema only adds stuff to
an existing schema, you do this in a new namespace, but <import> the
existing schema, so existing components do not change namespaces; it's just
that the ones added after the original publication have a new namespace.
(Effectively this is the strategy used by Google when upgrading KML.)
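
A minimal sketch of the import strategy described above, written in Python
with only the standard library. The namespace URIs, the schema location and
the element names (Feature, Coverage) are hypothetical and chosen purely for
illustration; the code only inspects the schema documents and does not
perform XSD validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical namespaces and element names, for illustration only.
V1_NS = "http://example.org/base/1.0"
V2_NS = "http://example.org/base-ext/2.0"

# The original publication: its components live in V1_NS.
V1_XSD = f"""<xs:schema xmlns:xs="{XS}" targetNamespace="{V1_NS}"
    elementFormDefault="qualified">
  <xs:element name="Feature" type="xs:string"/>
</xs:schema>"""

# The extension has its own target namespace and imports the original
# schema instead of republishing its components under a new name.
V2_XSD = f"""<xs:schema xmlns:xs="{XS}" xmlns:base="{V1_NS}"
    targetNamespace="{V2_NS}" elementFormDefault="qualified">
  <xs:import namespace="{V1_NS}" schemaLocation="base-1.0.xsd"/>
  <xs:element name="Coverage" type="xs:string"/>
</xs:schema>"""


def declared_elements(xsd_text):
    """Return (targetNamespace, names of top-level element declarations)."""
    root = ET.fromstring(xsd_text)
    names = [e.get("name") for e in root.findall(f"{{{XS}}}element")]
    return root.get("targetNamespace"), names


def imported_namespaces(xsd_text):
    """Return the namespaces named by xs:import children of the schema."""
    root = ET.fromstring(xsd_text)
    return [i.get("namespace") for i in root.findall(f"{{{XS}}}import")]


print(declared_elements(V1_XSD))    # Feature stays in the original namespace
print(declared_elements(V2_XSD))    # only the added element is in the new one
print(imported_namespaces(V2_XSD))  # the extension refers back to the original
```

Under these assumptions, documents that use only Feature are unaffected by
the extension, while documents that also use Coverage declare both
namespaces.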

--------------------------------------------------------
Simon Cox

European Commission, Joint Research Centre, Institute for Environment and
Sustainability, Spatial Data Infrastructures Unit, TP 262 Via E. Fermi,
2749, I-21027 Ispra (VA), Italy
Tel: +39 0332 78 3652
Fax: +39 0332 78 6325
mailto:simon.cox@jrc.ec.europa.eu
http://ies.jrc.ec.europa.eu/simon-cox 

SDI Unit: http://sdi.jrc.ec.europa.eu/
IES Institute: http://ies.jrc.ec.europa.eu/
JRC: http://www.jrc.ec.europa.eu/
--------------------------------------------------------

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Wednesday, 2 September 2009 19:56
To: Simon Cox
Cc: 'Andrew Welch'; 'ekimber'; 'G. Ken Holman'; 'Henry S. Thompson'; 'Tsao,
Scott'
Subject: RE: Best Practices for Establishing Namespace Name

Simon Cox writes:

> A processor will maintain a cache of schema component definitions and 
> declarations and associate it with a namespace.

Not in general.  That's neither required nor encouraged by the
Recommendation, though it is allowed, and some implementations do.


> The processing rules in the XML Schema spec do not require that a 
> processor load the schema fresh if a new document comes in with the 
> same namespace.

That's true, but neither do they forbid reloading.  Quoting from the Rec [1]:

"Processors have the option to assemble (and perhaps to optimize or
pre-compile) the entire schema prior to the start of an ·assessment·
episode, or to gather the schema lazily as individual components are
required."

I think your reasoning is somewhat backwards:  I believe the intention is
that processors may implement a variety of strategies, and that >users
should choose processors (or processor switches) that are appropriate for
the particular purpose<. 

So, rather than saying:

"be sure to use a new namespace when the processing rules for your markup
change between versions, because your processor will surely cache the old
content models".

I would say:

"There are many tradeoffs in deciding whether to use the same markup from
the same namespaces when content models and/or the interpretation of content
changes from version to version.  One complexity to consider is that some
schema processors maintain caches of pre-assembled schemas, and those
processors may not behave well if the same markup is to be interpreted
differently according to the version of the language.  Other processors do
provide either for just-in-time assembly of schemas, or for the necessary
level of control over schema document caching."

BTW:  there really are downsides to using a new namespace when minor changes
are made to a language.  There can be many other artifacts that will need
revision that would otherwise be unnecessary, e.g. XPaths in stylesheets.
Furthermore,  it can happen that a new version is created to revise just one
small feature of a language, and then the question arises whether to
republish the whole language in a new namespace, or only the new features.
If the former is done, then even documents otherwise unaffected by the
language revision may wind up having two expressions, one using the old and
one using the new namespace;  conversely, if only changed markup is in the
new namespace, then users have to remember which feature was revised when,
and deal with many namespace prefixes when languages are revised many times.
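
The point about XPaths can be illustrated with a small Python sketch using
the standard library's ElementTree path support. The namespace URIs, the
document shape and the path are hypothetical; a real stylesheet would show
the same effect through its own namespace bindings.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names; only the version segment differs.
OLD_NS = "http://example.org/lang/1.0"
NEW_NS = "http://example.org/lang/1.1"

DOC_TEMPLATE = '<order xmlns="{ns}"><item>widget</item></order>'

# A path written against the old namespace, as a stylesheet or query might be.
PATH = "l:item"
OLD_BINDING = {"l": OLD_NS}


def find_items(doc_text, prefix_map):
    """Evaluate PATH against the document using the given prefix bindings."""
    root = ET.fromstring(doc_text)
    return [e.text for e in root.findall(PATH, prefix_map)]


old_doc = DOC_TEMPLATE.format(ns=OLD_NS)
new_doc = DOC_TEMPLATE.format(ns=NEW_NS)

print(find_items(old_doc, OLD_BINDING))    # the old document still matches
print(find_items(new_doc, OLD_BINDING))    # empty: the namespace changed
print(find_items(new_doc, {"l": NEW_NS}))  # matches once the binding is revised
```

The markup itself is unchanged between the two documents; only the
namespace differs, yet every path bound to the old namespace has to be
revised.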

Overall, my observation has been that it's usually easier to use namespaces
for more functional decomposition (e.g. one namespace for personnel-related
vocabulary, one for inventory, etc.) and to use namespace changes sparingly
when implementing revisions to a language specification.  So, mostly,
successive versions of a language should use the same namespaces, except
maybe for qualitatively different features.

Noah



[1] http://www.w3.org/TR/xmlschema-1/#layer1

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


"Simon Cox" <simon.cox@jrc.ec.europa.eu>
09/02/2009 01:34 PM
 
        To:     "'Andrew Welch'" <andrew.j.welch@gmail.com>
        cc:     <noah_mendelsohn@us.ibm.com>, "'Tsao, Scott'" 
<scott.tsao@boeing.com>, "'G. Ken Holman'" 
<gkholman@cranesoftwrights.com>, "'Henry S. Thompson'" <ht@inf.ed.ac.uk>, 
<xmlschema-dev@w3.org>, "'ekimber'" <ekimber@reallysi.com>
        Subject:        RE: Best Practices for Establishing Namespace Name


No - it's not merely the identity issue. 
It's the processing issue. 
A processor will maintain a cache of schema component definitions and
declarations and associate it with a namespace. 
The processing rules in the XML Schema spec do not require that a processor
load the schema fresh if a new document comes in with the same namespace. 
So if the new document is actually using a different schema (even if the
namespace is the same) then processing will fail. 
The only way to ensure safe processing (i.e. processing that respects *all*
of the processing strategies allowed for in the XML Schema spec) is to be
scrupulous about changing namespace if the schema changes. 
In many cases that is most easily handled by including a version identifier
in the namespace. 
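
A toy Python model of the caching behaviour described above. It is a sketch
under stated assumptions, not a description of any real processor: the
class NamespaceKeyedCache, the helpers and the namespace URI are
hypothetical, and a "schema" is reduced to the set of child element names
it requires.

```python
class NamespaceKeyedCache:
    """Toy cache: the first schema loaded for a namespace is reused from then on."""

    def __init__(self):
        self._by_namespace = {}

    def schema_for(self, namespace, load):
        # 'load' is only called on a cache miss, so a later, different
        # schema published under the same namespace is never seen.
        if namespace not in self._by_namespace:
            self._by_namespace[namespace] = load()
        return self._by_namespace[namespace]


def is_valid(required_children, actual_children):
    """Stand-in for validation: every required child must be present."""
    return required_children <= actual_children


def check(cache, namespace, schema, children):
    """Validate 'children' against whatever schema the cache hands back."""
    return is_valid(cache.schema_for(namespace, lambda: schema), children)


NS = "http://example.org/lang"             # hypothetical namespace
SCHEMA_V1 = frozenset({"name"})
SCHEMA_V2 = frozenset({"name", "identifier"})

# Same namespace for both versions: the second check silently uses the
# cached version 1 rules, so a document that violates version 2 passes.
shared = NamespaceKeyedCache()
print(check(shared, NS, SCHEMA_V1, {"name"}))
print(check(shared, NS, SCHEMA_V2, {"name"}))

# Version identifier in the namespace: each version gets its own entry.
versioned = NamespaceKeyedCache()
print(check(versioned, NS + "/1.0", SCHEMA_V1, {"name"}))
print(check(versioned, NS + "/2.0", SCHEMA_V2, {"name"}))
```

In this model the wrong verdict comes only from keying the cache on the
namespace alone. A processor that reloads schemas, or keys its cache on
something more specific, would not show the problem.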

Because of all this, the XML Schema processing rules effectively imply that
the target namespace is the schema identifier. 

--------------------------------------------------------
Simon Cox

European Commission, Joint Research Centre, 
Institute for Environment and Sustainability, 
Spatial Data Infrastructures Unit, TP 262 
Via E. Fermi, 2749, I-21027 Ispra (VA), Italy 
Tel: +39 0332 78 3652
Fax: +39 0332 78 6325
mailto:simon.cox@jrc.ec.europa.eu 
http://ies.jrc.ec.europa.eu/simon-cox 

SDI Unit: http://sdi.jrc.ec.europa.eu/ 
IES Institute: http://ies.jrc.ec.europa.eu/
JRC: http://www.jrc.ec.europa.eu/
--------------------------------------------------------

-----Original Message-----
From: Andrew Welch [mailto:andrew.j.welch@gmail.com] 
Sent: Wednesday, 2 September 2009 19:18
To: Simon Cox
Cc: noah_mendelsohn@us.ibm.com; Tsao, Scott; G. Ken Holman; Henry S.
Thompson; xmlschema-dev@w3.org; ekimber
Subject: Re: Best Practices for Establishing Namespace Name

2009/9/2 Simon Cox <simon.cox@jrc.ec.europa.eu>:
>  Andrew Welch wrote
>> use a version attribute to distinguish the versions
>
> Where?

Typically on the root element, but it could go anywhere that's suitable.

> The issue was that elements with the same name were defined 
> differently in both GML 2.0 and GML 3.0, but they had the same target 
> namespace. The differences were subtle - technical rather than 
> conceptual - but real as far as a validating processor is concerned. 
> The XML namespace is to all practical intents and purposes the 
> designated identifier for 'the schema' and we had the same identifier 
> for different things. Chaos ensues.

Between versions the content model of elements will change, but that doesn't
mean you need a different namespace... Incompatible changes are actually
easier than supporting backwards compatibility: instead of detecting the
version and using the right xsd and corresponding parsing code, you simply
reject anything that fails validation for that version.

Anyway, it's interesting that you say the namespace is (to all intents and
purposes) the identifier for the schema; perhaps that's where the problem
is... the namespace value itself has started to mean something, when it's
meant to mean nothing.

Everyone seems to have different opinions on this, and I think I've asked in
the past if anyone has a best practices guide, which didn't attract too many
confident replies, but at the moment for me it's simply "namespace that won't
ever change, version attribute" : )
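
A minimal Python sketch of the "namespace that won't ever change, version
attribute" approach, assuming the version attribute sits on the root
element. The namespace URI, the element names and the REQUIRED_BY_VERSION
table are hypothetical, and the per-version check is only a stand-in for
validating against that version's xsd.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/lang"   # hypothetical; the same for every version

# Stand-in for per-version schemas: required child element names.
REQUIRED_BY_VERSION = {
    "1.0": {"name"},
    "2.0": {"name", "identifier"},
}


def validate(doc_text):
    """Pick the rules from the root's version attribute, then accept or reject."""
    root = ET.fromstring(doc_text)
    required = REQUIRED_BY_VERSION.get(root.get("version"))
    if required is None:
        return False  # unknown or missing version: reject
    present = {child.tag for child in root}
    return all(f"{{{NS}}}{name}" in present for name in required)


V1_DOC = f'<record xmlns="{NS}" version="1.0"><name>a</name></record>'
V2_BAD = f'<record xmlns="{NS}" version="2.0"><name>a</name></record>'
V2_GOOD = (f'<record xmlns="{NS}" version="2.0">'
           '<name>a</name><identifier>x1</identifier></record>')

print(validate(V1_DOC))   # satisfies the 1.0 rules
print(validate(V2_BAD))   # claims 2.0 but lacks identifier: rejected
print(validate(V2_GOOD))  # satisfies the 2.0 rules
```

The namespace never enters the version decision here; anything that fails
the rules for its declared version is simply rejected.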




--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Received on Wednesday, 2 September 2009 18:09:40 UTC