Re: [xml-dev] which xml schema tools do it right concerning including attributes xml:lang and xml:space

Hi Dare,

> So basically there is a loophole in the spec where although the
> correct type information for xml:base, xml:lang and xml:space must
> be used in a schema, that the same does not apply for an instance
> document. Currently the spec does not explicitly prevent me from
> specifying that xml:lang must be an integer between 5 and 10 or that
> xml:base must only be a date in an instance document.

Of course when it actually comes to validating an instance, parsers
that recognise xml:lang, xml:space and xml:base attributes (which
should be all parsers) should check the values in the instance to see
whether they're valid as part of their normal XML parsing job. The XML
Schema Rec doesn't prevent you from specifying that xml:lang must be
an integer between 5 and 10, but if you use an xml:lang attribute in
your instance and its value is an integer between 5 and 10, then an
XML parser should complain about that (basic XML 1.0 well-formedness
parsing should give a warning if xml:lang isn't a language code, I
think).

> However we do not think that allowing such a loophole to exist and
> thereby allowing people to redefine the meanings of xml:base,
> xml:space and xml:lang in their schema is desirable. So our
> implementation assumes that the XML namespace is imported in every
> schema it validates and ignores any re-importation of the XML
> namespace.

It's interesting to look at the XML Rec on this point. It specifically
says that xml:* attributes must be declared like any other attribute.
It doesn't say anything about what the declaration for xml:lang must
look like (though it suggests an NMTOKEN type), but it does say that
xml:space must be declared as an enumerated type whose values are one
or both of 'preserve' and 'default'. There's also nothing in the XML
Base Rec that I can see on a quick scan limiting the way in which
xml:base is declared in a DTD.

As you say, this leads to a situation in which something you do in the
schema renders every instance that uses the attribute invalid (either
at an XML 1.0 level or against the schema [or DTD for that matter]),
but that thing can't be detected during schema construction.

A possible solution to this (for XML Schema 1.1, I guess) would be to
have XML Schema list some limitations on the types of attributes
declared in the XML namespace. Something along the lines of:

  1. If the target namespace is 'http://www.w3.org/XML/1998/namespace'
     then the local name of the attribute must be 'lang', 'space' or
     'base'.

  2. The appropriate one of the following:

     2.1. If the local name is 'lang' then the type of the attribute
          must be xs:language or validly derived from xs:language.

     2.2. If the local name is 'space' then the type of the attribute
          must be a simple type whose whiteSpace facet is collapse
          and whose enumerated values are one or both of 'preserve'
          and 'default'.

     2.3. If the local name is 'base' then the type of the attribute
          must be xs:anyURI or validly derived from xs:anyURI.
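Rules like these are mechanical enough to be checked when the schema
is built rather than when an instance turns up. As a rough sketch in
Python, assuming a hypothetical and much-simplified model of simple
types (the SimpleType class, the built-in constants and
check_xml_ns_attribute are invented for illustration and are not any
validator's API; facets other than enumeration and whiteSpace are
ignored), the check might look like this:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class SimpleType:
    """Hypothetical, simplified model of an XML Schema simple type.

    Only what the proposed rules look at is modelled: an optional
    built-in name, the base type, an enumeration facet and a whiteSpace
    facet. Other facets (minInclusive, pattern, ...) are ignored.
    """
    name: Optional[str] = None
    base: Optional["SimpleType"] = None
    enumeration: Optional[Tuple[str, ...]] = None
    white_space: Optional[str] = None

    def derives_from(self, name: str) -> bool:
        # Walk up the base-type chain looking for the named built-in type.
        t: Optional[SimpleType] = self
        while t is not None:
            if t.name == name:
                return True
            t = t.base
        return False

    def effective_white_space(self) -> Optional[str]:
        # The nearest whiteSpace facet up the chain is the one in force.
        t: Optional[SimpleType] = self
        while t is not None:
            if t.white_space is not None:
                return t.white_space
            t = t.base
        return None

    def effective_enumeration(self) -> Optional[Tuple[str, ...]]:
        # A restriction can only narrow its base's enumeration, so the
        # nearest enumeration facet is the effective one.
        t: Optional[SimpleType] = self
        while t is not None:
            if t.enumeration is not None:
                return t.enumeration
            t = t.base
        return None


# A few built-in types with their whiteSpace values (base chains abbreviated).
STRING = SimpleType(name="xs:string", white_space="preserve")
TOKEN = SimpleType(name="xs:token", base=STRING, white_space="collapse")
LANGUAGE = SimpleType(name="xs:language", base=TOKEN)
ANY_URI = SimpleType(name="xs:anyURI", white_space="collapse")
INTEGER = SimpleType(name="xs:integer", white_space="collapse")


def check_xml_ns_attribute(target_ns: str, local_name: str,
                           type_: SimpleType) -> List[str]:
    """Apply rules 1 and 2.1-2.3; an empty list means the declaration is OK."""
    if target_ns != XML_NS:
        return []  # the rules only constrain the XML namespace
    if local_name == "lang":  # rule 2.1
        if not type_.derives_from("xs:language"):
            return ["xml:lang must be xs:language or derived from it"]
    elif local_name == "space":  # rule 2.2
        problems = []
        if type_.effective_white_space() != "collapse":
            problems.append("xml:space must have whiteSpace collapse")
        enum = type_.effective_enumeration()
        if not enum or not set(enum) <= {"preserve", "default"}:
            problems.append("xml:space values must be 'preserve'/'default'")
        return problems
    elif local_name == "base":  # rule 2.3
        if not type_.derives_from("xs:anyURI"):
            return ["xml:base must be xs:anyURI or derived from it"]
    else:  # rule 1
        return [f"xml:{local_name} may not be declared"]
    return []


if __name__ == "__main__":
    # xml:lang declared as an integer (range facets not modelled): rejected.
    print(check_xml_ns_attribute(XML_NS, "lang", SimpleType(base=INTEGER)))
    # xml:lang narrowed to 'ru' or 'de': allowed, so this prints [].
    ru_or_de = SimpleType(base=LANGUAGE, enumeration=("ru", "de"))
    print(check_xml_ns_attribute(XML_NS, "lang", ru_or_de))
```

With this, an xml:lang declared as an integer between 5 and 10 is
rejected at schema-construction time, while restricting xs:language to
'ru' and 'de' still passes.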

This would allow people to redeclare these attributes if they need to
constrain their values further (for example, to allow xml:lang to be
only 'ru' or 'de' on a particular element), without allowing them to
declare the attributes in ways that make instances invalid.

As for XML Schema 1.0 validators, I'd personally find it more helpful
to be warned about declarations of these attributes that permit
invalid values than to have those declarations ignored entirely. But
there are plenty of more important things to worry about.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

Received on Friday, 19 April 2002 04:55:55 UTC