Re: Whitespace normalization for union types from Kasimier Buchcik on 2005-06-01 (xmlschema-dev@w3.org from June 2005)

From: Kasimier Buchcik <kbuchcik@4commerce.de>
Date: Thu, 02 Jun 2005 00:00:50 +0200
To: Xan Gregg <xan.gregg@jmp.com>
Cc: XML-SCHEMA <xmlschema-dev@w3.org>
Message-Id: <1117663250.9348.50.camel@librax>

On Wed, 2005-06-01 at 14:47 -0400, Xan Gregg wrote:
> > The fact that the whitespace-value is at hand only when the value was
> > already validated against the member-types seems to contradict
> > [2] Datatype Valid, which mandates the pattern facet to be applied
> > first; but without the whitespace-value, normalization is not possible,
> > so applying the pattern facet is not possible either.
> > Can someone clarify this?
> 
> That rule says that patterns are applied to the lexical values, but
> lexical values only exist *after* white space normalization. 3.1.4 of
> Structures [1] discusses how the "initial value" is turned into a
> "normalized value" using white space processing. (The "initial value"
> space is also called the "pre-lexical" space.) 2.2.1.2 says it is the
> normalized value that is fed into the simple type validation process.

Hmm, I repeat [1]:
"For all datatypes ·derived· by ·union·  whiteSpace does not apply
directly; however, the normalization behavior of ·union· types is
controlled by the value of whiteSpace on that one of the ·memberTypes·
against which the ·union· is successfully validated."

The value would validate against the xs:string type, since xs:string
appears first in the member-types, so I would expect the whitespace of
the union type to reflect this by using the whitespace 'preserve' of
xs:string.

> 
> So your example appears valid.
> 
> initial value = ' a  '
> normalized value (string) = ' a  ' => not valid
> normalized value (token) = 'a' => valid
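
For reference, the two normalized values above can be reproduced with a
small Python sketch of the whiteSpace modes described in [1]. The function
name `normalize` is only an illustration and not taken from any spec or
library.

```python
import re


def normalize(value: str, white_space: str) -> str:
    """Apply an XML Schema whiteSpace mode to an initial value."""
    if white_space == "preserve":
        # xs:string uses 'preserve': the value is left untouched.
        return value
    # 'replace': every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if white_space == "replace":
        return replaced
    if white_space == "collapse":
        # xs:token uses 'collapse': after 'replace', runs of spaces shrink
        # to a single space and leading/trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {white_space!r}")


print(repr(normalize(" a  ", "preserve")))  # value as seen by xs:string
print(repr(normalize(" a  ", "collapse")))  # value as seen by xs:token
```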

It seems awkward if the whitespace-value that happens to fit were chosen,
and not that of the actual member type definition (xs:string).

The validation process as I currently understand it:

1. validate against xs:string
   --> valid
2. take xs:string's whitespace
3. normalize with xs:string's whitespace
4. apply the pattern facet
   --> not valid
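
As a rough Python sketch of these four steps (not a real validator): the
pattern "a" is an assumption, since the actual pattern is not quoted in
this mail; it is only chosen so that ' a  ' fails and 'a' passes. The
helper names and the always-true member check are simplifications of mine
as well.

```python
import re

# Assumption: stand-in for the union's pattern facet (not from the schema).
UNION_PATTERN = "a"

# (name, whiteSpace) of the member types, in declaration order.
MEMBER_TYPES = [("xs:string", "preserve"), ("xs:token", "collapse")]


def normalize(value, white_space):
    """Only the two modes needed here: 'preserve' and 'collapse'."""
    if white_space == "preserve":
        return value
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")


def member_accepts(name, normalized):
    # Simplification: xs:string and xs:token have no further facets here,
    # so every normalized value is treated as valid for them.
    return True


def validate_first_member_wins(initial):
    """Steps 1-4: choose the first valid member, then apply the pattern."""
    for name, white_space in MEMBER_TYPES:
        normalized = normalize(initial, white_space)
        if member_accepts(name, normalized):
            # Steps 2-4: use this member's whiteSpace for the pattern check.
            ok = re.fullmatch(UNION_PATTERN, normalized) is not None
            return name, ok
    return None, False


print(validate_first_member_wins(" a  "))
```

With these assumptions xs:string is always chosen, so the value ' a  '
ends up not valid.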

Trying to find a reason for the behaviour you describe, I can think
of the following process:

1. validate against xs:string
   --> valid
   1.1 magically apply the union's pattern facet
     --> invalid
2. validate against xs:token
   --> valid
   2.1 magically apply the union's pattern facet
     --> valid
3. don't apply the union's pattern facet again, since it was already
   applied magically during validation of the member-types
   --> valid
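
This second process could look roughly as follows in Python. Again only a
sketch: the pattern "a" is my stand-in for the real pattern facet, which
is not quoted here, and the function names are hypothetical.

```python
import re

# Assumption: stand-in for the union's pattern facet (not from the schema).
UNION_PATTERN = "a"

# (name, whiteSpace) of the member types, in declaration order.
MEMBER_TYPES = [("xs:string", "preserve"), ("xs:token", "collapse")]


def normalize(value, white_space):
    """Only the two modes needed here: 'preserve' and 'collapse'."""
    if white_space == "preserve":
        return value
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")


def validate_pattern_per_member(initial):
    """Try each member with its own whiteSpace AND the union's pattern."""
    for name, white_space in MEMBER_TYPES:
        normalized = normalize(initial, white_space)
        # Steps 1.1 / 2.1: the union's pattern takes part in picking the
        # member type, so a member is skipped when the pattern fails.
        if re.fullmatch(UNION_PATTERN, normalized) is not None:
            return name, True
    return None, False


print(validate_pattern_per_member(" a  "))
```

Here xs:string is skipped because the pattern fails on ' a  ', and
xs:token becomes the actual member type.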

If the latter process is the expected one, then the wording of the
spec does not lead to this conclusion. It says that the whitespace
of the member-type which validates successfully is taken, and not
that the member-type whose whitespace lets the union's facets
succeed should be chosen. I can find no evidence in the spec of
a behaviour that chooses the _actual_ member type by
taking the union's facets into account.

I really need some definitive clarification on this. 

[1] http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace

Regards & thanks,

Kasimier

Received on Wednesday, 1 June 2005 22:01:01 UTC