W3C home > Mailing lists > Public > xmlschema-dev@w3.org > April 2011

Re: Algorithm for merging the pattern facets in a base simpleType with a subtype? (UNCLASSIFIED)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 19 Apr 2011 17:41:25 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-Id: <4DACB052-5B48-430A-BEF9-12A9FE085E75@blackmesatech.com>
To: "Cheney, Edward A SSG RES USAR USARC" <austin.cheney@us.army.mil>

On Apr 19, 2011, at 1:04 PM, Cheney, Edward A SSG RES USAR USARC wrote:

> Classification: UNCLASSIFIED
>>> Therefore, the patterns that apply to "B" are just the patterns
>>> contained in "B". Effectively the patterns in "A" may be ignored. Do
>>> you agree?
>> 
>> No, sorry, there is nothing in the spec to justify that conclusion.
> 
> This logic is only valid in the narrow situation where "B" contains a
> pattern that directly and identically conflicts with a pattern from "A".
> In that case inheritance is blocked by instantiation.  Otherwise, if
> there is a conflict of patterns and if that conflict is not absolute the
> result is typically a union that applies the differences from
> inheritance as a remainder for attachment to the portion of the pattern
> conflicted in "B".

I'm not sure I follow, sorry.

> This sounds simple, except that I did not define the terms "difference"
> or "conflict".  I am not sure of a situation where the computed
> definition of a single pattern instance can become so complex that the
> conflict of it versus a pattern result from inheritance could become
> ambiguous.  

I'm not sure what you mean by 'ambiguous' here.  

Perhaps you mean that things can become so complicated that
it's not clear whether a given string is supposed to be legal or
not legal according to a given type definition.

The meaning of patterns is fairly simple, whether they are
inherited from the base type or specified on a restriction:  the
literal being tested is in the lexical space of type T only if
it matches each pattern in T.{facets}.  Or, to put it another way,
the literal is in the lexical space of T only if it is in the intersection
of the languages recognized by the patterns of T.

It is an important and fundamental result of formal language
theory that the intersection of two regular languages is a 
regular language.  And since any regular language can be defined
by a regular expression it follows that for any T it is possible in
theory to calculate a single regular expression for the 
intersection of all the patterns of T.  But there is no requirement
that XSD processors actually calculate that regex, and the cost
of calculating is typically high enough that for most implementations
it will be faster, as well as simpler to get right, just to match the
literal against each pattern, in turn.

If there are several hundred regular-expression patterns attached
to a simple type definition, that may become slow.  But it 
doesn't lead to any uncertainty in the result.

Or perhaps you mean 'ambiguous' in the sense most common in
discussions of grammars and languages:  a pattern can become
so complex that for some strings that match the pattern there is
more than one parse tree.  This is true, though it's not really a
function of complexity:  the regex a?a? is ambiguous in this
sense, since the string 'a' has two ways to match the regex.  
But there is no requirement in XSD that regexes be unambiguous
in that formal sense.


> If such a narrow condition is permissible I would not know
> the correct answer.  I suspect the occurrence, if any, of such
> complexity would occur more often in the wild as a result of extending
> the pattern facet to allow multiple regular expressions in a given
> facet.  If this is even a valid case it begs the question of what are
> the results in such situations when the hierarchical inheritance is vast
> in depth allowing for this condition to be inherited onto a separate
> instance of conflict.

It sounds like you have an interesting question, but I do not
understand it, so I should probably avoid trying to answer
it.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Tuesday, 19 April 2011 23:41:50 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 19 April 2011 23:41:51 GMT