RE: Algorithm for computing the cardinality of a simpleType?

Hi Xan,

I like your algorithm!  I wonder if checking for enumerations shouldn't be
done first, and then patterns, and then the 7 steps you described?

Algorithm for Computing the Cardinality of a byte simpleType (with Xan's
suggestions)

1. If there are enumeration facets then
      cardinality = count the number of enumeration facets
   Done.

2. If there are pattern facets then
      cardinality = the maximum cardinality of all the pattern facets.
   Done.

3. Otherwise:
      3.1. lo = -128, hi = 127
      3.2. minInclusive present => lo = max(lo, minInclusive)
      3.3. maxInclusive present => hi = min(hi, maxInclusive)
      3.4. minExclusive present => lo = max(lo, minExclusive + 1)
      3.5. maxExclusive present => hi = min(hi, maxExclusive - 1)
      3.6. totalDigits present => lo = max(lo, -10^^totalDigits + 1); 
                                  hi = min(hi, 10^^totalDigits - 1)
      3.7. cardinality = hi - lo + 1

Note that I am assuming that the enumeration facet takes precedence over all
other facets.  I believe that this is true, isn't it?

Does this algorithm seem correct?   /Roger


-----Original Message-----
From: Xan Gregg [mailto:xan.gregg@jmp.com] 
Sent: Thursday, July 21, 2005 10:12 AM
To: Roger L. Costello
Cc: xmlschema-dev@w3.org
Subject: Re: Algorithm for computing the cardinality of a simpleType?

Roger,

Computing cardinality of a pattern would be enough to scare me off. 
What is the cardinality of 
0*1*0*2*0*3*0*4*0*5*0*6*0*7*0*8*0*9*8*7*6*5*? If it weren't for leading 
zeros you could just try every possible lexical value and see if it's 
valid.

totalDigits is based on absolute values so it also applies to negative 
numbers. As a result totalDigits of 2 has cardinality 199, not 100.

I think you need to worry more about combinations of facets, such as 
min=75 and totalDigits=2. In the example you give,

> <simpleType name="foo">
>     <restriction base="byte">
>         <minInclusive value="0"/>
>         <maxInclusive value="3"/>
>         <enumeration value="1"/>
>         <enumeration value="2"/>
>     </restriction>
> </simpleType>
>
> My algorithm checks for the combination of minInclusive and 
> maxInclusive
> before it checks for enumerations.  For this example my algorithm 
> returns a
> cardinality of: 4 (which is correct).

Wouldn't 2 be the correct answer because of the enumeration facet?

Ignoring pattern (hard) and enumeration (easy), my approach would be to 
keep a running range while processing factes:

1. lo = -128, hi = 127
2. minInclusive present => lo = max(lo, minInclusive)
3. maxInclusive present => hi = min(hi, maxInclusive)
4. minExclusive present => lo = max(lo, minExclusive + 1)
5. maxExclusive present => hi = min(hi, maxExclusive - 1)
6. totalDigits present => lo = max(lo, -10^^totalDigits + 1); hi = 
min(hi, 10^^totalDigits - 1)
7. range cardinality = hi - lo + 1

Then apply pattern and enumeration info to compute actual cardinality.

xan

Received on Thursday, 21 July 2005 16:06:21 UTC