Re: union is not a union, it's a sequence

From: Michael Kay <mike@saxonica.com>
Date: Thu, 18 Oct 2012 14:15:19 +0100
Message-ID: <508000E7.6060808@saxonica.com>
To: xmlschema-dev@w3.org
Well, you have defined two value spaces: strings consisting entirely of 
printable characters, and strings consisting entirely of whitespace 
characters. The union of those two value spaces is (strings that consist 
entirely of printable characters or entirely of whitespace characters). 
A string that mixes printable characters and whitespace characters is 
not in either value space, therefore it should not be in their union.

If you want to compose pattern-based types from reusable components you 
could do this using entities to build up regular expressions. This is 
the way that Michael Sperberg-McQueen defined the types that match 
different flavours of URI in

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd

and

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd

To see how these complex regular expressions are constructed, view the 
raw XML source of these documents (for example with curl), because an 
XML parser expands the entity references before you see the patterns.
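
Applied to the types in your message, the entity approach might look 
roughly like the sketch below. The entity names wsp and printable are 
invented for illustration, and the sketch assumes your schema processor 
reads the internal DTD subset, which most XML parsers do by default. 
Each entity holds only the inside of a character class, so the same 
fragments can be reused in all three patterns.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xs:schema [
  <!-- Hypothetical entity names; each holds the inside of a character class. -->
  <!-- \t is the regex escape for tab. A literal tab in the entity text would
       be normalized to a space when the entity is expanded in an attribute. -->
  <!ENTITY wsp       "\t&#32;">
  <!ENTITY printable "&#33;-&#126;">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="white-space-characters">
    <xs:restriction base="xs:string">
      <xs:pattern value="[&wsp;]*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="printable-characters">
    <xs:restriction base="xs:string">
      <xs:pattern value="[&printable;]*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- One character class built from both fragments, so a value may mix them. -->
  <xs:simpleType name="header-field-body-characters">
    <xs:restriction base="xs:string">
      <xs:pattern value="[&wsp;&printable;]*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="header-field-body" type="header-field-body-characters"/>
</xs:schema>
```

After entity expansion the last pattern is a single character class 
covering tab, space, and the printable range, so a value such as 
"Hello  World" should be accepted, while each fragment is still defined 
in only one place.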

Michael Kay
Saxonica



On 18/10/2012 13:44, Costello, Roger L. wrote:
> Hi Folks,
>
> Proposition:
>        A union of member types does not produce
>        a union, it produces a sequence of member
>        types.
>
> Proof:
>      <xs:simpleType name="white-space-characters">
>          <xs:annotation>
>              <xs:documentation>
>                  The space (SP, ASCII value 32) and horizontal tab (HTAB,
>                  ASCII value 9) characters are known as the white space
>                  characters, WSP.
>              </xs:documentation>
>          </xs:annotation>
>          <xs:restriction base="xs:string">
>              <xs:pattern value="[&#9;&#32;]*" />
>          </xs:restriction>
>      </xs:simpleType>
>      
>      <xs:simpleType name="printable-characters">
>          <xs:annotation>
>              <xs:documentation>
>                  The printable US-ASCII characters are the characters that
>                  have values between 33 and 126, inclusive.
>              </xs:documentation>
>          </xs:annotation>
>          <xs:restriction base="xs:string">
>              <xs:pattern value="[&#33;-&#126;]*" />
>          </xs:restriction>
>      </xs:simpleType>
>      
>      <xs:simpleType name="header-field-body-characters">
>          <xs:annotation>
>              <xs:documentation>
>                  A field body may be composed of printable US-ASCII characters
>                  as well as the WSP.
>              </xs:documentation>
>          </xs:annotation>
>          <xs:union memberTypes="printable-characters white-space-characters" />
>      </xs:simpleType>
>
>      <xs:element name="header-field-body" type="header-field-body-characters" />
>
> *Valid* instance document:
>
>      <header-field-body>HelloWorld</header-field-body>
>    
> *Valid* instance document:
>
>      <header-field-body>  </header-field-body>
>
> *Invalid* instance document:
>
>      <header-field-body>Hello  World</header-field-body>
>
> Therefore the union of printable-characters and white-space-characters does not yield a union of their value spaces; rather, it merely provides a sequence of two types.
>
> Ugh.
>
> So, how do I truly union printable-characters and white-space-characters?
>
> Of course, I could simply copy the regex pattern from printable-characters and white-space-characters and paste:
>
>      <xs:simpleType name="header-field-body-characters">
>          <xs:annotation>
>              <xs:documentation>
>                  A field body may be composed of printable US-ASCII characters
>                  as well as the WSP.
>              </xs:documentation>
>          </xs:annotation>
>          <xs:restriction base="xs:string">
>              <xs:pattern value="[&#9;&#32;&#33;-&#126;]*" />
>          </xs:restriction>
>      </xs:simpleType>
>
> But that is awful as it is totally disconnected from printable-characters and white-space-characters.
>
> Any suggestions?
>
> /Roger
>
>
Received on Thursday, 18 October 2012 13:15:45 UTC
