Bug in the Schemas for schemas

   The regular expressions given for matching XPath strings appear to be
incorrect. See XMLSchema.xsd around line 1061 (there is a similar problem
for the "selector" definition):

-------------------
      <xs:attribute name="xpath" use="required">
       <xs:simpleType>
        <xs:annotation>
         <xs:documentation>A subset of XPath expressions for use
in selectors</xs:documentation>
         <xs:documentation>A utility type, not for public
use</xs:documentation>
        </xs:annotation>
        <xs:restriction base="xs:token">
         <xs:annotation>
          <xs:documentation>The following pattern is intended to allow XPath
                            expressions per the following EBNF:
           Selector    ::=    Path ( '|' Path )*
           Path    ::=    ('.//')? Step ( '/' Step )*
           Step    ::=    '.' | NameTest
           NameTest    ::=    QName | '*' | NCName ':' '*'
                            child:: is also allowed
          </xs:documentation>
         </xs:annotation>
         <xs:pattern 
value="(\.//)?(((child::)?((\i\c*:)?(\i\c*|\*)))|\.)(/(((child::)?((\i\c*:)?(\i\c*|\*)))|\.))*(\|(\.//)?(((child::)?((\i\c*:)?(\i\c*|\*)))|\.)(/(((child::)?((\i\c*:)?(\i\c*|\*)))|\.))*)*">
         </xs:pattern>
        </xs:restriction>
       </xs:simpleType>
      </xs:attribute>
-------------------

   According to the documentation, and also to the XPath path definition, one
would not expect "a:1" to be matched, would one?

   Yet I was surprised by the result of one of my tests:

paphio:~/regexp -> ./testRegexp 
'(\.//)?(((child::)?((\i\c*:)?(\i\c*|\*)))|\.)(/(((child::)?((\i\c*:)?(\i\c*|\*)))|\.))*(\|(\.//)?(((child::)?((\i\c*:)?(\i\c*|\*)))|\.)(/(((child::)?((\i\c*:)?(\i\c*|\*)))|\.))*)*' 
'a:1'
a:1: Ok
paphio:~/regexp ->

   which can be reduced to:

paphio:~/regexp -> ./testRegexp '(\i\c*)' 'a:1'
a:1: Ok
paphio:~/regexp ->

   Ah ... Back to the definition:

http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#nt-MultiCharEsc
   \c  the set of name characters, those matched by NameChar

   which points to:

http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-NameChar
   [4]    NameChar   ::=    Letter | Digit  | '.' | '-' | '_' | ':' |
                            CombiningChar | Extender

   and ':' is part of NameChar as defined by the XML specification.
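
   For readers without testRegexp at hand, the same check can be sketched
in Python. Python's re module has no \i or \c escapes, so the helper below
(xsd_matches, a hypothetical name) expands them into ASCII-only stand-ins
first. That is an assumption: the real classes also cover non-ASCII letters,
combining characters and extenders. It is only meant to show that ':' gets
through.

```python
import re

# ASCII-only stand-ins for the XML Schema multi-character escapes.
# Assumption: the real classes are much larger (non-ASCII letters,
# CombiningChar, Extender); only the ASCII part matters for this test.
XSD_I = r"[A-Za-z_:]"        # \i : Letter | '_' | ':'
XSD_C = r"[A-Za-z0-9._:\-]"  # \c : NameChar, which includes ':'


def xsd_matches(pattern, text):
    # Expand \i and \c, then require the whole string to match,
    # since XML Schema patterns are implicitly anchored at both ends.
    expanded = pattern.replace(r"\i", XSD_I).replace(r"\c", XSD_C)
    return re.fullmatch(expanded, text) is not None


print(xsd_matches(r"(\i\c*)", "a:1"))  # True
```

   The "1" is accepted because ':' is itself a name character, so the whole
of "a:1" is consumed by \c* as one colon-containing name.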

XPath, when defining NameTest, actually references the Namespaces in XML REC
to avoid this problem:

   http://www.w3.org/TR/xpath#NT-NameTest

   http://www.w3.org/TR/REC-xml-names/#NT-NCName
   [4] NCName  ::= (Letter | '_') (NCNameChar)*
                   /* An XML Name, minus the ":" */
   [5] NCNameChar  ::= Letter | Digit | '.' | '-' | '_' |
                       CombiningChar | Extender
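
   By contrast, a check built on the NCName production rejects "a:1". The
following is a minimal Python sketch, again restricted to ASCII (an
assumption; Letter, CombiningChar and Extender are far larger in the real
productions). NCNAME, QNAME and is_qname are illustrative names, not taken
from any specification.

```python
import re

# ASCII-only approximation of NCName ::= (Letter | '_') (NCNameChar)*
# Note that ':' appears in neither character class.
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"

# QName is an optional NCName prefix, a ':', then an NCName local part.
QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def is_qname(text):
    # The whole string must be a QName, not just a prefix of it.
    return QNAME.fullmatch(text) is not None


print(is_qname("a:b"))  # True
print(is_qname("a:1"))  # False: the local part "1" cannot start an NCName
```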

   Conclusion:
    - Because "\i\c*" accepts names containing ':', the Schemas for Schemas
      pattern does not define a subset of XPath, damn...

   Suggestions:
    - Change the definition to base it on either the XPath specification or
      the Namespaces in XML specification
    - Change the occurrences of "\i\c*" in the regular expression
    - For the record, I concur with others that XML Schema Part 1: Structures
      is nearly impossible to understand and should be rewritten with more
      prose and without sentences longer than 25 words.
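
   The first two suggestions can be sketched by building the pattern straight
from the EBNF quoted above, with NCName in place of "\i\c*". This is a
Python approximation under the assumption of ASCII-only names; all names in
it (NCNAME, STEP, is_selector, ...) are illustrative. Within XML Schema
regular expression syntax itself, character class subtraction such as
"[\i-[:]][\c-[:]]*" would be one possible replacement for "\i\c*"; that is a
suggestion, not something taken from the specification text quoted here.

```python
import re

# ASCII-only approximation of NCName (assumption: the real production
# also allows non-ASCII letters, CombiningChar and Extender).
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"

# NameTest ::= QName | '*' | NCName ':' '*'
NAME_TEST = rf"(?:(?:{NCNAME}:)?(?:{NCNAME}|\*))"
# Step ::= '.' | NameTest   (child:: is also allowed)
STEP = rf"(?:(?:child::)?{NAME_TEST}|\.)"
# Path ::= ('.//')? Step ( '/' Step )*
PATH = rf"(?:\.//)?{STEP}(?:/{STEP})*"
# Selector ::= Path ( '|' Path )*
SELECTOR = re.compile(rf"{PATH}(?:\|{PATH})*")


def is_selector(text):
    # Whole-string match, mirroring how an xs:pattern facet is applied.
    return SELECTOR.fullmatch(text) is not None


print(is_selector(".//a:b/c|child::d/*"))  # True
print(is_selector("a:1"))                  # False
```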

   Question:
    - Has anyone implemented Schemas by actually reading the specification
      and transcribing it literally into a programming language?

  thanks,

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ 

Received on Wednesday, 3 April 2002 14:07:05 UTC