- From: Steve Rosenberry <steve.rosenberry@verizon.net>
- Date: Tue, 03 Apr 2001 16:37:25 -0400
- To: www-xml-schema-comments@w3.org
> Re: Specifying Datatype Atoms in Regular Expressions
>
> First of all, since your "Percentage" type is based on "string" type,
> rather than "float" type, you can't apply minInclusive/maxInclusive
> facets to it. I understand what you want to do, but you can't expect the
> validating processors to understand it.
>
> So probably your example should be
>
> <simpleType name="float12-45">
>   <restriction base="float">
>     <minInclusive value="12" />
>     <maxInclusive value="45" />
>   </restriction>
> </simpleType>
>
> <simpleType>
>   <restriction base="string">
>     <pattern value="\x{float12-45}%" />
>   </restriction>
> </simpleType>
>

It makes more sense for anything that uses a datatype as a regular
expression atom to have "xsd:string" as its restriction base, since
string is the most generic datatype. All attributes (and elements) are
nothing more than strings until something (e.g. xsd:float) comes along
and applies additional meaning to the string. I stand by the suggested
definition and use as follows:

    <!-- declare the datatype using the proposed \x{} syntax -->
    <xsd:simpleType name="Percentage">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\x{xsd:float}%" />
      </xsd:restriction>
    </xsd:simpleType>

To handle inclusivity and exclusivity restrictions, the restriction
values must themselves match the pattern of the base type, as in the
example I gave:

    <xsd:simpleType>
      <xsd:restriction base="Percentage">
        <xsd:minInclusive value="12%" />
        <xsd:maxInclusive value="45%" />
      </xsd:restriction>
    </xsd:simpleType>

This way the validator can compare the floating point values in the
restriction with the floating point values in a given attribute in an
actual XML document.

> Even so, you can't expect the validating processors to validate things
> like (\x{float12-45})+

Absolutely correct, but then again I don't expect them to correctly
validate any other attempt to list multiple character groupings without
whitespace (or another character not in a base datatype pattern).
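As a rough illustration of the comparison described above, here is a minimal Python sketch of what a validator might do with the proposed Percentage type. Everything in it is an assumption made for illustration: the function names are hypothetical, the `\x{xsd:float}` atom is the proposal under discussion and not part of XML Schema, and the float expression is a simplified stand-in for the real xsd:float lexical space (INF, -INF and NaN are omitted).

```python
import re

# Simplified approximation of the xsd:float lexical space (assumption:
# INF, -INF and NaN are left out to keep the sketch short).
FLOAT = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# Hypothetical expansion of the proposed pattern "\x{xsd:float}%".
PERCENTAGE = re.compile(rf"({FLOAT})%")


def parse_percentage(text):
    """Return the float in front of '%', or None if text is not a Percentage."""
    match = PERCENTAGE.fullmatch(text)
    return float(match.group(1)) if match else None


def in_range(text, min_inclusive, max_inclusive):
    """Check a value against facets written in the same 'NN%' form."""
    value = parse_percentage(text)
    low = parse_percentage(min_inclusive)
    high = parse_percentage(max_inclusive)
    if value is None or low is None or high is None:
        return False
    return low <= value <= high


def split_percentages(text):
    """Split an undelimited run of percentages, using '%' as the separator."""
    # Match the whole string first so that trailing garbage is rejected
    # rather than silently skipped by finditer.
    if not re.fullmatch(rf"(?:{FLOAT}%)+", text):
        return None
    return [float(m.group(1)) for m in PERCENTAGE.finditer(text)]
```

With these definitions, `in_range("30%", "12%", "45%")` is true and `in_range("50%", "12%", "45%")` is false. The run `"12%2.3%44%"` splits cleanly only because '%' cannot occur inside a float; a repeated atom with no such terminating character would stay ambiguous.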
Since a '%' is not allowed in a float type, you could parse "12%2.3%44%"
with "\x{Percentage}+". However, it may be better for readability to use
"(\x{Percentage} )+" and rewrite the previous string as "12% 2.3% 44% ".
Certainly there are many cases where one can create impossible-to-validate
patterns using any of the quantifier characters ('+', '*', '{}', etc.),
but the point of this exercise is not to fool the validator, but to work
within its boundaries.

> > I would also expect that any parser worthy of handling regular
> > expressions as they are currently defined should be able to extend
> > itself to handling this new syntax with a minimum of effort.
>
> This is definitely no. Because it is very difficult to create regular
> expression of facet-restricted datatype. I would rather say it's
> impossible.
>

I cannot dispute this very well, since I hate writing parsing code and
have no real skill at it. On the other hand, I can envision the parsing
of datatype atoms as identifying those characters that act as delimiters
in a string and then applying the datatype rules to the resulting
substrings. This should not be much more difficult than what I expect is
occurring when the validator parses a string using a pattern such as
"\d*(\.|\,)\d+%".

I admit there may be flaws in the concept, but I don't believe any of
them to be insurmountable yet, especially since so little time has been
spent on it. Right now, the most insurmountable problem with datatypes
as RE atoms is that the XML-Schema spec is already in the Proposed
Recommendation category. On a more positive note, though, there is
always going to be a next version of XML-Schema, which may include this.

--
Steve Rosenberry
Sr. Partner

Electronic Solutions Company -- For the Home of Integration
http://ElectronicSolutionsCo.com

http://BetterGoBids.com -- The Premier GoTo Bid Management Tool

(610) 670-1710
Received on Tuesday, 3 April 2001 16:38:03 UTC