Re: Specifying Datatype Atoms in Regular Expressions

> First of all, since your "Percentage" type is based on "string" type,
> rather than "float" type, you can't apply minInclusive/maxInclusive
> facets to it. I understand what you want to do, but you can't expect the
> validating processors to understand it.
> 
> So probably your example should be
> 
> <simpleType name="float12-45">
>   <restriction base="float">
>     <minInclusive value="12" />
>     <maxInclusive value="45" />
>   </restriction>
> </simpleType>
> 
> <simpleType>
>   <restriction base="string">
>     <pattern value="\x{float12-45}%" />
>   </restriction>
> </simpleType>
> 

It makes more sense for anything that uses a datatype as a regular
expression atom to have "xsd:string" as its restriction base, since
string is the most generic datatype.  All attributes (and elements) are
nothing more than strings until something (e.g. xsd:float) comes along
and applies additional meaning to the string.  I stand by the suggested
definition and use as follows:

  <!-- declare the datatype using the proposed \x{} syntax -->
  <xsd:simpleType name="Percentage">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\x{xsd:float}%" />
    </xsd:restriction>
  </xsd:simpleType>


To handle inclusive and exclusive bounds, the facet values must
themselves match the pattern restriction of the base type, as in the
example I gave:

  <xsd:simpleType>
    <xsd:restriction base="Percentage">
      <xsd:minInclusive value="12%" />
      <xsd:maxInclusive value="45%" />
    </xsd:restriction>
  </xsd:simpleType>

This way the validator can compare the floating point values in the
restriction with the floating point values in a given attribute in an
actual XML document.
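
A rough sketch of that comparison, written in Python rather than taken
from any real validator.  The names FLOAT_LEXICAL, percentage_value and
in_range are made up for illustration, and the float pattern is a
simplified stand-in for the xsd:float lexical space (it leaves out INF,
-INF and NaN):

```python
import re

# Simplified stand-in for the xsd:float lexical space (assumption:
# INF, -INF and NaN are deliberately left out).
FLOAT_LEXICAL = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# What the proposed pattern "\x{xsd:float}%" would expand to.
PERCENTAGE = re.compile(rf"({FLOAT_LEXICAL})%")


def percentage_value(text):
    """Return the float inside a string like '12.5%', or None if it does not match."""
    match = PERCENTAGE.fullmatch(text)
    return float(match.group(1)) if match else None


def in_range(text, min_inclusive="12%", max_inclusive="45%"):
    """Compare the float atom of the value with the float atoms of the facets."""
    value = percentage_value(text)
    low = percentage_value(min_inclusive)
    high = percentage_value(max_inclusive)
    if value is None or low is None or high is None:
        return False  # the value or a facet does not match the base pattern
    return low <= value <= high
```

The facet values go through the same pattern as the attribute value, so
a facet such as "12" (without the '%') is rejected just as an attribute
value of "12" would be.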


> Even so, you can't expect the validating processors to validate things
> like (\x{float12-45})+

Absolutely correct, but then again I don't expect them to correctly
validate any other attempt to list multiple character groupings without
whitespace (or another character not in a base datatype pattern).  Since
a '%' is not allowed in a float type, you could parse "12%2.3%44%" with
"\x{Percentage}+".  However, it may be better for readability to use
"(\x{Percentage} )+" and rewrite the previous string as
"12% 2.3% 44% ", with a trailing space.  Certainly there are many cases
where one can create impossible-to-validate patterns using any of the
quantifiers ('+', '*', '{}', etc.), but the point of this exercise is
not to fool the validator; it is to work within its boundaries.
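
To illustrate why the '%' makes the repeated form workable, here is a
small Python sketch.  It assumes the \x{Percentage} atom is expanded
into an ordinary regular expression; FLOAT_LEXICAL is a simplified
stand-in for xsd:float (no INF, -INF or NaN) and split_percentages is a
hypothetical helper, not part of any existing processor:

```python
import re

# Simplified stand-in for the xsd:float lexical space (assumption).
FLOAT_LEXICAL = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
PERCENTAGE = rf"{FLOAT_LEXICAL}%"


def split_percentages(text, separator=""):
    """Check text against (Percentage + separator)+ and return the items.

    Returns None when the whole string does not match.
    """
    item = PERCENTAGE + re.escape(separator)
    if re.fullmatch(rf"(?:{item})+", text) is None:
        return None
    # '%' cannot occur inside a float, so the item boundaries are unambiguous.
    return re.findall(PERCENTAGE, text)


print(split_percentages("12%2.3%44%"))
print(split_percentages("12% 2.3% 44% ", separator=" "))
```

Both calls yield the same three items, "12%", "2.3%" and "44%".  With
the space separator the trailing space is required, so "12% 2.3% 44%"
without it is rejected.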


 
> > I would also expect that any parser worthy of handling regular
> > expressions as they are currently defined should be able to extend
> > itself to handling this new syntax with a minimum of effort.
> 
> This is definitely no. Because it is very difficult to create regular
> expression of facet-restricted datatype. I would rather say it's
> impossible.
> 

I cannot dispute this very well since I hate writing parsing code and
have no real skill at it.  On the other hand, I can envision the parsing
of datatype atoms as identifying those characters acting as delimiters
in a string and subsequently applying the datatype rules to the
resulting substrings.  This should not be much more difficult than what
I expect is occurring when the validator parses a string using a pattern
such as "\d*(\.|\,)\d+%".
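
The delimiter idea can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions: the DATATYPES table and
match_atoms function are invented here, the lexical checks are
simplified, and every atom is assumed to be followed by a non-empty
literal delimiter that cannot occur inside the datatype itself:

```python
import re

# Hypothetical registry of simplified lexical checks, keyed by datatype name.
DATATYPES = {
    "xsd:float": re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"),
    "xsd:integer": re.compile(r"[+-]?\d+"),
}


def match_atoms(text, atoms):
    """Validate text against a sequence of (datatype name, delimiter) pairs.

    Each delimiter marks the end of one substring; the datatype rule is
    then applied to that substring.
    """
    pos = 0
    for name, delimiter in atoms:
        end = text.find(delimiter, pos)
        if end == -1:
            return False  # delimiter missing
        if DATATYPES[name].fullmatch(text[pos:end]) is None:
            return False  # substring is not a valid instance of the datatype
        pos = end + len(delimiter)
    return pos == len(text)  # nothing may be left over


# "\x{xsd:float}%" applied to "12.5%"
print(match_atoms("12.5%", [("xsd:float", "%")]))
```

This covers only the easy case.  It says nothing about atoms with no
delimiter between them, which is where the objection above has the most
force.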

I admit there may be flaws in the concept, but I don't yet believe any
of them to be insurmountable, especially since so little time has been
spent on it.  Right now, the biggest obstacle to datatypes as RE atoms
is that the XML-Schema spec is already at the Proposed Recommendation
stage.  On a more positive note, there will always be a next version of
XML-Schema, which may include this.

-- 
Steve Rosenberry
Sr. Partner

Electronic Solutions Company -- For the Home of Integration
http://ElectronicSolutionsCo.com

http://BetterGoBids.com -- The Premier GoTo Bid Management Tool

(610) 670-1710

Received on Tuesday, 3 April 2001 16:38:03 UTC