- From: Kent Johnson <kentj@rsn.hp.com>
- Date: Fri, 28 Sep 2001 19:12:29 -0500 (CDT)
- To: www-xml-schema-comments@w3.org
- Cc: k-kawa@bigfoot.com
I searched Google to see whether anyone had brought this up before, and found this post. The page I found, which links to the post that started this discussion, is at:

http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001JanMar/0425.html

I would like to know whether anyone is considering this. I REALLY think it should be added to XML Schema. My comments are mixed in below:

> Date: Thu, 29 Mar 2001 13:18:09 -0800
> From: Kohsuke KAWAGUCHI <k-kawa@bigfoot.com>
> To: www-xml-schema-comments@w3.org
> Message-Id: <20010329130101.6747.K-KAWA@bigfoot.com>
> Subject: Re: Specifying Datatype Atoms in Regular Expressions
>
> I'm not a WG member, so the following is just my personal opinion.

Nor am I.

> Your proposal might be useful, but it has several flaws.
>
> > <!-- declare the datatype using the proposed \x{} syntax -->
> > <xsd:simpleType name="Percentage">
> >   <xsd:restriction base="xsd:string">
> >     <xsd:pattern value="\x{xsd:float}%" />
> >   </xsd:restriction>
> > </xsd:simpleType>
> >
> > <!-- declare an element schema using the Percentage datatype -->
> > <xsd:element name="AVCommand">
> >   <xsd:complexType>
> >     <xsd:attribute name="volume">
> >       <xsd:simpleType>
> >         <xsd:restriction base="Percentage">
> >           <xsd:minInclusive value="12%" />
> >           <xsd:maxInclusive value="45%" />
> >         </xsd:restriction>
> >       </xsd:simpleType>
> >     </xsd:attribute>
> >   </xsd:complexType>
> > </xsd:element>
>
> First of all, since your "Percentage" type is based on the "string" type
> rather than the "float" type, you can't apply minInclusive/maxInclusive
> facets to it. I understand what you want to do, but you can't expect
> validating processors to understand it.
>
> So probably your example should be
>
> <simpleType name="float12-45">
>   <restriction base="float">
>     <minInclusive value="12" />
>     <maxInclusive value="45" />
>   </restriction>
> </simpleType>
>
> <simpleType>
>   <restriction base="string">
>     <pattern value="\x{float12-45}%" />
>   </restriction>
> </simpleType>

The above is a perfect example.

> Even so, you can't expect validating processors to validate things
> like (\x{float12-45})+

Yes, you can. Regular expression engines deal with a similar problem all the time. If an engine has to match a+ and is given "aaa", where does the match start?

The XML Schema recommendation says its regular expressions are based on Perl's (with a slight tweak). Programming Perl (the Camel book, published by O'Reilly) states Rule 1 of Perl regular expression matching as "The Engine tries to match as far left in the string as it can..." (page 60 in my copy). So in the "aaa" case the match starts at the first "a", and Perl doesn't care whether anything is left over. However, we aren't trying to match merely part of a line as in Perl; we need to match the whole thing. So a+ behaves like ^a+$, and the engine checks that all of "aaa" matches a+ before moving on.

Now suppose we wanted to match two float12-45 values in a row, as in \x{float12-45}\x{float12-45}, and were given the string "1190". In Perl, an unanchored match is free to start partway through the string and ignore whatever surrounds the part it likes, even though "11" and "90" aren't float12-45 values. But since we need to match from the beginning, the engine tries "1", then "11", then "119", then "1190" for the first atom, finds that none of them is a float between 12 and 45, hits the end of the string, and fails. Standard business.

But what if we had a float0-99 type that allowed any integer from 0 through 99, and we wanted to match two in a row with \x{float0-99}\x{float0-99}? On "1190" the first guess takes "1" for the first atom and the second "1" for the second, leaving "90" over. Backtracking then tries other splits and finds "11" followed by "90", but nothing tells the engine (or the reader) whether that is the split the designer meant. This is the fault of the designer.
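The anchored, backtracking matching walked through above can be sketched in Python. This is only an illustration under stated assumptions, not how any schema processor works: each proposed \x{...} atom is treated as a yes/no test on a candidate substring (rather than being compiled into a regular expression, the step Kawaguchi considers impossible in general), only literals, atoms and a hypothetical `Plus` wrapper for `(atom)+` are supported, and float12-45 and float0-99 are modelled with a simplified lexical space of unsigned decimals such as "12" or "30.5". The names `decimal_between`, `Plus` and `match_all` are made up for this sketch. The question it answers is the one a validator would ask: is there some way of splitting the whole value so that every atom accepts its piece? Note that with full backtracking, two float0-99 atoms do match "1190", split as "11" and "90".

```python
import re
from typing import Callable, List, Optional, Tuple, Union

# A proposed \x{...} atom, modelled as a yes/no test on a candidate substring.
Check = Callable[[str], bool]

# Assumed, simplified lexical space for the float types: "12", "30.5", ...
DECIMAL = re.compile(r"[0-9]+(\.[0-9]+)?")


def decimal_between(lo: float, hi: float) -> Check:
    """Hypothetical stand-in for a facet-restricted type such as float12-45."""
    def check(text: str) -> bool:
        return DECIMAL.fullmatch(text) is not None and lo <= float(text) <= hi
    return check


class Plus:
    """One or more repetitions of a single atom, i.e. (atom)+."""
    def __init__(self, atom: Check) -> None:
        self.atom = atom


Item = Union[str, Check, Plus]


def match_all(items: List[Item], text: str, pos: int = 0,
              pieces: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Match items against ALL of text[pos:], since schema patterns are anchored.

    Returns the substring consumed by each atom, or None if no split works.
    """
    if not items:
        # Succeed only if the whole value has been consumed.
        return list(pieces) if pos == len(text) else None
    head, rest = items[0], items[1:]
    if isinstance(head, str):
        # Literal text such as "%" must appear exactly at pos.
        if text.startswith(head, pos):
            return match_all(rest, text, pos + len(head), pieces)
        return None
    if isinstance(head, Plus):
        # Try one more repetition first, then stopping after this one.
        result = match_all([head.atom, head] + rest, text, pos, pieces)
        if result is None:
            result = match_all([head.atom] + rest, text, pos, pieces)
        return result
    # Datatype atom: try each end position, shortest first, and backtrack
    # to the next candidate whenever the rest of the pattern fails.
    for end in range(pos + 1, len(text) + 1):
        candidate = text[pos:end]
        if head(candidate):
            result = match_all(rest, text, end, pieces + (candidate,))
            if result is not None:
                return result
    return None


float12_45 = decimal_between(12, 45)
float0_99 = decimal_between(0, 99)

print(match_all([float12_45, float12_45], "1190"))  # None
print(match_all([float0_99, float0_99], "1190"))    # ['11', '90']
print(match_all([float12_45, "%"], "30.5%"))         # ['30.5']
print(match_all([Plus(float12_45)], "1245"))        # ['12', '45']
```

Each atom must consume at least one character, so the `Plus` recursion always terminates. The search can be exponential on pathological inputs, which is one practical cost a real implementation of the proposal would have to weigh.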
You can't string two numbers together without any punctuation and expect to be able to tell which digits go where. That's just the way things go, even outside the computer realm: we write dates as 4-23-1981 or 4/23/1981, not 4231981, and times as 5:17:16, not 51716. So with punctuation in between, this can be very useful (see the example later on).

> > I would also expect that any parser worthy of handling regular
> > expressions as they are currently defined should be able to extend
> > itself to handle this new syntax with a minimum of effort.
>
> The answer is definitely no, because it is very difficult to create a
> regular expression for a facet-restricted datatype. I would rather say
> it's impossible.

The answer is definitely yes. It's just as difficult as all the rest of regular expression matching.

> I'm sorry to say that, but your proposal cannot be implemented.

I think it is quite easily implemented, and in fact I think it should be. Let me give an example of where it would be especially useful.

Say we wanted an attribute in an XML file to hold the IP address of a system. What do we use nowadays, xsd:string? That's really not going to cut it. It leaves validation to whoever does the parsing, or worse, no validation is implemented at all, and errors start flying after someone forgets and puts 256 instead of 255.

So how about regular expressions? Here is an XML Schema simpleType for an IP address using a pattern:

<xsd:simpleType name="IPAddress">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"/>
  </xsd:restriction>
</xsd:simpleType>

I've never seen anything so unmaintainable :)

Now let's see an example with the proposed \x{data_type} addition to the regular expressions:
<xsd:simpleType name="IPAddress">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\x{xsd:unsignedByte}\.\x{xsd:unsignedByte}\.\x{xsd:unsignedByte}\.\x{xsd:unsignedByte}"/>
  </xsd:restriction>
</xsd:simpleType>

...ahhh, that's better. And since we don't have numbers all strung together, there is no ambiguity about where each number starts and ends.

Can anyone think of a reason not to add such a feature? I figured it would already exist, since it is so obviously needed in my mind...

Thanks for the consideration.

Regards,
Kent

> regards,
> ----------------------
> K.Kawaguchi
> E-Mail: k-kawa@bigfoot.com
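For comparison, here is a rough Python sketch of the two IP-address definitions. It assumes that Python's `re.fullmatch` is a fair stand-in for XML Schema's anchored pattern matching. The explicit pattern uses only character classes, alternation, grouping and an escaped dot, which behave the same way in both languages. The proposed datatype atoms are modelled by splitting on the literal dots and checking each piece against a simplified xsd:unsignedByte lexical space (optional sign, then digits) and its value range of 0 to 255. The range matters: xsd:unsignedByte is needed here because xsd:byte is signed, covering only -128 to 127, and would reject octets such as 192. Building the long pattern from a single octet fragment is one way to keep it maintainable today, although that generation has to happen outside the schema. All helper names are hypothetical.

```python
import re

# One octet (0-255, no leading zeros): the fragment repeated four times
# in the explicit IPAddress pattern.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

# Generate the long pattern from the fragment instead of writing it by hand.
EXPLICIT_PATTERN = r"\.".join([OCTET] * 4)

# Assumed, simplified xsd:unsignedByte lexical space: optional sign, then digits.
UNSIGNED_BYTE_LEXICAL = re.compile(r"[+-]?[0-9]+")


def matches_explicit(text: str) -> bool:
    """Schema patterns match the whole value, so use fullmatch, not search."""
    return re.fullmatch(EXPLICIT_PATTERN, text) is not None


def is_unsigned_byte(text: str) -> bool:
    """Lexical check, then the 0..255 value range of xsd:unsignedByte."""
    return (UNSIGNED_BYTE_LEXICAL.fullmatch(text) is not None
            and 0 <= int(text) <= 255)


def matches_datatype_atoms(text: str) -> bool:
    # Hypothetical semantics for four \x{xsd:unsignedByte} atoms joined by
    # literal dots. Splitting on "." is safe here because no unsignedByte
    # lexical form contains a dot, so there is only one possible split.
    parts = text.split(".")
    return len(parts) == 4 and all(is_unsigned_byte(p) for p in parts)


for address in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "001.2.3.4"]:
    print(address, matches_explicit(address), matches_datatype_atoms(address))
```

With these inputs the two checks agree except on "001.2.3.4". The hand-written pattern rejects leading zeros, while the unsignedByte lexical space permits them (and a leading "+"). The datatype version is therefore somewhat more permissive, which a schema author would need to decide about.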
Received on Friday, 28 September 2001 20:10:59 UTC