- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 19 Apr 2011 17:26:52 -0600
- To: "Costello, Roger L." <costello@mitre.org>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
On Apr 19, 2011, at 2:34 PM, Costello, Roger L. wrote:

> Thanks Michael. Very enlightening.
>
> I'd like to confirm my new understanding.
>
> Here simpleType "A" is the base of simpleType "B":
>
>     <xs:simpleType name="A">
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[a-z]{10}" />
>         </xs:restriction>
>     </xs:simpleType>
>
>     <xs:simpleType name="B">
>         <xs:restriction base="A">
>             <xs:pattern value="[a-z]{20}" />
>         </xs:restriction>
>     </xs:simpleType>
>
> Here I declare an element, Test, to be of type "B":
>
>     <xs:element name="Test" type="B" />
>
> The value of "B" must consist of the letters a-z and the length must be exactly 10 characters AND exactly 20 characters. Clearly that is impossible, so B has no valid value.
>
> Is that correct thus far?

I have not checked to see what any validators say, so I may be missing a step somewhere, but I believe this is correct so far.

> Compare the above two simpleTypes against this simpleType:
>
>     <xs:simpleType name="C">
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[a-z]{10}" />
>             <xs:pattern value="[a-z]{20}" />
>         </xs:restriction>
>     </xs:simpleType>
>
> I declare an element, Test, to be of type "C":
>
>     <xs:element name="Test" type="C" />
>
> The value of "C" must consist of the letters a-z and the length must be exactly 10 characters OR exactly 20 characters. So either of these is valid:
>
>     <Test>abcdefghijabcdefghij</Test>
>
>     <Test>abcdefghij</Test>
>
> Is that correct?

That's what I believe the spec says, yes.

> Let's return to the "A" and "B" example. I would like to merge them into a single simpleType.

What kind of merger do you have in mind?

If you want a value space containing all the values of A and also all the values of B, use a union and specify A and B as its member types.

If you want a value space containing all the values that are values of A and also values of B, then you want an empty value space, a type with no instances.

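To make the union option concrete, here is a minimal sketch, assuming the types A and B exactly as declared above. The type name "AorB" is made up for illustration and appears nowhere in the original exchange:

```xml
<!-- Hypothetical name "AorB": accepts any value that is valid
     against at least one of the member types A and B. -->
<xs:simpleType name="AorB">
    <xs:union memberTypes="A B" />
</xs:simpleType>
```

Note that with B derived from A by restriction, every value of B is already a value of A, so this particular union should accept exactly what A accepts. The union only widens the value space if B is instead derived directly from xs:string.
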
> I have learned that simply merging the pattern facets of "A" into "B" to yield "C" is not correct. It would be so nice if I could simply write:
>
>     [a-z]{10} and [a-z]{20}

If you want to define a type with no instances, the simplest way is probably to declare a union with no member types. That works in XSD 1.1, though it's disallowed in 1.0 (because the WG felt it made no sense).

If you aren't trying to define a type with no instances, then I don't think I understand what you want the expression just given to mean.

> Unfortunately, there is no "and" operator in regex. So, any ideas on how to "and" arbitrary regex expressions?

The simplest way to get the AND of two patterns E1 and E2 is to declare a simple type with pattern E1 and then define a second type restricting the first by adding E2.

The 1.1 declaration of yearMonthDuration provides a real-world example. The restriction adds the pattern [^DT]* to say that each instance of yearMonthDuration must match the patterns of duration and must also be a string of characters none of which is a D or a T (which rules out any occurrence of the day, hour, minute, and second fields).

An alternative method would be to write the regex you want using AND and any other logical operators you need, then use Brzozowski derivatives to calculate a finite-state machine for the language, then use standard methods to calculate a regular expression for the language recognized by the finite-state machine. This is a lot of tedious work, and the usual methods of generating regexes from FSMs are apt to produce verbose results, but it does get you a single regular expression, if that's what you really really want.

I hope this helps.

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Tuesday, 19 April 2011 23:27:18 UTC