- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 24 Feb 2022 08:42:17 -0700
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: public-ixml@w3.org
Steven Pemberton writes:

> The spec says:
>
>     A range matches any character in the range from the start
>     character to the end, inclusive, using the Unicode ordering
>
> It doesn't require the start character to be earlier in the ordering
> than the end character. This means that
>
>     ["z"-"a"]
>
> is the same as
>
>     [ ]
>
> Do we care? Should it be an error, or a warning?

I think a warning makes sense only if there is some meaning that can
be attached to the range as specified.

We can

(a) allow the arguments in either order and say that the range
    matches any character between the two points specified,
    inclusive;

(b) say that it matches the starting code point, the ending code
    point, and any characters matching code points between the two
    (so ["z" - "a"] is equivalent to ["za"]);

(c) invent some other meaning for ["z"-"a"]; or

(d) call it an error.

I don't know offhand what the other regular expression notations
people may be familiar with do.

Looking it up, I find that XSD defines the meaning of a range this
way:

    A ·character range· in the form s-e identifies the set of
    characters with UCS code points greater than or equal to the
    code point of s, but not greater than the code point of e.

This seems to fall in class (c). Since no characters have UCS code
points p with (p ≥ 122) ∧ (p ≤ 97), that means that in XSD regular
expressions (and, I guess, XPath 3 regular expressions), [z-a]
matches nothing and is thus equivalent to ixml [].

I am agnostic, but in the abstract I would lean towards (a) or (d).

Since a processor has to check the order either way, I don't think
(a) imposes any new cost on the processor, and it does make ixml
processors less fussy. (And using Earley parsing pretty much says
off the bat that high performance in parsing is not a goal for ixml.)

But unless we find some regular expression syntax that is reasonably
widely used that uses interpretation (a), I think the principle of
least surprise would lead us to (d).

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
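
For comparison, the interpretations of a reversed range discussed above can be written out as a small Python sketch. This is only an illustration of the four readings, not anything taken from the ixml spec or from an existing processor; all of the function names are hypothetical, and each function takes single-character strings for the start, the end, and the character being tested.

```python
def match_xsd(start: str, end: str, ch: str) -> bool:
    """XSD-style reading: code point >= start and <= end.
    A reversed range such as z-a therefore matches nothing."""
    return ord(start) <= ord(ch) <= ord(end)


def match_either_order(start: str, end: str, ch: str) -> bool:
    """Option (a): accept the endpoints in either order."""
    lo, hi = sorted((ord(start), ord(end)))
    return lo <= ord(ch) <= hi


def match_endpoints(start: str, end: str, ch: str) -> bool:
    """Option (b): both endpoints always match, plus any code point
    strictly between them when start comes before end."""
    return ch in (start, end) or ord(start) < ord(ch) < ord(end)


def match_strict(start: str, end: str, ch: str) -> bool:
    """Option (d): a reversed range is reported as an error."""
    if ord(start) > ord(end):
        raise ValueError(f"reversed range: {start!r}-{end!r}")
    return ord(start) <= ord(ch) <= ord(end)
```

With the range "z"-"a" and the test character "m", the XSD-style reading and option (b) both reject the character, option (a) accepts it, and option (d) raises an error before any matching happens. Note that every variant compares the two endpoints at some point, which is the sense in which (a) adds no new cost.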
Received on Thursday, 24 February 2022 15:42:41 UTC