- From: John Lumley <john@saxonica.com>
- Date: Thu, 24 Feb 2022 15:46:56 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, public-ixml@w3.org
- Message-Id: <FEEDED94-4235-4314-BA2A-CA3D4ED09FF0@saxonica.com>
In XPath the expression 45 to 2 is not considered an error (it simply yields the empty sequence), though this isn’t as complex as the case in a grammar.

Sent from my iPad

> On 24 Feb 2022, at 15:42, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> Steven Pemberton writes:
>
>> The spec says:
>>     A range matches any character in the range from the start
>>     character to the end, inclusive, using the Unicode ordering
>>
>> It doesn't require the start character to be earlier in the ordering
>> than the end character. This means that
>>
>>     ["z"-"a"]
>> is the same as
>>     [ ]
>>
>> Do we care? Should it be an error, or a warning?
>
> I think a warning makes sense only if there is some meaning that can be
> attached to the range as specified. We can
>
> (a) allow the arguments in either order and say that the range matches any
>     character between the two points specified, inclusive;
>
> (b) say that it matches the starting code point, the ending code point,
>     and any characters matching code points between the two (so
>     ["z" - "a"] is equivalent to ["za"]);
>
> (c) invent some other meaning for ["z"-"a"]; or
>
> (d) call it an error.
>
> I don't know offhand what other regular expression notations people may
> be familiar with do. Looking it up, I find that XSD defines the meaning
> of a range this way:
>
>     A ·character range· in the form s-e identifies the set of characters
>     with UCS code points greater than or equal to the code point of s,
>     but not greater than the code point of e.
>
> This seems to fall in class (c). Since no characters have UCS code
> points p with (p ≥ 122) ∧ (p ≤ 97), that means that in XSD regular
> expressions (and, I guess, XPath 3 regular expressions), [z-a] matches
> nothing and is thus equivalent to ixml [].
>
> I am agnostic, but in the abstract I would lean towards (a) or (d).
>
> Since a processor has to check the order either way, I don't think (a)
> imposes any new cost on the processor, and it does make ixml processors
> less fussy. (And using Earley parsing pretty much says off the bat that
> high performance in parsing is not a goal for ixml.)
>
> But unless we find some regular expression syntax that is reasonably
> widely used that uses interpretation (a), I think the principle of least
> surprise would lead us to (d).
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
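
For anyone who wants to compare the readings side by side, here is a rough Python sketch of what a reversed range such as ["z"-"a"] would match under options (a), (b) and (d), alongside the XSD-style reading quoted above. The function name range_members and the policy labels are made up for this illustration; nothing here is taken from the ixml spec or from any existing processor.

```python
def range_members(start, end, policy):
    """Return the set of characters matched by the range start-end.

    `policy` is a hypothetical label for one of the readings discussed
    in the thread: "a", "b", "d", or "xsd".
    """
    lo, hi = ord(start), ord(end)
    if policy == "xsd":
        # XSD-style reading: code points p with lo <= p <= hi.
        # A reversed range satisfies no p, so the set is empty.
        return {chr(p) for p in range(lo, hi + 1)}
    if policy == "a":
        # Option (a): accept the endpoints in either order.
        lo, hi = min(lo, hi), max(lo, hi)
        return {chr(p) for p in range(lo, hi + 1)}
    if policy == "b":
        # Option (b): both endpoints always match, plus anything
        # between them in the stated order.
        return {start, end} | {chr(p) for p in range(lo, hi + 1)}
    if policy == "d":
        # Option (d): a reversed range is an error.
        if lo > hi:
            raise ValueError(f"reversed range: {start!r}-{end!r}")
        return {chr(p) for p in range(lo, hi + 1)}
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    for policy in ("xsd", "a", "b"):
        print(policy, sorted(range_members("z", "a", policy)))
```

All four readings agree on a range written in the usual order; they differ only when the start character comes after the end character.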
Received on Thursday, 24 February 2022 15:47:13 UTC