Re: [css3-selectors] Selectors level 3 - Lexical scanner error

On 09/04/2013 18:24, Tab Atkins Jr. wrote:
> On Thu, Apr 4, 2013 at 3:30 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>> * Jean-Jacques Solari wrote:
>>> The most likely place where you would be finding "\377" is in:
>>>
>>> …
>>> nonascii[^\Ø-\177]
>>> …
>>>
>>> But one reads "\177", not "\377", and it is even less to be read twice
>>> in the tokenization as advertised in the paragraph.
>>
>> That paragraph is obviously mistaken considering `377` does not appear
>> in the document anywhere else, but the `177` is not meant as `377`. I am
>> not sure where the `Ø` is from, in the Recommendation it is
>>
>>    nonascii  [^\0-\177]
>>
>> which is anything but 0x00 .. 0x7F, in other words, 0x80 .. 0xFF if the
>> maximum value is 0xFF (0o377 in octal). It can't be 0o377 because then
>> the set would be empty (anything but <minimum> ... <maximum>).
> I was *wondering* about that some time ago.  I think you're right that
> it's just a persistent typo/misunderstanding for \177, unless someone
> can come up with a convincing argument for why "ASCII" is considered
> to extend all the way to U+00ff, when it's normally considered a 7-bit
> encoding, and thus goes only to U+007f.


Note that the ^ inside [] in a regexp makes the range *exclusive*. In 
this case, [^\0-\177] (anything but 0x00~0x7F) is the same as 
[\200-\377] (0x80~0xFF) if you assume that 0xFF is the maximum possible 
character (which is the case if you work on bytes rather than Unicode 
characters.)
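
As a rough check of that equivalence, here is a minimal sketch using
Python's re module on bytes. This is an assumption-laden stand-in: Python's
regexp engine is not Flex, but it accepts the same octal escapes inside a
character class. The helper name matching_bytes is made up for
illustration.

```python
import re

# Byte-oriented patterns, written with the same octal escapes as the grammar.
NEGATED = re.compile(rb"[^\0-\177]")   # anything but 0x00..0x7F
EXPLICIT = re.compile(rb"[\200-\377]")  # 0x80..0xFF spelled out

def matching_bytes(pattern):
    """Return the set of single-byte values that the pattern accepts."""
    return {b for b in range(256) if pattern.fullmatch(bytes([b]))}

negated_set = matching_bytes(NEGATED)
explicit_set = matching_bytes(EXPLICIT)

# When the alphabet stops at 0xFF, both classes accept the same values.
print(negated_set == explicit_set)
print(min(negated_set), max(negated_set))
```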

The advantage of an exclusive range when defining "non-ASCII" is that 
you don’t need to care about what the maximum character is. It looks 
like the grammar in Selectors 3 was changed for that reason, but the 
paragraph about replacing \377 was not removed when it should have been.
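
To illustrate why the upper bound stops mattering, here is a small sketch
that runs the two forms against Unicode strings instead of bytes. Again
this uses Python's re module as an assumed stand-in for a scanner
generator, and the sample characters are arbitrary.

```python
import re

# The same two character classes, now compiled as Unicode (str) patterns.
NEGATED = re.compile(r"[^\0-\177]")
EXPLICIT = re.compile(r"[\200-\377]")

# ASCII letter, U+00E9, U+20AC, and a code point outside the BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

negated_hits = [bool(NEGATED.fullmatch(s)) for s in samples]
explicit_hits = [bool(EXPLICIT.fullmatch(s)) for s in samples]

# The negated form accepts every non-ASCII code point; the explicit form
# stops at U+00FF and so rejects U+20AC and U+1F600.
print(negated_hits)
print(explicit_hits)
```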

As to the difference between \237 and \177, it depends on whether you 
define ASCII as ending at U+009F, as CSS 2.1 did, or at U+007F, as Syntax 
3 and everyone else does.


Selectors 3 should be errata’ed to remove this paragraph:

> The two occurrences of "\377" represent the highest character number
> that current versions of Flex can deal with (decimal 255). They
> should be read as "\4177777" (decimal 1114111), which is the highest
> possible code point in Unicode/ISO-10646. [UNICODE]
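
For reference, the octal escapes mentioned in this thread can be converted
with a few lines of Python. This is only arithmetic on the numbers quoted
above; the variable names are illustrative.

```python
# Octal escapes from the thread, converted to integers.
ascii_end = int("177", 8)        # \177
css21_end = int("237", 8)        # \237
flex_max = int("377", 8)         # \377
unicode_max = int("4177777", 8)  # \4177777

print(hex(ascii_end), ascii_end)      # 0x7f 127
print(hex(css21_end), css21_end)      # 0x9f 159
print(hex(flex_max), flex_max)        # 0xff 255
print(hex(unicode_max), unicode_max)  # 0x10ffff 1114111
```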

-- 
Simon Sapin

Received on Tuesday, 9 April 2013 16:58:25 UTC