Re: Selector parsing: It's easy to hit unexpected unicode-range tokens from Zack Weinberg on 2014-06-30 (www-style@w3.org from June 2014)

From: Zack Weinberg <zackw@panix.com>
Date: Mon, 30 Jun 2014 11:50:10 -0400
To: www-style@w3.org
Message-ID: <53B18732.8080202@panix.com>

On 2014-06-30 10:34 AM, Boris Zbarsky wrote:
>
> It seems to me like either we should not have a separate unicode-range
> token and instead handle unicode ranges on the parser level or we should
> have some sort of special token reprocessing logic in the selector
> parser.  My preference is very much for the former.

As the guy who wrote the @font-face parser in Firefox, I don't see any 
catastrophic problems on that end if unicode-range was simply dropped 
from the tokenizer.  The major headache is that in some cases we will 
have to reinterpret IDENTs and DIMENSIONs (consider e.g. U+FF00-FFFD,
where the thing after the + is a single IDENT token, or U+00FF-FFFD,
where it is a single DIMENSION token).  Because of
that, someone is going to need to think very carefully about all the 
possibilities for how the characters after the plus sign get tokenized.
And it's a new place where the grammar cares about the absence of
whitespace.  But I don't think it's ever ambiguous.
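
To make the reinterpretation concrete, here is a rough Python sketch of
what the parser-level handling might look like.  It is not the Firefox
code.  The tokenizer is a deliberately simplified stand-in (ASCII only,
no escapes, unsigned numbers as in the CSS 2.1 grammar, so the + is
always its own DELIM), and the names tokenize and parse_unicode_range
are made up for illustration.

```python
import re

# Simplified token patterns. Assumptions: ASCII only, no escapes,
# numbers are unsigned integers, and '+' is always a separate DELIM.
TOKEN_RE = re.compile(r"""
    (?P<DIMENSION>[0-9]+-?[A-Za-z_][A-Za-z0-9_-]*)  # number glued to an identifier
  | (?P<NUMBER>[0-9]+)
  | (?P<IDENT>-?[A-Za-z_][A-Za-z0-9_-]*)
  | (?P<WHITESPACE>\s+)
  | (?P<DELIM>.)
""", re.VERBOSE)

HEX_RANGE_RE = re.compile(r"([0-9A-Fa-f]{1,6})-([0-9A-Fa-f]{1,6})\Z")
# One to six characters in total: hex digits followed by trailing '?' wildcards.
WILDCARD_RE = re.compile(r"(?=.{1,6}\Z)([0-9A-Fa-f]*)(\?*)\Z")


def tokenize(text):
    """Return a list of (token_type, token_text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def parse_unicode_range(text):
    """Reassemble a unicode range from generic tokens.

    Returns (start, end) as integers, or None if the input is not a
    range. Any whitespace token rejects the input. Bounds ordering and
    the U+10FFFF limit are not checked in this sketch.
    """
    tokens = tokenize(text)
    if len(tokens) < 3:
        return None
    if tokens[0] not in (("IDENT", "U"), ("IDENT", "u")):
        return None
    if tokens[1] != ("DELIM", "+"):
        return None
    rest = tokens[2:]
    if any(kind == "WHITESPACE" for kind, _ in rest):
        return None
    # Ignore how the tail happened to be split into tokens and look
    # only at its source text.
    tail = "".join(value for _, value in rest)
    m = HEX_RANGE_RE.match(tail)
    if m:
        return int(m.group(1), 16), int(m.group(2), 16)
    m = WILDCARD_RE.match(tail)
    if m:
        digits, wild = m.group(1), m.group(2)
        start = int(digits + "0" * len(wild), 16)
        end = int(digits + "F" * len(wild), 16)
        return start, end
    return None


for sample in ("U+FF00-FFFD", "U+00FF-FFFD", "U+0025-00FF", "U+4??"):
    print(sample, tokenize(sample)[2:], parse_unicode_range(sample))
```

In this sketch the part after the + comes out as one IDENT for
U+FF00-FFFD, one DIMENSION for U+00FF-FFFD, and NUMBER, DELIM, DIMENSION
for U+0025-00FF, yet all of them reassemble to the intended range
because the parser works from the concatenated source text.  A real
tokenizer that folds a leading sign into numbers, or that understands
exponent notation, would split some of these differently, which is
exactly the part that needs careful thought.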

zw

Received on Monday, 30 June 2014 15:50:38 UTC