- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 4 Sep 2013 09:43:15 -0700
- To: www-style list <www-style@w3.org>
 
Per today's telcon, I need to propose a new regex for the
unicode-range token in CSS 2.1, to bring it in line with the
definition we agreed on for Syntax.
For clarity, here's the current regex:
u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?
This correctly covers all the sensible unicode-range syntax, but it
also accidentally matches nonsensical values like "u+1?3" (a "?"
followed by a hex digit) or "u+???-500" (wildcards combined with an
explicit range), neither of which can be interpreted as a range.
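A quick way to see the problem is a minimal Python sketch that runs the current regex against a few tokens. It assumes whole-token matching (`re.fullmatch`) and uses lowercase samples only; `old_accepts` is just a hypothetical helper name for illustration.

```python
import re

# The current CSS 2.1 unicode-range regex, copied from above.
OLD_UNICODE_RANGE = re.compile(r"u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?")

def old_accepts(token):
    """True if the whole token matches the current regex."""
    return OLD_UNICODE_RANGE.fullmatch(token) is not None

# The first two are sensible; the last two are the nonsensical cases,
# and the current regex accepts all four.
for token in ["u+0-7f", "u+4??", "u+1?3", "u+???-500"]:
    print(token, old_accepts(token))
```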
Here's a new regex that only covers the syntax we actually want:
(u\+[?]{1,6})|(u\+[0-9a-f]{1}[?]{0,5})|(u\+[0-9a-f]{2}[?]{0,4})|(u\+[0-9a-f]{3}[?]{0,3})|(u\+[0-9a-f]{4}[?]{0,2})|(u\+[0-9a-f]{5}[?]{0,1})|(u\+[0-9a-f]{6})|(u\+[0-9a-f]{1,6}-[0-9a-f]{1,6})
(This regex was contributed by Simon; I was writing a functionally
identical but less clear one earlier.)
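The same check against the proposed regex, again as a rough Python sketch assuming whole-token matching on lowercase input (`new_accepts` is a hypothetical helper). The sensible tokens still match, while "u+1?3" and "u+???-500" no longer do.

```python
import re

# The proposed regex, copied verbatim from the one-line form above
# (Python concatenates the adjacent raw string literals).
NEW_UNICODE_RANGE = re.compile(
    r"(u\+[?]{1,6})|(u\+[0-9a-f]{1}[?]{0,5})|(u\+[0-9a-f]{2}[?]{0,4})|"
    r"(u\+[0-9a-f]{3}[?]{0,3})|(u\+[0-9a-f]{4}[?]{0,2})|(u\+[0-9a-f]{5}[?]{0,1})|"
    r"(u\+[0-9a-f]{6})|(u\+[0-9a-f]{1,6}-[0-9a-f]{1,6})"
)

def new_accepts(token):
    """True if the whole token matches the proposed regex."""
    return NEW_UNICODE_RANGE.fullmatch(token) is not None

for token in ["u+0-7f", "u+4??", "u+1?3", "u+???-500"]:
    print(token, new_accepts(token))
```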
Here's a clearer presentation of the regex, if you ignore whitespace:
(u\+[?]{1,6})|
(u\+[0-9a-f]{1}[?]{0,5})|
(u\+[0-9a-f]{2}[?]{0,4})|
(u\+[0-9a-f]{3}[?]{0,3})|
(u\+[0-9a-f]{4}[?]{0,2})|
(u\+[0-9a-f]{5}[?]{0,1})|
(u\+[0-9a-f]{6})|
(u\+[0-9a-f]{1,6}-[0-9a-f]{1,6})
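Since every token the new regex accepts has a single reading, it can be turned straight into a pair of code points. The sketch below compiles the multi-line form with Python's `re.VERBOSE` (which ignores the whitespace, as described above) and then converts a token using the usual wildcard interpretation: each "?" stands for any hex digit, so it becomes 0 for the start of the range and f for the end. `to_range` is a hypothetical helper, not something from this proposal, and it does not check that start <= end.

```python
import re

# Multi-line form from above; re.VERBOSE skips the whitespace and line
# breaks, so this compiles to the same pattern as the one-line version.
UNICODE_RANGE = re.compile(r"""
    (u\+[?]{1,6})|
    (u\+[0-9a-f]{1}[?]{0,5})|
    (u\+[0-9a-f]{2}[?]{0,4})|
    (u\+[0-9a-f]{3}[?]{0,3})|
    (u\+[0-9a-f]{4}[?]{0,2})|
    (u\+[0-9a-f]{5}[?]{0,1})|
    (u\+[0-9a-f]{6})|
    (u\+[0-9a-f]{1,6}-[0-9a-f]{1,6})
""", re.VERBOSE)

def to_range(token):
    """Hypothetical helper: map an accepted token to (first, last) code points."""
    if UNICODE_RANGE.fullmatch(token) is None:
        raise ValueError(f"not a valid unicode-range: {token!r}")
    body = token[2:]  # drop the leading "u+"
    if "-" in body:
        # Explicit range: two plain hex numbers.
        start, end = body.split("-")
        return int(start, 16), int(end, 16)
    # Wildcard form: "?" becomes 0 for the low end and f for the high end.
    return int(body.replace("?", "0"), 16), int(body.replace("?", "f"), 16)

print(to_range("u+4??"))   # (1024, 1279), i.e. U+400 to U+4FF
print(to_range("u+0-7f"))  # (0, 127)
```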
~TJ
Received on Wednesday, 4 September 2013 16:44:02 UTC