- From: Simon Montagu <smontagu@smontagu.org>
- Date: Tue, 14 Jun 2011 16:20:06 +0300
- To: "www-style@w3.org" <www-style@w3.org>
On 06/14/2011 12:38 PM, Mikko Rantalainen wrote:
> 2011-06-10 19:52 EEST: Joshua Cranmer:
>> On 6/10/2011 9:37 AM, Jack Smiley wrote:
>>> 3) Regarding the macro definition for nonascii, why does it go up to
>>> octal 237? (what's special about 237?) Why not octal 177 (decimal 127
>>> -- standard ASCII) or octal 377 (decimal 255 -- extended ASCII)?
>> Presumably, 238 and above is where you have individually invalid octets
>> for UTF-8.
>
> Isn't anything that has 8th bit set possibly invalid in UTF-8? Octal 177
> / decimal 127 makes more sense if UTF-8 compatibility is the reason for
> this limit.

Firstly these are Unicode code points, not octets in UTF-8 or any other
encoding. See above, "Octal codes refer to ISO 10646".

Secondly, the macro *excludes* \0-\237. In other words it includes \240
onwards, i.e. U+00A0 - U+FFFF (presumably no more, since the reference
is to ISO/IEC 10646-1:2003, which includes the BMP only).
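
To make the arithmetic concrete, here is a minimal Python sketch of the negated class, assuming the macro reads [^\0-\237] and is applied to code points rather than to encoded octets. The names NONASCII_EXCLUDED_MAX and is_nonascii are hypothetical, chosen only for illustration, and the sketch does not enforce the presumed U+FFFF upper bound.

```python
# Hypothetical helper for illustration; not taken from any CSS implementation.
# Octal 237 is decimal 159, i.e. U+009F, the last C1 control code point.
NONASCII_EXCLUDED_MAX = 0o237


def is_nonascii(ch: str) -> bool:
    # Mirrors the negated class [^\0-\237]: a code point matches only if it
    # lies above the excluded range, so the first match is octal 240 (U+00A0).
    return ord(ch) > NONASCII_EXCLUDED_MAX


if __name__ == "__main__":
    for cp in (0o177, 0o237, 0o240, 0o377):
        print(f"octal {cp:o} = U+{cp:04X}: nonascii = {is_nonascii(chr(cp))}")
```

The comparison is on ord(ch), the code point, so it gives the same answer whichever encoding the style sheet was stored in.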
Received on Tuesday, 14 June 2011 13:20:33 UTC