- From: Glenn Maynard <glenn@zewt.org>
- Date: Wed, 18 Apr 2012 17:34:12 -0500
On Wed, Apr 18, 2012 at 12:12 PM, Anne van Kesteren <annevk at opera.com> wrote:

>> "If code point is equal or greater than lower boundary" is more naturally
>> "greater than or equal to" (and "less than or equal to"). That said,
>> this would be much clearer with interval syntax:
>>
>> "If code point is in the range [*lower boundary*, 0x10FFFF] and is not in
>> the range [0xD800, 0xDFFF], emit code point (and continue)."
>>
>> which I think is easier to read, and also makes it clear that the "0xD800
>> to 0xDFFF" is a closed interval (0xD800 and 0xDFFF are included).
>
> Then we'd first have to introduce interval syntax to the English language.
> We could do that I suppose in the Terminology section if you think it would
> be better.

It would also apply to http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#index-gb18030-code-point, and it could apply to "select" ranges (e.g. 7.1 step 5: "[0,0x7f]"). Maybe it's not enough to be worth figuring out how to define it.

>> Encoding form data, at least, doesn't abort on the first error; any
>> unrepresentable codepoints are encoded as &x1234;. (It would sure be
>> nice if encoding to non-Unicode-based encodings would just *always* use
>> that syntax for non-ASCII, so the encoders could be dropped, but I guess
>> that would trigger bugs in pages that are currently masked...) Is there
>> any encoding path in browsers that does give up on the first error?
>
> It has been proposed for the API.
>
> And in URLs you do not get "&#...;" (though in WebKit you do) but you get
> "?" (IE at the network layer, Opera earlier on) or the utf-8 representation
> (Gecko is totally weird).

I was testing with POST, which (at least in Gecko) uses HTML escapes for unrepresentable characters.
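
For illustration only, here is a rough Python sketch of the two behaviours being contrasted: giving up on the first unrepresentable character versus substituting an HTML escape and carrying on. The function names are made up for this example. It relies on Python's built-in "xmlcharrefreplace" codec error handler, which emits decimal references of the form &#NNNN;. It is not the encoding spec's algorithm and is not claimed to match any browser's form submission byte for byte.

```python
def encode_strict(text: str, encoding: str) -> bytes:
    # Hypothetical helper: aborts on the first unrepresentable character
    # by raising UnicodeEncodeError (Python's default "strict" handler).
    return text.encode(encoding, errors="strict")


def encode_with_html_escapes(text: str, encoding: str) -> bytes:
    # Hypothetical helper: never aborts; each unrepresentable character is
    # replaced with a decimal numeric character reference such as &#4660;.
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "caf\u00e9 \u1234"  # U+00E9 fits in ISO-8859-1, U+1234 does not
    print(encode_with_html_escapes(sample, "iso-8859-1"))
    try:
        encode_strict(sample, "iso-8859-1")
    except UnicodeEncodeError as exc:
        print("strict encoding gave up at index", exc.start)
```

Note that a receiver cannot tell such an escape apart from a user who literally typed "&#4660;" into the form, which is the sense in which these escapes can already show up in POST data.
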
(It would be pretty neat if that could be changed to *always* using HTML escapes for non-ASCII, except when encoding to UTF-8, since that's not introducing anything new--you can already receive &x1234; escapes in POST data--and it would alleviate the "form submit encoding depends on the source page's encoding" problem. I guess this must break pages somehow, or vendors would have done this long ago.)

-- 
Glenn Maynard
Received on Wednesday, 18 April 2012 15:34:12 UTC