
[whatwg] Encoding Standard (mostly complete)

From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 18 Apr 2012 19:12:46 +0200
Message-ID: <op.wcy0rkn264w2qv@annevk-macbookpro.local>
On Wed, 18 Apr 2012 15:40:33 +0200, Glenn Maynard <glenn at zewt.org> wrote:
> "This is a decoder error" seems odd; it's descriptive language ("this  
> thing must be made true") rather than declarative ("do this thing").   
> I'd suggest the declarative language "Emit a decoder error" and "Emit an  
> encoder error".

Yes. Awesome suggestion implemented.


> "If code point is equal or greater than lower boundary" is more naturally
> "greater than or equal to" (and "less than or equal to").  That said,  
> this would be much clearer with interval syntax:
>
> "If code point is in the range [*lower boundary*, 0x10FFFF] and is not in
> the range [0xD800, 0xDFFF], emit code point (and continue)."
>
> which I think is easier to read, and also makes it clear that the "0xD800
> to 0xDFFF" is a closed interval (0xD800 and 0xDFFF are included).

Then we'd first have to introduce interval syntax to the English language.
We could do that, I suppose, in the Terminology section if you think it
would be better.
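
For illustration only, the closed-interval reading of the quoted condition
can be sketched in Python. The function and parameter names are hypothetical
and do not come from the draft; both intervals include their endpoints.

```python
def should_emit(code_point: int, lower_boundary: int) -> bool:
    """Return True if code_point is in [lower_boundary, 0x10FFFF]
    and is not in the surrogate range [0xD800, 0xDFFF].

    Both intervals are closed: their endpoints are included.
    """
    in_range = lower_boundary <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate
```

With a lower boundary of 0x800, for example, 0xD7FF and 0xE000 pass the
check while 0xD800, 0xDFFF, and 0x110000 do not.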


>> An encoder contains one or more encoder error points. Unless stated
>> otherwise the encoder is terminated at that point.
>
> Encoding form data, at least, doesn't abort on the first error; any
> unrepresentable codepoints are encoded as &x1234;.  (It would sure be
> nice if encoding to non-Unicode-based encodings would just *always* use
> that syntax for non-ASCII, so the encoders could be dropped, but I guess
> that would trigger bugs in pages that are currently masked...)  Is there
> any encoding path in browsers that does give up on the first error?

It has been proposed for the API.

And in URLs you do not get "&#...;" (though in WebKit you do) but you get  
"?" (IE at the network layer, Opera earlier on) or the utf-8  
representation (Gecko is totally weird).

Maybe we should align URLs with <form> here and use "&#...;" throughout if  
that is compatible with content. Probably deserves a discussion in its
own thread.
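
As a rough analogy, and not a description of what any browser does,
Python's standard codec error handlers happen to cover the three behaviors
mentioned above: stopping at the first error, substituting "&#...;", and
substituting "?". The policy names and the helper below are made up for
this sketch.

```python
# Hypothetical policy names mapped onto Python's built-in error handlers.
ERROR_HANDLERS = {
    "terminate": "strict",             # stop at the first encoder error
    "form": "xmlcharrefreplace",       # emit a decimal "&#...;" reference
    "question-mark": "replace",        # emit "?" for each unencodable character
}


def encode_with_policy(text: str, encoding: str, policy: str) -> bytes:
    """Encode text, handling unrepresentable code points per policy."""
    return text.encode(encoding, errors=ERROR_HANDLERS[policy])
```

Encoding "a€b" as iso-8859-1 yields b"a&#8364;b" under the "form" policy and
b"a?b" under "question-mark", and raises UnicodeEncodeError under
"terminate".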

I do not know any cases beyond URLs, <form>, and the proposed API that  
require an encoder in the platform.


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 18 April 2012 10:12:46 GMT
