- From: Ian Hickson <ian@hixie.ch>
- Date: Mon, 5 Dec 2011 23:37:51 +0000 (UTC)
- To: Glenn Adams <glenn@skynav.com>
- cc: WebApps WG <public-webapps@w3.org>
On Mon, 5 Dec 2011, Glenn Adams wrote:
>
> The problem as I see it is that the current spec text for charset
> detection effectively *requires* a browser that does not "support"
> UTF-32 to explicitly ignore content metadata that may be correct (if it
> specifies UTF-32 as charset param), and further, to explicitly mis-label
> such content as UTF-16LE in the case that the first four bytes are FF FE
> 00 00. Indeed, the current algorithm requires mis-labelling such content
> as UTF-16LE with a confidence of "certain".

Yes, this is explicitly intentional.

> The current text is also ambiguous with respect to what "support" means
> in step (2) of Section 8.2.2.1 of [1]. Which of the following are meant
> by "support"?

To quote from the terminology section:

"The specification uses the term supported when referring to whether a
user agent has an implementation capable of decoding the semantics of an
external resource."

> - recognize with sniffer
> - be capable of using directly as internal coding
> - be capable of transcoding to internal coding

I don't know how to distinguish the latter two in a black-box manner.
Either of the latter two is a correct interpretation as far as I can
tell.

I suppose the current spec could be read such that the user agent could
autodetect an unsupported encoding, but that wouldn't be very clever. I
guess I can add some text to the spec to make that more obviously bad.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
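[Illustrative sketch of the behaviour discussed above, assuming a user agent
that does not support UTF-32. It is not the spec's algorithm text; the
function name, table ordering, and return shape are hypothetical. The point
it shows: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM
(FF FE), so a BOM prefix match labels the content UTF-16LE with confidence
"certain".]

    # Sketch: BOM prefix matching in a UA without UTF-32 support.
    def sniff_bom(first_bytes: bytes):
        """Return (encoding, confidence) or None if no BOM matched."""
        # A UA that supported UTF-32 would test the four-byte BOMs before
        # the two-byte ones; without such entries, FF FE wins here.
        bom_table = [
            (b"\xef\xbb\xbf", "UTF-8"),
            (b"\xfe\xff",     "UTF-16BE"),
            (b"\xff\xfe",     "UTF-16LE"),  # also matches FF FE 00 00
        ]
        for bom, encoding in bom_table:
            if first_bytes.startswith(bom):
                return encoding, "certain"
        return None

    print(sniff_bom(b"\xff\xfe\x00\x00"))  # -> ('UTF-16LE', 'certain')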
Received on Monday, 5 December 2011 23:38:15 UTC