- From: Ian Hickson <ian@hixie.ch>
- Date: Sat, 7 May 2005 06:26:13 +0000 (UTC)
- To: Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: Addison Phillips <addison.phillips@quest.com>, Richard Ishida <ishida@w3.org>, www-style@w3.org, public-i18n-core@w3.org
On Wed, 4 May 2005, Martin Duerst wrote:
>>
>> This would be nigh-on-impossible for UAs to sanely implement because it
>> would require character encoding information to be propagated through
>> the implementation into parts of the code that are completely unrelated
>> to the parsing of the original document (e.g. the DOM code).
>
> The original character encoding is part of the Infoset.

First, CSS has no infoset, and second, the infoset is a highly theoretical
construct which in real browsers doesn't exist. So I don't think that this
is very relevant to the discussion. :-)

> Well, the IRI RFC, as well as the IDN-related RFCs, are still in Draft
> Stage. Feedback on what parts are easy or difficult to implement is
> definitely welcome. I can imagine that a future version of these specs
> would e.g. contain some provision for small devices to not do the
> normalization/nameprep, if there is enough feedback in this direction.

My main feedback would be: Please don't make anything dependent on the
encoding that the URI is found in.

Internally, in Web browsers, things are implemented such that the incoming
data stream is decoded before it is parsed, after which point the original
encoding information is lost, and all the data is in UTF-16 (or sometimes
UTF-8). Anything that requires that the original encoding information be
used to change the behaviour of later processing will most likely not be
reliably implemented.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
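
The decode-before-parse point can be illustrated with a small sketch. This is not browser code: the helper `decode_document` and the sample path `/café` are hypothetical, chosen only to show that two documents in different encodings decode to the same string, so any later step that wanted to escape a URI according to the original encoding would need that encoding passed along explicitly.

```python
from urllib.parse import quote


def decode_document(raw: bytes, encoding: str) -> str:
    """Hypothetical stand-in for a UA's input decoder.

    The returned string carries no record of which encoding it came from.
    """
    return raw.decode(encoding)


# The same URI path as it might arrive in two differently encoded documents.
path = "/caf\u00e9"
from_latin1 = decode_document(path.encode("iso-8859-1"), "iso-8859-1")
from_utf8 = decode_document(path.encode("utf-8"), "utf-8")

# After decoding, later code (e.g. the DOM) cannot tell the two apart.
same_after_decoding = from_latin1 == from_utf8

# Escaping that depends on the source encoding gives different URIs, and
# only works here because the encoding is supplied by hand.
escaped_utf8 = quote(from_utf8, encoding="utf-8")
escaped_latin1 = quote(from_latin1, encoding="iso-8859-1")

print(same_after_decoding, escaped_utf8, escaped_latin1)
```

In this sketch the two escaped forms differ even though the decoded strings are equal, which is the information a later processing stage would no longer have.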
Received on Saturday, 7 May 2005 06:26:27 UTC