- From: Alain LaBont/e'/ <alb@sct.gouv.qc.ca>
- Date: Wed, 5 Feb 1997 11:31:25 -0500
- To: iso10646@listproc.hcf.jhu.edu, Unicore <unicore@unicode.org>, Unicode <unicode@unicode.org>, www-international <www-international@w3.org>, HTTP WG <http-wg@cuckoo.hpl.hp.com>, Search <search@mccmedia.com>, ISO10646 <iso10646@listproc.hcf.jhu.edu>
At 09:43 97-02-05 -0500, Misha Wolf wrote:
>Chris Pratley wrote:
>
>>[snip]
>
>>Our assumption was that UTF-8 was the only Web-safe encoding that was
>>reasonably likely to be adopted by browsers in the near future. Is that
>>the consensus, or are raw UCS2 encodings being considered actively by
>>people on this alias?
>
>I think it very unlikely that plain 16-bit Unicode will be adopted by
>browsers in the next year or two. The two encoding schemes which will
>be widely used to encode Unicode Web pages are:
>
>  1. UTF-8 (see <http://www.reuters.com/unicode/iuc10/x-utf8.html>).
>  2. Numeric Character References (see <http://www.reuters.com/unicode/iuc10/x-ncr.html>).
>
>The second scheme is intriguing as it does not require the use of any
>octets over 127 decimal (7F hex). Accordingly, it is legal to label
>such a file as, e.g., US-ASCII, ISO-8859-1, X-SJIS, or any other "charset"
>which has ASCII as a subset. Browser vendors: Please check your products
>against the pages referenced above.
>
>>[snip]
>
>Regards,
>Misha

I do not understand why it is more complicated to use UCS-2 than any
other scheme (apart from the little-endian problem, which should be
deprecated in the state of the art of the XXIst century; it is a patch!)

The web requires 8-bits-per-octet encoding (thank God! otherwise even
UTF-8 would not work) as its default character set is ISO/IEC 8859-1.

A wise implementer should implement at least:

 -Latin 1
 -UTF-8
 -entity names
 -UCS-2 (big-endian at least, little-endian as a patch if indicated
  clearly!)

Anyway the logic, once the source data has been normalized, should be the
same after all. I am pretty sure nobody uses UTF-8 or even entity names as
their canonical processing encoding... That would be nonsense. But who
knows, masochism exists, I know (:

Alain LaBonté    (version : 8 bits --- (-: )
Alain LaBont/e'/ (version : 7 bits --- )<:= !@#$%?&*()_+-=^~',."!!!)
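
For readers comparing the schemes discussed above, here is a minimal sketch in Python of how one character comes out under each of them. The function name `encodings_of` is hypothetical, and Python's UTF-16 codecs stand in for UCS-2, which is an assumption that holds only for characters in the Basic Multilingual Plane (the sketch rejects anything else).

```python
def encodings_of(text: str) -> dict:
    """Return the octets for `text` under each scheme named in the thread.

    Assumption: UTF-16 and UCS-2 produce identical octets for code points
    up to U+FFFF, so the UTF-16 codecs are used in place of UCS-2 here.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent code points above U+FFFF")

    try:
        latin1 = text.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None  # not representable in ISO/IEC 8859-1

    return {
        "latin-1": latin1,
        "utf-8": text.encode("utf-8"),
        # Numeric character references: only octets below 128 are emitted.
        "ncr": text.encode("ascii", errors="xmlcharrefreplace"),
        "ucs-2-be": text.encode("utf-16-be"),  # big-endian, no byte order mark
        "ucs-2-le": text.encode("utf-16-le"),  # little-endian, no byte order mark
    }


if __name__ == "__main__":
    for scheme, octets in encodings_of("\u00e9").items():
        print(scheme, octets)
```

The numeric character reference form stays within 7-bit octets, which is why such a file can be labelled with any charset that has ASCII as a subset, while the two UCS-2 forms differ only in octet order and so need to be indicated clearly.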
Received on Wednesday, 5 February 1997 11:29:14 UTC