- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 19 Sep 2003 12:50:00 -0400
- To: John Cowan <jcowan@reutershealth.com>, Francois Yergeau <FYergeau@alis.com>
- Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
In the long term, I think the fundamental problem is with the large number of encodings, not with the encoding identifications. The example of US-ASCII very clearly shows the huge advantages of having a single encoding. This will of course take some time. As one example, N3 just says that it's in UTF-8. For other new formats, that may make sense, too.

Regards,    Martin.

At 12:10 03/09/19 -0400, John Cowan wrote:

>Francois Yergeau scripsit:
>
> > In this respect, yes. All programming languages should provide for charset
> > identification of their source files. Alas, none do, AFAIK.
>
>I almost, but not quite, entirely disagree with this position.
>
>Rather than having thousands of ad hoc mechanisms for encoding declarations
>in each of the thousands of text formats now extant, file systems should have
>a convenient mechanism for recording the encoding of each file, and character
>processing libraries should have convenient reading and writing operations
>that do the necessary conversions. Otherwise, generic text-processing tools
>become impossible, because each tool has to have a vast library that
>understands the mechanics of the encoding declaration specific to the format
>it is trying to read. That way madness lies.
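[Editorial note: Cowan's proposal can be sketched with tools available today. On Linux, extended attributes can hold a per-file encoding label, and a small I/O wrapper can perform the conversion on read and write. This is only an illustration under assumptions: the attribute name "user.charset" is invented for this sketch and is not any standard, and the UTF-8 fallback for unlabeled files is an arbitrary choice.]

    import os

    # Invented attribute name for this sketch; no standard xattr
    # for charset labels exists.
    CHARSET_ATTR = "user.charset"

    def write_text(path, text, encoding="utf-8"):
        """Write text and record its encoding as filesystem metadata."""
        with open(path, "wb") as f:
            f.write(text.encode(encoding))
        # Label the file so any generic tool can decode it without
        # format-specific knowledge (Linux-only: os.setxattr).
        os.setxattr(path, CHARSET_ATTR, encoding.encode("ascii"))

    def read_text(path, default="utf-8"):
        """Read text, decoding per the recorded encoding label."""
        try:
            encoding = os.getxattr(path, CHARSET_ATTR).decode("ascii")
        except OSError:
            encoding = default  # unlabeled file: fall back to a default
        with open(path, "rb") as f:
            return f.read().decode(encoding)

[With such a label in place, a generic text tool never needs to parse a format-specific encoding declaration; it consults the file's metadata and lets the I/O layer convert, which is exactly the division of labor Cowan argues for.]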
Received on Friday, 19 September 2003 12:50:06 UTC