- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 19 Sep 2003 20:55:39 +0200
- To: John Cowan <jcowan@reutershealth.com>
- Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
John Cowan wrote:

>Rather than having thousands of ad hoc mechanisms for encoding declarations
>in each of the thousands of text formats now extant, file systems should
>have a convenient mechanism for recording the encoding of each file, and
>character processing libraries should have convenient reading and writing
>operations that do the necessary conversions.

Impractical. File systems commonly do not support storing such information, and even if they did, it would cause interoperability problems with the file systems and protocols that provide no such means. If you transfer the document to your web server using FTP, the information is lost and the document breaks.

Further, file system information is typically all but invisible to authors and would thus suffer from the same problem as the charset parameter: if I edit a document in an XML-unaware text editor, change the encoding declaration and some text nodes, and save the file, the file system information and the encoding declaration are likely to contradict each other, and the document breaks.

You are basically suggesting changing all file systems and all software that interacts with them, and expecting everyone to upgrade both their software and the file system information of all existing documents. If an acceptable solution may go that far, you should rather propose outlawing all non-Unicode encodings; that would be much simpler, more consistent, and more interoperable. It would also work for text that is not stored in a file system but generated by software, a case your solution does not consider.

>Otherwise, generic text-processing tools become impossible,

They are impossible today.

>because each tool has to have a vast library that understands the
>mechanics of the encoding declaration specific to the format it is trying
>to read.

They are not trying to read the format; they are trying to read byte streams as character streams. If they are trying to read the format, they have to support that format anyway, including its mechanism for determining the character encoding.

If you consider HTTP a file system, it already implements your solution: all text is identified using text/* types, and either the "file system" provides the encoding information (the charset parameter) or text processors are required to treat the document as ISO-8859-1 encoded (see the sketch below). Text processors would get only character streams from the HTTP implementation and would never have to worry about character encodings. Does it work? No, not least because the W3C publishes Recommendations that make it impossible to write conforming HTTP implementations. That way madness lies.
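A minimal sketch of the HTTP/1.1 fallback rule referred to above, not part of the original mail: per RFC 2616 section 3.7.1, a text/* body with no charset parameter is to be treated as ISO-8859-1, regardless of any encoding declaration inside the document. Python, the function name effective_charset, and the header values are illustrative assumptions, not from any cited source.

```python
import email  # stdlib message parser, reused here to parse Content-Type parameters

def effective_charset(content_type: str) -> str:
    """Charset an HTTP/1.1 client must assume for a text/* body."""
    msg = email.message_from_string("Content-Type: %s\n\n" % content_type)
    if msg.get_content_maintype() != "text":
        raise ValueError("the ISO-8859-1 default applies only to text/* types")
    # The charset parameter wins; otherwise HTTP/1.1 defaults text/* to ISO-8859-1.
    return msg.get_param("charset") or "iso-8859-1"

# The charset parameter, when present, is authoritative ...
assert effective_charset("text/xml; charset=utf-8") == "utf-8"
# ... but without it the recipient must assume ISO-8859-1, no matter what
# the encoding declaration inside the document says; this is exactly the
# kind of contradiction described in the mail above.
assert effective_charset("text/xml") == "iso-8859-1"
```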
Received on Friday, 19 September 2003 14:58:54 UTC