- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Thu, 18 Aug 2005 17:53:08 +0900
- To: uri@w3.org, Paul Hoffman <phoffman@imc.org>, Ted Hardie <hardie@qualcomm.com>
- Cc: Dan Connolly <connolly@w3.org>
Hello Paul, Ted, others,

Here is a comment regarding
http://www.ietf.org/internet-drafts/draft-hoffman-file-uri-03.txt

This draft is listed as AD Evaluation::AD Followup at
https://datatracker.ietf.org/public/pidtracker.cgi?command=view_id&dTag=12228&rfc_flag=0

If this comment is too late for actual drafting, please consider it as part of IETF Last Call.

The draft says:

>>>>>>>>
3.4 Character sets and encodings

Local file systems sometimes use many different encodings for representing file names. For interoperability sake, it would be preferable for file: URI libraries to translate the native character encoding for file names to and from Unicode.
>>>>>>>>

This is a start in the right direction, but somewhat inaccurate. I'll list the problems first, and then propose some new text.

There are several problems:

1) Some local file systems indeed use many different encodings for representing file names, but on those file systems, transcoding file names to and from Unicode may be very difficult. The typical example here is Unix/Linux/... At the OS level, file names are byte strings. A user's locale setting (LANG environment variable) defines how these bytes are interpreted as characters. Different users' mileage may vary, unless there is a convention that is enforced system-wide. (Fortunately, the convention of using UTF-8 for file names is on the rise, in particular for Linux.)

2) "to and from Unicode" is not well defined. UTF-8? UTF-16? UTF-16LE? UTF-16BE?

3) The above paragraph is written in terms of "file: URI libraries", rather than starting from the scheme syntax.

Here is proposed replacement text. Any comments welcome!

>>>>>>>>
3.4 Character sets and encodings

Local file systems use all kinds of specific encodings, and sometimes many different encodings, for representing file and directory names.
For interoperability, it is preferable for file: URIs to use UTF-8 [STD63] (percent-encoded when necessary) in accordance with Section 2.5 of [RFC3986] and for compatibility with IRIs [RFC3987]. Applications creating file: URIs should transcode file and directory names to UTF-8. Applications interpreting file: URIs should transcode back to the encoding(s) used by the file system. For file systems where the encoding used cannot be determined with reasonable reliability, the actual byte values used by the file system may have to be directly encoded in the file: URI.
>>>>>>>>

I can provide some more text talking about specific systems.

Regards, Martin.
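
The transcoding described in the proposed text can be sketched in Python as follows. This is only an illustration under stated assumptions: the path is an absolute POSIX-style path given as bytes, the URI has an empty authority component, and the caller either knows the file system encoding or passes None when it cannot be determined. The names path_to_file_uri, file_uri_to_path, and fs_encoding are hypothetical and do not come from the draft.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes

PREFIX = "file://"  # assumption: empty authority, so URIs look like file:///...


def path_to_file_uri(raw_path: bytes, fs_encoding: Optional[str]) -> str:
    """Create a file: URI from the bytes of an absolute path."""
    data = raw_path
    if fs_encoding is not None:
        try:
            # Transcode from the file system encoding to UTF-8.
            data = raw_path.decode(fs_encoding).encode("utf-8")
        except UnicodeDecodeError:
            # Encoding guess was wrong: fall back to the raw byte values.
            data = raw_path
    # Percent-encode every byte that is not unreserved or "/".
    return PREFIX + quote(data, safe="/")


def file_uri_to_path(uri: str, fs_encoding: Optional[str]) -> bytes:
    """Recover file system bytes from a file: URI (hypothetical helper)."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a file: URI with an empty authority")
    data = unquote_to_bytes(uri[len(PREFIX):])
    if fs_encoding is None:
        # Encoding unknown: the URI is taken to carry raw byte values.
        return data
    try:
        # Transcode from UTF-8 back to the file system encoding.
        return data.decode("utf-8").encode(fs_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data


latin1_path = "/home/müller/résumé.txt".encode("latin-1")
uri = path_to_file_uri(latin1_path, "latin-1")
print(uri)
print(file_uri_to_path(uri, "latin-1") == latin1_path)
```

Note that the raw-byte fallback makes the URI ambiguous for the receiver, which cannot tell from the URI alone whether the percent-encoded bytes are UTF-8 or file system bytes; the sketch does not attempt to resolve that.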
Received on Thursday, 18 August 2005 08:53:41 UTC