- From: Larry Masinter <LMM@acm.org>
- Date: Thu, 06 May 2004 10:01:40 -0700
- To: uri@w3.org
Re http://lists.w3.org/Archives/Public/uri/2004Apr/0055.html

> However, it also implies that character encoding is choosable by
> users, and I think that is not the case in many systems. That is,
> many systems will only allow an ISO 8859-x encoding for file names.
> What you are asking is that the names in those cases must be
> re-encoded from the "native" encoding to the standard encoding.
> That will (a) induce errors, particularly when people don't bother to
> re-encode and (b) increase interoperability. How do people feel about
> this balance?

I'm not convinced that it will induce errors, especially if decoders fall back to looking up the name in the local encoding when UTF-8 decoding doesn't work. So I would suggest that filename -> file URL SHOULD reencode from the local encoding to UTF-8, and file URL -> filename SHOULD reencode from UTF-8 to the local encoding, with the possibility that alternate reencodings (or no reencoding at all) might also be tried. This is also more consistent with IRIs.

The translation between file URIs and file paths requires some amount of reencoding anyway on most systems, if only to change the hierarchy delimiter from "\" (Windows UNC) or ":" (Mac OS 9) to "/".

Other notes:

I think file://usr/local/bin/ should be file:///usr/local/bin.

For 'security considerations', see: http://cert.uni-stuttgart.de/archive/bugtraq/2001/07/msg00375.html

Since there are more comments on 'file' than on the other schemes in this document, perhaps we could pull the "file" URI scheme out into a separate document? I'm willing to take a run at this if there's no objection.

Larry
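The reencoding rule suggested above (local encoding to UTF-8 when making a file URL, UTF-8 back to the local encoding when resolving one, with a fallback when UTF-8 decoding fails) can be sketched in Python roughly as follows. This is a minimal illustration, not a proposed specification: the helper names `path_to_file_url` and `file_url_to_path` are hypothetical, ISO 8859-1 is an assumed local encoding, and paths are assumed to be absolute and to begin with the separator (true for POSIX; Windows drive letters, UNC host names, and Mac OS 9 volume names would need extra handling not shown). Splitting on the native separator is where the delimiter translation happens.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption for illustration: the host stores file names in ISO 8859-1.
LOCAL_ENCODING = "iso-8859-1"


def path_to_file_url(native_path: bytes, local_encoding: str = LOCAL_ENCODING,
                     sep: bytes = b"/") -> str:
    """Hypothetical helper: native absolute path -> file URL (UTF-8, %-escaped)."""
    if not native_path.startswith(sep):
        raise ValueError("expected an absolute path beginning with the separator")
    # Splitting on the native separator is the delimiter translation step.
    segments = native_path[len(sep):].split(sep)
    # Reencode each segment: local encoding -> Unicode -> UTF-8 percent-escapes.
    quoted = [quote(seg.decode(local_encoding), safe="", encoding="utf-8")
              for seg in segments]
    # Empty authority, so the result starts with three slashes.
    return "file:///" + "/".join(quoted)


def file_url_to_path(url: str, local_encoding: str = LOCAL_ENCODING,
                     sep: bytes = b"/") -> bytes:
    """Hypothetical helper: file URL -> native path, with a local-encoding fallback."""
    prefix = "file:///"
    if not url.startswith(prefix):
        raise ValueError("only file URLs with an empty authority are handled here")
    native = []
    for seg in url[len(prefix):].split("/"):
        raw = unquote_to_bytes(seg)
        try:
            # Preferred path: treat the escaped bytes as UTF-8 and reencode.
            native.append(raw.decode("utf-8").encode(local_encoding))
        except (UnicodeDecodeError, UnicodeEncodeError):
            # Fallback: assume the producer did not reencode, so the bytes
            # are already in the local encoding.
            native.append(raw)
    return sep + sep.join(native)


if __name__ == "__main__":
    url = path_to_file_url(b"/home/caf\xe9/r\xe9sum\xe9.txt")
    print(url)                                     # file:///home/caf%C3%A9/r%C3%A9sum%C3%A9.txt
    print(file_url_to_path(url))                   # b'/home/caf\xe9/r\xe9sum\xe9.txt'
    print(file_url_to_path("file:///home/caf%E9")) # b'/home/caf\xe9' (fallback branch)
```

One caveat: a local name whose bytes happen to form valid UTF-8 is taken by the first branch, so the fallback is a heuristic rather than a guarantee.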
Received on Thursday, 6 May 2004 13:02:50 UTC