Re: draft-hoffman-rfc1738bis-02.txt


> However, it also implies that character encoding is choosable by 
> users, and I think that is not the case in many systems. That is, 
> many systems will only allow an ISO 8859-x encoding for file names. 
> What you are asking is that the names in those cases must be 
> re-encoded from the "native" encoding to the standard encoding.

> That will (a) induce errors, particularly when people don't bother to 
> re-encode and (b) increase interoperability. How do people feel about 
> this balance?

I'm not convinced that it will induce errors, especially if
decoders fall back to a lookup based on the local encoding when
the UTF-8 decoding doesn't work.

So I would suggest

  filename -> file URL
    SHOULD reencode from the local encoding to UTF-8

  file URL -> filename
    SHOULD reencode from UTF-8 to the local encoding,
    with the possibility that an alternate reencoding
    (or non-encoding) might also be tried.
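
To make the two rules above concrete, here is a rough sketch in
Python. It is only an illustration, not proposed spec text. The
function names and LOCAL_ENCODING are made up for this example,
ISO 8859-1 stands in for whatever the local encoding is, and the
sketch assumes a POSIX-style absolute path and a URL with an empty
host.

```python
from typing import List
from urllib.parse import quote, unquote_to_bytes

# Assumption: ISO 8859-1 stands in for the system's native encoding.
LOCAL_ENCODING = "iso-8859-1"


def filename_to_file_url(raw_name: bytes, local_encoding: str = LOCAL_ENCODING) -> str:
    """filename -> file URL: reencode local encoding -> UTF-8, then percent-encode."""
    text = raw_name.decode(local_encoding)
    # quote() leaves "/" alone by default and escapes the non-ASCII octets.
    return "file://" + quote(text.encode("utf-8"))


def file_url_to_filenames(url: str, local_encoding: str = LOCAL_ENCODING) -> List[bytes]:
    """file URL -> candidate filenames to try, most preferred first."""
    if not url.startswith("file://"):
        raise ValueError("not a file URL")
    octets = unquote_to_bytes(url[len("file://"):])
    candidates: List[bytes] = []
    try:
        # Preferred: the octets are UTF-8; reencode to the local encoding.
        candidates.append(octets.decode("utf-8").encode(local_encoding))
    except (UnicodeDecodeError, UnicodeEncodeError):
        pass
    # Fallback ("non-encoding"): the octets are already in the local encoding.
    if octets not in candidates:
        candidates.append(octets)
    return candidates
```

A caller would try each candidate in order against the file
system, so a URL produced without reencoding (for example
file:///tmp/caf%E9) can still be resolved.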

This is also more consistent with IRIs.

The translation between file URIs and file paths requires some
amount of reencoding anyway, on most systems, just to change the
hierarchy delimiter: from "\" (Windows UNC) or ":" (Mac OS 9) to "/".
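
As a small illustration of the delimiter change, the following
Python sketch splits a native path on its own delimiter and rejoins
the segments with "/". The function name is hypothetical, and the
sketch deliberately ignores drive letters, UNC host names, and
character-set reencoding.

```python
from urllib.parse import quote


def native_path_to_url_path(native_path: str, delimiter: str) -> str:
    """Split on the native hierarchy delimiter and rejoin with "/"."""
    segments = native_path.split(delimiter)
    # safe="" so that a "/" inside a segment (legal on Mac OS 9) is escaped
    # rather than being mistaken for a hierarchy delimiter.
    return "/".join(quote(segment, safe="") for segment in segments)


print(native_path_to_url_path(r"dir\sub\file.txt", "\\"))
print(native_path_to_url_path("Macintosh HD:Documents:notes.txt", ":"))
```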

Other notes:

I think file://usr/local/bin/ should be file:///usr/local/bin/,
since with only two slashes "usr" is parsed as the host.
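
The difference between the two forms is easy to see with a generic
URL parser. This is just a quick check using urlsplit from Python's
standard library, not anything specific to the draft:

```python
from urllib.parse import urlsplit

two_slashes = urlsplit("file://usr/local/bin/")
three_slashes = urlsplit("file:///usr/local/bin/")

# With two slashes, "usr" lands in the authority (host) component.
print(repr(two_slashes.netloc), repr(two_slashes.path))
# With three slashes, the host is empty and the whole path is preserved.
print(repr(three_slashes.netloc), repr(three_slashes.path))
```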

For 'security considerations':

Since there are more comments on 'file' than on the other
schemes in this document, perhaps we could pull the "file"
URI scheme out into a separate document?

I'm willing to take a run at this, if there's no objection.


Received on Thursday, 6 May 2004 13:02:50 UTC