- From: Glenn Adams <glenn@stonehand.com>
- Date: Wed, 2 Aug 95 23:51:33 -0400
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: gtn@ebt.com, glenn@stonehand.com, html-wg@oclc.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
From: Larry Masinter <masinter@parc.xerox.com> Date: Wed, 2 Aug 1995 19:37:40 PDT 'xxxx' are characters *NOT* in the native character set of the file system, but rather in whatever transcription of that character set is made available by the FTP server. OK. Say the transcription system used by both the FTP client and server says: (1) if an octet in the local encoding of a file name is in positions 0x20 - 0x24 or 0x26 - 0x7E of US-ASCII, then use that octet; (2) if an octet in the local encoding of a file name is in position 0x25 of US-ASCII (i.e., '%'), then use "%%" (3) otherwise, use %XX for each octet in the encoded file name (in big endian order), where XX is the hexidecimal value of each such octet. Now, say I am a Chinese version of Windows/NT using Unicode and I ask you for: RETR E-W%0B%00.%00T%00X%00T That is, in Unicode I have the following encoding of my file name: 4E2D 570B 002E 0054 0058 0054 Say you are a Taiwanese server using the BIG5 character set for you file names. How do you interpret my request? Do you interpret it as a BIG5 string? If so, then you think I just asked for the file "E-W??.?T?X?T" (assuming for a moment that you don't throw up on NUL and you interpret NUL and other C0 escapes as '?'. Even if both systems are using the same transcription system, we are still hosed because you have no idea of how to interpret the decoded results since you don't know what character set encoding applies. If, on the other hand, you know that Unicode was the source encoding, then, at least you know you shouldn't interpret it as a BIG5 string. You may respond: sorry, I don't grok Unicode path names. Or, if you have a Unicode -> BIG5 translation table on hand, you could correctly translate it to the corresponding BIG5 encoding: %A4%A4%B0%EA.TXT. On the other hand, if you do misinterpret the original file name as a BIG5 string, then you are going to say: 550 E-W??.?T?X?T: No such file OR directory. And I'm going to be clueless about what went wrong. If the protocol doesn't communicate this information, then who/what will? Regards, Glenn
Received on Wednesday, 2 August 1995 20:54:15 UTC