RE: Factoring out Content-Disposition (i123)

Julian Reschke wrote:
> Frank Ellermann wrote:

> > IMO file name length limits don't belong in 2616bis:
> 
> That's true.
> 
> But in a draft specific to Content-Disposition we may want to 
> consider this. In particular, sending extraordinary long file 
> names (after decoding) may be problematic just for the reason
> of usability in the UA's UI.

I think it is best just to add a note pointing out that there may be a
large disparity between the number of characters and the number of bytes
needed to encode those characters. For example, "มหาวิทยาลัยเชียงใหม่" (Thai for
"Chiang Mai University") becomes:

%E0%B8%A1%E0%B8%AB%E0%B8%B2%E0%B8%A7%E0%B8%B4%E0%B8%97%E0%B8%A2%E0%B8%B2
%E0%B8%A5%E0%B8%B1%E0%B8%A2%E0%B9%80%E0%B8%8A%E0%B8%B5%E0%B8%A2%E0%B8%87
%E0%B9%83%E0%B8%AB%E0%B8%A1%E0%B9%88

20 characters * 3 octets per character in UTF-8 * 3 bytes per octet for
%-escaping = 180 bytes. In general, each percent-encoded UTF-8 code point
requires up to 12 bytes (4 octets * 3 bytes), and a single character can
expand to several code points (up to 18 under NFKD), each needing its own
such byte sequence.
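
To make the arithmetic easy to re-check, here is a minimal Python sketch. It
assumes the file name is percent-encoded octet by octet from its UTF-8 form,
and it uses U+FDFA as the example of a character that expands heavily under
NFKD. The variable names are mine and only illustrative.

```python
import unicodedata
from urllib.parse import quote

name = "มหาวิทยาลัยเชียงใหม่"  # Thai for "Chiang Mai University"

code_points = len(name)                  # number of Unicode code points
utf8_octets = len(name.encode("utf-8"))  # octets after UTF-8 encoding
escaped = quote(name, safe="")           # percent-encode every non-unreserved octet
escaped_bytes = len(escaped)             # bytes actually sent in the header

print(code_points, utf8_octets, escaped_bytes)

# Compatibility decomposition can multiply the code point count:
# U+FDFA is a single code point before NFKD and many after it.
ligature = "\uFDFA"
expanded = unicodedata.normalize("NFKD", ligature)
print(len(ligature), len(expanded))
```

The first line of output should reproduce the 20 / 60 / 180 figures above.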

Somewhere (in an appendix?) the current behavior of popular UAs should be
documented so that people understand what works now vs. what will
(hopefully) work in the future. In that appendix, it would be prudent to
mention IE's 150-byte limit. For example, "มหาวิทยาลัยเชียงใหม่" is already
way too long for IE (180 bytes), even though it accepts the longer (by
character count) "Chiang Mai University" just fine.
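
A rough sketch of how a sender might check a name against such a limit. The
150-byte figure is the one quoted above and is not independently verified; I
am also assuming the limit applies to the percent-encoded form. The names
ASSUMED_LIMIT, encoded_length and fits are hypothetical.

```python
from urllib.parse import quote

# Assumption: the IE limit mentioned in this message, measured in
# percent-encoded bytes.
ASSUMED_LIMIT = 150


def encoded_length(filename: str) -> int:
    """Length in bytes of the percent-encoded UTF-8 form of filename."""
    return len(quote(filename, safe=""))


def fits(filename: str, limit: int = ASSUMED_LIMIT) -> bool:
    """True if the encoded file name stays within the assumed limit."""
    return encoded_length(filename) <= limit


for candidate in ("Chiang Mai University", "มหาวิทยาลัยเชียงใหม่"):
    print(candidate, encoded_length(candidate), fits(candidate))
```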

Julian: in your tests you should try out different normalization forms and
non-normalized forms. NTFS doesn't do any normalization at all and I imagine
Linux is the same, but Mac OS X's HFS+ file system converts all filenames to
NFD. Yet most software and specifications (W3C and RFC 3987 in particular)
use NFC.
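
For generating test inputs, something along these lines may help. It is only a
sketch: the sample name "café.txt" and the function name variants are mine,
chosen because the accented letter has distinct composed and decomposed forms.

```python
import unicodedata
from urllib.parse import quote


def variants(filename: str) -> dict:
    """Return the file name in each of the four Unicode normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, filename) for form in forms}


sample = "caf\u00e9.txt"  # precomposed e-acute (U+00E9)

for form, text in variants(sample).items():
    # The same visible name yields different octets depending on the form.
    print(form, len(text), quote(text, safe=""))
```

A non-normalized test name can be built by concatenating a composed and a
decomposed variant into one string.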

Regards,
Brian

Received on Tuesday, 19 August 2008 14:19:10 UTC