- From: Brian Smith <brian@briansmith.org>
- Date: Fri, 15 Aug 2008 17:33:16 -0500
- To: "'Julian Reschke'" <julian.reschke@gmx.de>
- Cc: <ietf-http-wg@w3.org>
Julian Reschke wrote:
> Brian Smith wrote:
> > RFC 2231 + UTF-8 is an especially bad interchange format for text
> > since it requires over 9 bytes per letter for the vast majority of
> > people's native
>
> Making UTF-16 support mandatory could help here, but I'm not
> sure how widespread support for that is (recall I'm trying to
> document what several UAs already do and have been doing for
> a long time). Will keep this in mind when writing test cases.

I don't think UTF-16 support is worthwhile. I think it isn't an issue
if the intention is to support only very short, non-prose text like
filenames (see below). It seems there is some agreement that HTTP
headers should not contain human-oriented text. My concern is that
having a separate standard for RFC2231 in HTTP will promote the idea
of human-oriented text in headers instead of discouraging it.

> > language-sensitive text out of HTTP as much as possible by
> > recommending that applications transfer language-sensitive text in
> > entity bodies as much as possible. Really, it is only suitable for
> > short, language-neutral strings like (file and IRI) path fragments.
>
> That's something I agree with. For instance, WebDAV doesn't
> suffer from these kinds of problems because anything that is
> text actually travels in entity bodies as XML.
>
> That being said, you can't always avoid it, such as in
> Content-Disposition or Slug.

Since the primary (only?) use case for RFC2231 in HTTP is the
Content-Disposition header, why not just fold this into the spec. that
you are writing for Content-Disposition? URI references are already
ASCII-encoded IRIs, and Atom's Slug header field already has its own
mechanism for handling non-ASCII text.

> > languages. Plus, there are no features for language tagging (needed
> > for CJK languages), BIDI (needed for Middle Eastern languages), or
> > accessibility (for users of screen readers). IMO, the best
> > thing to do is to keep
>
> RFC 2231 *does* include language tagging. WRT BIDI I'm no
> expert, but I thought Unicode has something to say here?
> And could you clarify the accessibility concern please?

Language tagging, BIDI, and accessibility features are not really
necessary for the specific case of filenames. Those issues come into
play when you try to define a general-purpose mechanism for supporting
human-oriented text. For example, RFC 2231 only allows a language tag
for the entire parameter value, but doesn't provide a means of
handling mixed-language text.

> > Nitpicks:
> >
> > The draft references Unicode 4.0 indirectly through
> > RFC3629. It would be better to allow implementations to use
> > any later versions, or at least the current version, 5.1.
>
> Yes, that's a nit, isn't it :-).

Yes, but this issue seems to always come up when specifications
reference Unicode documents.

> > I don't see the point of requiring ISO-8859-1. ISO-8859-1 can only
> > encode a very small number of languages that are used by a small
> > minority of people (who just happen to be over-represented in
> > standards committees). Advocating
> > ISO-8859-1 also seems to be the opposite of what was
> > discussed at the IETF meeting (AFAICT from the logs).
>
> I originally wanted to mandate UTF-8 only, but people pointed
> out (rightfully), that any HTTP software already needs to
> understand ISO-8859-1, so it really doesn't make a difference.

Judging from Roy's response, it looks like software won't have to
understand more than ASCII, though it will have to tolerate non-ASCII
bytes (presumably, regardless of whether those bytes can be decoded
into valid characters in any encoding). Historically, ISO-8859-1 seems
to be very difficult for implementers to get right since Windows-1252
and other similar encodings are often sent as ISO-8859-1.

- Brian
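
To make the size concern above concrete, here is a rough Python sketch of an RFC 2231 style extended parameter value (charset, optional language tag, then percent-encoded bytes). The helper names `rfc2231_value` and `encoded_bytes_per_char` are made up for illustration, and `urllib.parse.quote` escapes somewhat more characters than RFC 2231 strictly requires, so treat this as an approximation rather than a conforming encoder. It shows roughly where the 9-bytes-per-letter figure comes from: a character that needs three bytes in UTF-8 turns into three `%XX` escapes.

```python
from urllib.parse import quote


def rfc2231_value(text: str, charset: str = "UTF-8", language: str = "") -> str:
    """Build a value of the form charset'language'percent-encoded-bytes.

    Hypothetical helper: it escapes every byte outside the URL
    "unreserved" set, which is stricter than RFC 2231 requires.
    """
    encoded = quote(text.encode(charset), safe="")
    return f"{charset}'{language}'{encoded}"


def encoded_bytes_per_char(text: str) -> float:
    """Average number of header bytes spent per character of `text`
    once it is UTF-8 encoded and then percent-encoded."""
    encoded = quote(text.encode("utf-8"), safe="")
    return len(encoded) / len(text)


if __name__ == "__main__":
    print(rfc2231_value("日本語.txt"))
    print(encoded_bytes_per_char("日本語"))  # three UTF-8 bytes per character
    print(encoded_bytes_per_char("abc"))     # plain ASCII is not escaped
```

For the filename `日本語.txt` this yields `UTF-8''%E6%97%A5%E6%9C%AC%E8%AA%9E.txt`: each of the three CJK characters costs 9 bytes on the wire, while the ASCII suffix costs 1 byte per character. Note that the single language slot applies to the whole value, which is the limitation for mixed-language text mentioned above.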
Received on Friday, 15 August 2008 22:33:58 UTC