- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 15 Aug 2008 23:00:59 +0200
- To: Brian Smith <brian@briansmith.org>
- CC: ietf-http-wg@w3.org
Brian Smith wrote:
> During the IETF meeting, what was the result of the discussions about
> Unicode support in HTTP? Looking at the IRC log, it looked like the
> discussion was leaning towards allowing UTF-8 in an otherwise-unencoded form
> in headers (applications should start accepting unencoded UTF-8 but should
> avoid sending it right now). If that is the way things are going to go, a
> general RFC 2231 profile for HTTP seems counterproductive.

As far as I recall, we played with the idea, but we were unsure whether this would be possible to do.

RFC 2231 is already implemented in two of the big four UAs, and it works around the whole issue, so I think making it easier to use (by clearly stating how it works in HTTP) is a good idea in any case.

> RFC 2231 + UTF-8 is an especially bad interchange format for text since it
> requires over 9 bytes per letter for the vast majority of people's native

Making UTF-16 support mandatory could help here, but I'm not sure how widespread support for that is (recall I'm trying to document what several UAs already do and have been doing for a long time). Will keep this in mind when writing test cases.

> languages. Plus, there are no features for language tagging (needed for CJK
> languages), BIDI (needed for middle-eastern languages), or accessibility
> (for users of screen readers). IMO, the best thing to do is to keep

RFC 2231 *does* include language tagging. WRT BIDI I'm no expert, but I thought Unicode has something to say here? And could you clarify the accessibility concern please?

> language-sensitive text out of HTTP as much as possible by recommending that
> applications transfer language-sensitive text in entity bodies as much as
> possible. Really, it is only suitable for short, language-neutral strings
> like (file and IRI) path fragments.

That's something I agree with. For instance, WebDAV doesn't suffer from these kinds of problems because anything that is text actually travels in entity bodies as XML.

That being said, you can't always avoid it, such as in Content-Disposition or Slug.

> Nitpicks:
>
> The draft references Unicode 4.0 indirectly through RFC3629. It would be
> better to allow implementations to use any later versions, or at least the
> current version, 5.1.

Yes, that's a nit, isn't it :-).

> I don't see the point of requiring ISO-8859-1. ISO-8859-1 can only encode a
> very small number of languages that are used by a small minority of people
> (who just happen to be over-represented in standards committees). Advocating
> ISO-8859-1 also seems to be the opposite of what was discussed at the IETF
> meeting (AFAICT from the logs).

I originally wanted to mandate UTF-8 only, but people pointed out (rightfully) that any HTTP software already needs to understand ISO-8859-1, so it really doesn't make a difference.

BR, Julian
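
For readers who want to see the byte overhead discussed above, here is a minimal sketch of the RFC 2231 extended-value form (charset'language'percent-encoded octets) with UTF-8. It is not taken from the draft or from any UA. The function name `rfc2231_ext_value` and the sample strings are assumptions made purely for illustration.

```python
from urllib.parse import quote


def rfc2231_ext_value(text, charset="UTF-8", language=""):
    """Hypothetical helper: build charset'language'pct-encoded-octets.

    Every octet outside ASCII letters, digits and "_.-~" is
    percent-encoded, so each such octet costs three bytes on the wire.
    """
    octets = text.encode(charset)
    return f"{charset}'{language}'{quote(octets, safe='')}"


# Three CJK characters, each three octets long in UTF-8.
sample = "\u65e5\u672c\u8a9e"
value = rfc2231_ext_value(sample, language="ja")
payload = value.split("'", 2)[2]

print(value)
print(len(payload) / len(sample), "bytes per character")
```

In this sample each character takes three UTF-8 octets and each octet becomes a three-byte `%XX` escape, which gives the nine bytes per character mentioned in the quoted text. The middle field carries the language tag. In a header such a value would follow a starred parameter name, for example `filename*=` in Content-Disposition.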
Received on Friday, 15 August 2008 21:01:45 UTC