- From: Mark Nottingham <mnot@yahoo-inc.com>
- Date: Wed, 11 Feb 2009 12:41:13 +1100
- To: Thomas Roessler <tlr@w3.org>
- Cc: <www-talk@w3.org>, Eran Hammer-Lahav <blade@yahoo-inc.com>, <discuss@apps.ietf.org>
On 11/02/2009, at 12:28 PM, Thomas Roessler wrote:

> On 11 Feb 2009, at 02:18, Mark Nottingham wrote:
>
> [ASCII vs UTF-8]
>
>> OTOH we're talking about a SHOULD here. Maybe it just needs more
>> careful guidance; i.e., that you should stick to ASCII unless
>> you're conveying elements for presentation to end users.
>
> Well, one point to consider is how you expect IRIs and IRI
> references to be represented.
>
> There's one school of thought (more common in the IETF crowd) that
> says that these should be converted to ASCII early, and therefore
> shouldn't occur here.
>
> The other school of thought (more common at W3C) says that they're
> fine in the places where XML and other document formats have always
> accepted URIs

IRIs?

> , and therefore should be representable in this spot.
>
> There are some properties of the direction that the IDNA update
> effort is going into that suggest that the IETF school of thought is
> less likely to cause interoperability problems.

That's my experience as well. It's all very well to say that IRIs should be usable everywhere, but they make things substantially more complex and error-prone. For example, I think it was a mistake for Atom to specify the use of IRIs everywhere, including as identifiers for relation types. However, that's a discussion that still needs to take place, and in a different draft...

> The other question is what the cost of violating this SHOULD is.
> Assume that some people have a really good reason to violate an
> ASCII or ISO-8859-1 SHOULD, and actually go for UTF-8. You now get
> mixed character sets in a single metadata file. I'm not sure that's
> desirable...
>
> (BTW, are we just going down the rathole of defining yet another
> tag-value format that's subtly different? Maybe the spec should just
> say "use HTTP header format, but with UTF-8", or "use RFC 822, but
> with UTF-8".)
But that's already a different thing; although arguably HTTP headers allow UTF-8 (Roy makes this point regularly and forcefully), the impact on existing software isn't clear.

I see two possible paths forward:

1) require ASCII, using an encoding where human-viewable content is conveyed, or
2) require ASCII, or UTF-8 where human-viewable content is conveyed (i.e., only one of those two).

Input?

--
Mark Nottingham
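A minimal Python sketch contrasting the two options, under assumptions the post does not settle: the metadata file is a series of simple `Name: value` lines, the caller supplies the set of field names that carry human-viewable content, and option 1's unspecified "encoding" is taken here to be percent-encoding of UTF-8 (the IRI-to-URI style mapping), chosen purely for illustration. The names `iri_to_ascii`, `encode_option1`, `encode_option2` and the `Title` field are hypothetical.

```python
from typing import Dict, Set
from urllib.parse import quote


def iri_to_ascii(value: str) -> str:
    # Percent-encode each non-ASCII character as its UTF-8 octets;
    # ASCII characters (including existing %XX escapes) pass through unchanged.
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in value)


def encode_option1(fields: Dict[str, str], human_viewable: Set[str]) -> bytes:
    # Option 1: the whole file is ASCII; human-viewable values are encoded first.
    lines = []
    for name, value in fields.items():
        if name in human_viewable:
            value = iri_to_ascii(value)  # assumed choice of encoding
        if not value.isascii():
            raise ValueError(f"{name}: non-ASCII value not allowed")
        lines.append(f"{name}: {value}")
    return ("\n".join(lines) + "\n").encode("ascii")


def encode_option2(fields: Dict[str, str], human_viewable: Set[str]) -> bytes:
    # Option 2: ASCII everywhere, except human-viewable values, which may be UTF-8.
    lines = []
    for name, value in fields.items():
        if name not in human_viewable and not value.isascii():
            raise ValueError(f"{name}: non-ASCII value not allowed")
        lines.append(f"{name}: {value}")
    return ("\n".join(lines) + "\n").encode("utf-8")
```

With `{"Link": "http://example.com/a", "Title": "Café"}` and `Title` marked human-viewable, option 1 emits `Title: Caf%C3%A9` in a pure-ASCII file, while option 2 emits the raw UTF-8 bytes for `Café`. In both, a non-ASCII value in a machine-oriented field such as `Link` is rejected, which corresponds to converting IRIs to ASCII early.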
Received on Wednesday, 11 February 2009 01:41:56 UTC