Re: host-meta file format comments (draft-nottingham-site-meta-01) from Mark Nottingham on 2009-02-11 (www-talk@w3.org from January to February 2009)

From: Mark Nottingham <mnot@yahoo-inc.com>
Date: Wed, 11 Feb 2009 12:41:13 +1100
To: Thomas Roessler <tlr@w3.org>
Cc: <www-talk@w3.org>, Eran Hammer-Lahav <blade@yahoo-inc.com>, <discuss@apps.ietf.org>
Message-Id: <4C35F5FF-8612-4F8B-810F-7B7C75DC32E2@yahoo-inc.com>

On 11/02/2009, at 12:28 PM, Thomas Roessler wrote:

> On 11 Feb 2009, at 02:18, Mark Nottingham wrote:
>
> [ASCII vs UTF-8]
>
>> OTOH we're talking about a SHOULD here. Maybe it just needs more  
>> careful guidance; i.e., that you should stick to ASCII unless  
>> you're conveying elements for presentation to end users.
>
> Well, one point to consider is how you expect IRIs and IRI  
> references to be represented.
>
> There's one school of thought (more common in the IETF crowd) that  
> says that these should be converted to ASCII early, and therefore  
> shouldn't occur here.
>
> The other school of thought (more common at W3C) says that they're  
> fine in the places where XML and other document formats have always  
> accepted URIs

IRIs?

> , and therefore should be representable in this spot.
>
> Some aspects of the direction that the IDNA update effort is taking  
> suggest that the IETF school of thought is less likely to cause  
> interoperability problems.

That's my experience as well. It's all very well to say that IRIs  
should be usable everywhere, but they make things substantially more  
complex and error-prone. For example, I think it was a mistake for  
Atom to specify the use of IRIs everywhere, including as identifiers  
for relation types. However, that's a discussion that still needs to  
take place, and a different draft...
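
To make the "convert to ASCII early" position concrete, here is a rough sketch of mapping an IRI to an ASCII-only URI before it is written into a file. It is an illustration only, not anything the draft specifies: the function name iri_to_uri is hypothetical, userinfo is dropped for brevity, and Python's built-in "idna" codec implements the older IDNA 2003 rules rather than whatever the IDNA update effort settles on.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI."""
    parts = urlsplit(iri)

    # Host: encode each label with the stdlib "idna" codec (IDNA 2003).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    # Assumption: userinfo is ignored in this sketch; only host and port are kept.
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path, query, fragment: percent-encode the UTF-8 bytes of non-ASCII
    # characters, leaving existing %XX escapes and reserved characters alone.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=")
    fragment = quote(parts.fragment, safe="/?%:@!$&'()*+,;=")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

An identifier that is already plain ASCII passes through unchanged, which is part of why this approach is attractive for existing software.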

> The other question is what the cost of violating this SHOULD is.   
> Assume that some people have a really good reason to violate an  
> ASCII or ISO-8859-1 SHOULD, and actually go for UTF-8.  You now get  
> mixed character sets in a single metadata file.  I'm not sure that's  
> desirable...
>
> (BTW, are we just going down the rathole of defining yet another tag- 
> value format that's subtly different?  Maybe the spec should just  
> say "use HTTP header format, but with UTF-8", or "use RFC 822, but  
> with UTF-8".)

But that's already a different thing; although arguably HTTP headers  
allow UTF-8 (Roy makes this point regularly and forcefully), the  
impact on existing software isn't clear.

I see two possible paths forward:

1) require ASCII, using encoding where human-viewable content is  
conveyed, or

2) require ASCII, or UTF-8 where human-viewable content is conveyed  
(i.e., only one of those two).
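
As a rough sketch of what the two options could mean for a consumer, assuming (and this is only an assumption, since no escape mechanism has been chosen) that "encoding" in option 1 means percent-encoding the UTF-8 bytes of the value. The function names are made up for illustration.

```python
from urllib.parse import quote, unquote


def option1_encode(value: str) -> bytes:
    """Option 1: the file is pure ASCII; non-ASCII text is escaped.

    Assumption: percent-encoding of UTF-8 bytes stands in for the
    unspecified escape mechanism. Spaces are left as-is.
    """
    return quote(value, safe=" ").encode("ascii")


def option1_decode(raw: bytes) -> str:
    """Reverse of option1_encode; rejects any non-ASCII byte."""
    return unquote(raw.decode("ascii"))


def option2_decode(raw: bytes) -> str:
    """Option 2: the file is ASCII or UTF-8, nothing else.

    ASCII is a subset of UTF-8, so one strict UTF-8 decode covers both;
    other encodings (e.g. ISO-8859-1 bytes) raise UnicodeDecodeError.
    """
    return raw.decode("utf-8")
```

Under option 1 a parser never sees a byte above 0x7F but has to unescape values before display; under option 2 the parser has to be UTF-8 clean, but human-viewable values need no second decoding step.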

Input?

--
Mark Nottingham

Received on Wednesday, 11 February 2009 01:41:56 UTC