Re: host-meta file format comments (draft-nottingham-site-meta-01) from Thomas Roessler on 2009-02-11 (www-talk@w3.org from January to February 2009)

From: Thomas Roessler <tlr@w3.org>
Date: Wed, 11 Feb 2009 02:28:27 +0100
To: Mark Nottingham <mnot@yahoo-inc.com>
Cc: <www-talk@w3.org>, Eran Hammer-Lahav <blade@yahoo-inc.com>, <discuss@apps.ietf.org>
Message-Id: <252CD966-15AD-4709-806F-83FBE5300A5F@w3.org>

On 11 Feb 2009, at 02:18, Mark Nottingham wrote:

[ASCII vs UTF-8]

> OTOH we're talking about a SHOULD here. Maybe it just needs more  
> careful guidance; i.e., that you should stick to ASCII unless you're  
> conveying elements for presentation to end users.

Well, one point to consider is how you expect IRIs and IRI references  
to be represented.

There's one school of thought (more common in the IETF crowd) that  
says that these should be converted to ASCII early, and therefore  
shouldn't occur here.

The other school of thought (more common at W3C) says that they're  
fine in the places where XML and other document formats have always  
accepted URIs, and therefore should be representable in this spot.

Some properties of the direction the IDNA update effort is taking  
suggest that the IETF school of thought is less likely to cause  
interoperability problems.
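For concreteness, the "convert to ASCII early" approach roughly amounts to the mapping below. This is a minimal Python sketch, not anything the draft specifies: it assumes UTF-8 percent-encoding for path, query and fragment (as in RFC 3987, section 3.1) and uses Python's built-in "idna" codec for the host. That codec implements the older IDNA2003 rules rather than the IDNA update, so some hosts may map differently; it also drops any userinfo and ignores IPv6 literals. The helper names are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is: RFC 3986 reserved and unreserved characters,
# plus '%' so that existing %XX escapes are not double-encoded.
_URI_SAFE = "!#$&'()*+,/:;=?@[]~%-._"


def _pct_encode(component: str) -> str:
    # Percent-encode every other character as its UTF-8 octets.
    return quote(component, safe=_URI_SAFE, encoding="utf-8")


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (sketch; drops any userinfo)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Built-in "idna" codec = IDNA2003 (RFC 3490), not the IDNA update.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        _pct_encode(parts.path),
        _pct_encode(parts.query),
        _pct_encode(parts.fragment),
    ))
```

With this sketch, iri_to_uri("http://bücher.example/résumé") yields "http://xn--bcher-kva.example/r%C3%A9sum%C3%A9", which could then sit in an ASCII-only metadata file.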

The other question is what the cost of violating this SHOULD is.   
Assume that some people have a really good reason to violate an ASCII  
or ISO-8859-1 SHOULD, and actually go for UTF-8.  You now get mixed  
character sets in a single metadata file.  I'm not sure that's  
desirable...
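To make the mixed-character-set cost concrete, here is a small Python sketch of what a reader sees when one producer writes UTF-8 and another writes ISO-8859-1 into the same kind of file. The "Title" field is purely hypothetical, not a host-meta field from the draft. Note the asymmetry: decoding UTF-8 bytes as ISO-8859-1 never fails and silently produces mojibake, while decoding ISO-8859-1 bytes as UTF-8 fails outright.

```python
from typing import Optional


def decode_as(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw bytes, returning None if they are not valid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The same hypothetical record as written by two different producers.
utf8_bytes = "Title: Café".encode("utf-8")         # é -> C3 A9
latin1_bytes = "Title: Café".encode("iso-8859-1")  # é -> E9

# A reader assuming ISO-8859-1 never fails, so UTF-8 input becomes mojibake.
print(decode_as(utf8_bytes, "iso-8859-1"))  # Title: CafÃ©
# A reader assuming UTF-8 rejects the lone 0xE9 byte.
print(decode_as(latin1_bytes, "utf-8"))     # None
```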

(BTW, are we just going down the rathole of defining yet another  
tag-value format that's subtly different?  Maybe the spec should just  
say "use HTTP header format, but with UTF-8", or "use RFC 822, but  
with UTF-8".)
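A rough Python sketch of what "HTTP header format, but with UTF-8" could mean for a reader, under stated assumptions: the whole file is decoded as UTF-8 (so there is exactly one character set), a line starting with space or tab continues the previous value (header-style folding), field names are compared case-insensitively (lowercased here), and blank lines are simply skipped rather than ending the block. The function name and the field names in the usage note are hypothetical.

```python
from typing import List, Tuple


def parse_fields(data: bytes) -> List[Tuple[str, str]]:
    """Parse 'Name: value' lines from UTF-8 bytes (sketch, not the draft's grammar)."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    fields: List[Tuple[str, str]] = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # accept both LF and CRLF line endings
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t":
            # Folded continuation: append to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        # Split on the first colon only, so values may contain colons (URIs).
        name, sep, value = line.partition(":")
        if not sep or not name or name != name.strip():
            raise ValueError(f"malformed field line: {line!r}")
        fields.append((name.lower(), value.strip()))
    return fields
```

For example, the UTF-8 bytes of "Link: <http://example.com/a>\nTitle: Café\n  (continued)\n" parse to [("link", "<http://example.com/a>"), ("title", "Café (continued)")].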

--
Thomas Roessler, W3C  <tlr@w3.org>

Received on Wednesday, 11 February 2009 01:28:37 UTC