Re: IRIs, IDNAbis, and HTTP from Julian Reschke on 2008-03-13 (ietf-http-wg@w3.org from January to March 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 13 Mar 2008 14:47:52 +0100
To: Brian Smith <brian@briansmith.org>
CC: ietf-http-wg@w3.org
Message-ID: <47D93088.9040501@gmx.de>

Brian Smith wrote:
>> New protocols are supposed to support minimally UTF-8 as per 
>> RFC 2277.  That is not the same as "support IRIs", IRIs can 
>> be in any charset, not only UTF-8.
> 
> Right, but I understand it to mean that a new protocol that supports
> URIs must support UTF-8 in URIs, and the only (proposed) standard for
> UTF-8 in URIs is RFC 3987.

Well, no. URIs are all ASCII. There is no such thing as "Unicode 
support in URIs". Those are called "IRIs".
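
To illustrate the distinction, here is a minimal sketch of the kind of
IRI-to-URI mapping RFC 3987 describes: non-ASCII characters in the path,
query and fragment are percent-encoded as UTF-8, and the host goes through
IDNA. The function name iri_to_uri is hypothetical, Python's built-in "idna"
codec implements IDNA 2003 only, and userinfo is dropped for brevity, so
treat this as an approximation rather than a conforming implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI delimiters plus '%', so that escapes
# already present in the input are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=?"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an all-ASCII URI (sketch only)."""
    parts = urlsplit(iri)

    # Host: convert to ASCII-compatible form (assumption: IDNA 2003 is enough).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Everything else: percent-encode non-ASCII characters as UTF-8 octets.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


uri = iri_to_uri("http://bücher.example/wiki/Zürich?q=café")
print(uri)
```

The result contains only ASCII, which is what can actually go on the wire in
an HTTP request line or a Link header.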

> Also, instead of paraphrasing 2277, let's just directly quote the parts
> that apply to HTTPbis:
> 
> "Lack of an ability to use UTF-8 is a violation of this policy; such a
> violation would need a variance procedure ([BCP9] section 9) with clear
> and solid justification in the protocol specification document before
> being entered into or advanced upon the standards track."
> 
> "For existing protocols or protocols that move data from existing
> datastores, support of other charsets, or even using a default other
> than UTF-8, may be a requirement. This is acceptable, but UTF-8 support
> MUST be possible."
> 
> "In documents that deal with internationalization issues at all, a
> synopsis of the approaches chosen for internationalization SHOULD be
> collected into a section called 'Internationalization considerations',
> and placed next to the Security Considerations section."

All nice in theory, but it hasn't been done in RFC 2616.

>> HTTP is no "new" protocol, like mail or news:  2821bis and 
>> 2822upd and FWIW RFC.usefor-usefor don't "violate" any IETF 
>> policy.  But atom and xmpp were new, a different situation.
> 
> RFC 2277 applies to any updates to an existing protocol, as far as I can
> tell.

I don't see how it could apply to that.

> ...
>> HTTP at the moment allows Latin-1, do you really want to 
>> support the proper subset of all IRIs limited to Latin-1, for 
>> the purpose of HTTP Link: header fields ?  
> 
>> When "keeping Latin-1" is a showstopper, then introducing 
>> IRIs in 2616bis would be a clear "1F" scenario.  You need a 
>> new HTTP version number for this, a restart at PS, and a new 
>> WG Charter.
> 
> I am not suggesting that HTTP 1.1 should switch from Latin-1 to UTF-8.
> But, I do think that HTTPbis should explain how IRIs are to be used
> correctly in HTTP (via URI-IRI conversion, IDNA, etc.). And, I think

HTTP uses URIs, not IRIs.

That being said, it may be necessary to state a few things about HTTP 
IRIs, but that would be in a separate document. Remember our charter?

> that HTTPbis should explain how to encode UTF-8 text in newly registered
> header fields. The de-facto mechanism for this, used by Atom and WebDAV,
> is percent-encoded UTF-8.

Note: one instance in WebDAV Delta-V, one in AtomPub.

Are you saying httpbis should recommend that for new headers? I'm not 
against it, but it sounds like something for an update to the HTTP 
header registry.
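
For reference, the percent-encoded UTF-8 approach mentioned above can be
sketched roughly as follows. The helper names and the "Example-Title" header
are made up for illustration and are not taken from AtomPub, Delta-V, or any
registry; the only point is that the field value stays within US-ASCII.

```python
from urllib.parse import quote, unquote


def encode_header_text(text: str) -> str:
    """Hypothetical helper: percent-encode text as UTF-8 for a header value."""
    # safe="" means every reserved character is escaped as well, so the
    # result consists only of unreserved ASCII characters and %XX triplets.
    return quote(text, safe="", encoding="utf-8")


def decode_header_text(value: str) -> str:
    """Hypothetical helper: reverse of encode_header_text."""
    # errors="strict" rejects octet sequences that are not valid UTF-8.
    return unquote(value, encoding="utf-8", errors="strict")


value = encode_header_text("Zürich café")
header_line = f"Example-Title: {value}"  # "Example-Title" is an invented name
print(header_line)
```

A recipient that knows the convention applies decode_header_text to get the
original text back; one that does not still sees a valid ASCII field value.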

>>> I personally believe it is wrong to create new standards 
>>> where things may be named in European languages but not
>>> in non-European languages.
>>
>> Strong ACK, let's drop the Latin-1 cruft, and limit 2616bis 
>> to US-ASCII and URIs for now.  HTTP/1.2 is free to tackle 
>> UTF-8, and RFC 3987 offers a strict STD 66 URI for any IRI.
> 
> You seem to know a lot more about IETF policy than me, but I don't see
> how it is possible to defer the internationalization considerations of
> HTTP any further, while keeping HTTP on the standards track.

I fear it's the other way around. If we want to keep it on the standards 
track, we can't make any incompatible changes.

BR, Julian
Received on Thursday, 13 March 2008 14:15:29 UTC