RE: IRIs, IDNAbis, and HTTP (was: Reviving HTTP Header Linking: Some code and use-cases)

Frank Ellermann wrote:
> Brian Smith wrote:
>  
> >> Sounds like a good reason for not allowing link relations 
> >> that aren't URIs (or URI references).
>  
> > That is against IETF policy. New standards have to allow the use of 
> > IRIs wherever URIs are allowed. At least, that is what I 
> > was told on the Atom mailing list.

> New protocols are supposed to support minimally UTF-8 as per 
> RFC 2277.  That is not the same as "support IRIs", IRIs can 
> be in any charset, not only UTF-8.

Right, but I understand it to mean that a new protocol that supports
URIs must support UTF-8 in URIs, and the only (proposed) standard for
UTF-8 in URIs is RFC 3987.

Also, instead of paraphrasing 2277, let's just directly quote the parts
that apply to HTTPbis:

"Lack of an ability to use UTF-8 is a violation of this policy; such a
violation would need a variance procedure ([BCP9] section 9) with clear
and solid justification in the protocol specification document before
being entered into or advanced upon the standards track."

"For existing protocols or protocols that move data from existing
datastores, support of other charsets, or even using a default other
than UTF-8, may be a requirement. This is acceptable, but UTF-8 support
MUST be possible."

"In documents that deal with internationalization issues at all, a
synopsis of the approaches chosen for internationalization SHOULD be
collected into a section called 'Internationalization considerations',
and placed next to the Security Considerations section."

> HTTP is no "new" protocol, like mail or news:  2821bis and 
> 2822upd and FWIW RFC.usefor-usefor don't "violate" any IETF 
> policy.  But atom and xmpp were new, a different situation.

RFC 2277 applies to any updates to an existing protocol, as far as I can
tell.

No, that is not what I was saying at all.

> Please note that RFC 3987 is a proposed standard, HTTP is a 
> draft standard.  You'd get a downref, and forcing servers and 
> clients worldwide to do the Unicode 3.2 punycode stunt (until 
> IDNAbis fixes it) is an interoperability nightmare.

You don't need to do the IDNA transformation for link relations, because
you are not resolving the hostname of the IRI of the link relation. 
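
To make that concrete, here is a rough sketch of treating a link relation IRI purely as an identifier. It assumes, as I read RFC 3987 section 3.1, that the ToASCII step for the host part is optional and that non-ASCII characters may simply be percent-encoded as UTF-8 octets. The function names and the example relation IRI are made up for illustration, and the input is assumed to be already normalized.

```python
from urllib.parse import quote


def iri_to_uri_no_idna(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8.

    ASCII characters (including any existing percent-escapes) are left
    untouched, and no IDNA/punycode conversion is applied to the host.
    """
    return "".join(
        ch if ord(ch) < 0x80 else quote(ch, safe="", encoding="utf-8")
        for ch in iri
    )


def same_relation(a: str, b: str) -> bool:
    """Compare two link relation identifiers as strings after mapping."""
    return iri_to_uri_no_idna(a) == iri_to_uri_no_idna(b)


# Hypothetical relation IRI with a non-ASCII host label (u-umlaut).
rel = iri_to_uri_no_idna("http://b\u00fccher.example/rel/next")
print(rel)  # http://b%C3%BCcher.example/rel/next
```

Nothing here looks up the host, so the Unicode 3.2 tables never come into play.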

> HTTP at the moment allows Latin-1, do you really want to 
> support the proper subset of all IRIs limited to Latin-1, for 
> the purpose of HTTP Link: header fields ?  

> When "keeping Latin-1" is a showstopper, then introducing 
> IRIs in 2616bis would be a clear "1F" scenario.  You need a 
> new HTTP version number for this, a restart at PS, and a new 
> WG Charter.

I am not suggesting that HTTP 1.1 should switch from Latin-1 to UTF-8.
But, I do think that HTTPbis should explain how IRIs are to be used
correctly in HTTP (via URI-IRI conversion, IDNA, etc.). And, I think
that HTTPbis should explain how to encode UTF-8 text in newly registered
header fields. The de-facto mechanism for this, used by Atom and WebDAV,
is percent-encoded UTF-8.
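
As an illustration only, percent-encoded UTF-8 in a field value could look like the sketch below. The helper names are hypothetical, and this is not a quote from any spec. The point is just that the encoded value is pure US-ASCII, so it does not depend on the Latin-1 question at all.

```python
from urllib.parse import quote, unquote


def encode_field_text(text: str) -> str:
    """Percent-encode text as UTF-8 octets so the result is plain ASCII."""
    return quote(text, safe="", encoding="utf-8")


def decode_field_text(value: str) -> str:
    """Reverse the encoding; raises UnicodeDecodeError on invalid UTF-8."""
    return unquote(value, encoding="utf-8", errors="strict")


title = "\u00dcbersicht"  # German "Uebersicht" spelled with U-umlaut
encoded = encode_field_text(title)
wire = encoded.encode("ascii")  # succeeds: no octet above 0x7F remains

print(encoded)  # %C3%9Cbersicht
```

A recipient that does not understand the convention still sees a valid ASCII token; one that does can recover the original text with `decode_field_text`.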

> > I personally believe it is wrong to create new standards 
> > where things may be named in European languages but not
> > in non-European languages.
> 
> Strong ACK, let's drop the Latin-1 cruft, and limit 2616bis 
> to US-ASCII and URIs for now.  HTTP/1.2 is free to tackle 
> UTF-8, and RFC 3987 offers a strict STD 66 URI for any IRI.

You seem to know a lot more about IETF policy than me, but I don't see
how it is possible to defer the internationalization considerations of
HTTP any further, while keeping HTTP on the standards track.

- Brian

Received on Thursday, 13 March 2008 13:23:07 UTC