Re: IRIs, IDNAbis, and HTTP

Brian Smith wrote:

> I understand it to mean that a new protocol that supports URIs
> must support UTF-8 in URIs, and the only (proposed) standard
> for UTF-8 in URIs is RFC 3987.

That is not the case: RFC 3986 is a full Internet Standard, and
it doesn't say "URIs must support UTF-8".  Likewise, new
protocols supporting mail addresses are not forced to support
EAI.  <recycle> RFC 3987 offers a working STD 66 URI for any
valid IRI in any charset </recycle>  Better not get me started
on 'legacy enhanced IRIs' <eg>

> just directly quote the parts that apply to HTTPbis:

That is simple, <quote> </quote>.  HTTPbis is no "new" protocol;
it is based on RFC 2068, and 2068 < 2277.  Similarly, 2821bis is
no "new" protocol; it's based on older memos.  To some degree
you can do "any charset", e.g., using the 8BITMIME extension of
SMTP (IIRC a SHOULD), or using B64/QP CTEs in MIME bodies, or
using UTF-8 (among others) directly in HTTP bodies.
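
For illustration only, the B64/QP route for MIME bodies can be sketched
in Python.  This is a minimal sketch with a sample string of my own
choosing; it shows only that both content transfer encodings turn UTF-8
octets into ASCII and back again, nothing about SMTP or HTTP itself.

```python
import base64
import quopri

# Hypothetical sample text; any non-ASCII string would do.
text = "Grüße"
octets = text.encode("utf-8")

# Base64 content transfer encoding: output is pure ASCII.
b64 = base64.b64encode(octets)

# Quoted-printable content transfer encoding: ASCII characters stay
# readable, every non-ASCII octet becomes "=XX".
qp = quopri.encodestring(octets)

# Both decode back to the original UTF-8 octets.
assert base64.b64decode(b64) == octets
assert quopri.decodestring(qp) == octets
```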

>| In documents that deal with internationalization issues at all,
>| a synopsis of the approaches chosen for internationalization
>| SHOULD be collected into a section called 'Internationalization
>| considerations', and placed next to the Security Considerations
>| section.

Okay, that is something 2616bis SHOULD do.  They clubbed me on
the SMTP list when I proposed a mere informative reference to
EAI in the I18N considerations of the "email-arch" memo.  

> RFC 2277 applies to any updates to an existing protocol, as
> far as I can tell.

It talks about UTF-8 "for all text".  We can ask Harald what
precisely that means; my first guess is "SDU" (body), not the
complete "PDU" (header + body).  And HTTP is even mentioned in
section 3.2 of BCP 18.

> You don't need to do the IDNA transformation for link
> relations, because you are not resolving the hostname of
> the IRI of the link relation. 

We don't need any IRI magic at all if we just follow Roy's and
Julian's proposal.  

> I do think that HTTPbis should explain how IRIs are to be
> used correctly in HTTP (via URI-IRI conversion, IDNA, etc.).

We could say that implementors of RFC 3987 URI representations
of IRIs have to implement RFC 3987, not some other algorithm
they saw in, say, the XML spec, but I guess that is obvious.
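
As a rough illustration of the core of the RFC 3987 mapping (section
3.1, step 2), the sketch below percent-encodes the UTF-8 octets of
every non-ASCII character and leaves ASCII untouched.  The function
name `iri_to_uri` and the example IRI are my own.  The sketch assumes
the input is already a Unicode string in NFC, treats all non-ASCII
characters alike instead of checking `ucschar`/`iprivate`, and
deliberately skips any IDNA handling of the host.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character.

    Simplified sketch: no validation of the IRI syntax and no IDNA
    conversion of the host component.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters, including existing %XX escapes,
            # are copied unchanged.
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 octet, uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
# http://example.org/r%C3%A9sum%C3%A9
```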

> I think that HTTPbis should explain how to encode UTF-8
> text in newly registered header fields. The de-facto 
> mechanism for this, used by Atom and WebDAV, is percent-
> encoded UTF-8.

For a draft standard, MIME's RFC 2047 comes to mind; for a BCP,
one of the two mechanisms recommended in BCP 137 (RFC 5137).
BCP 137 says that using UTF-8 is typically a bad idea when
looking for an ASCII-compatible encoding.  I have no strong
preference about what to use in 2616bis, if anything, but if it
ends up in a single case remotely requiring IDNAbis punycode I
scream :-)
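
To make the candidates concrete, here is a small Python comparison of
three ASCII-compatible forms for the same value: percent-encoded UTF-8,
an RFC 2047 encoded-word, and the `\u'XXXX'` code point escape from
RFC 5137.  It is a sketch under my own assumptions: the sample value
and the helper `escape_code_points` are hypothetical, and the helper
ignores details such as escaping a literal backslash.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote, unquote

text = "Grüße"  # hypothetical header field value

# 1. Percent-encoded UTF-8: every octet of the UTF-8 form as %XX.
pct = quote(text, safe="", encoding="utf-8")

# 2. RFC 2047 encoded-word, as produced by the stdlib email package.
enc_word = Header(text, charset="utf-8").encode()


# 3. RFC 5137 style escape: code points, not UTF-8 octets.
def escape_code_points(s: str) -> str:
    return "".join(
        ch if ord(ch) < 0x80 else "\\u'%04X'" % ord(ch) for ch in s
    )


esc = escape_code_points(text)

print(pct)       # Gr%C3%BC%C3%9Fe
print(enc_word)  # an =?utf-8?...?= encoded-word
print(esc)       # Gr\u'00FC'\u'00DF'e
```

Note that only the encoded-word carries its charset label with it; the
other two forms rely on the reader knowing the convention in use.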

For EAI they test the UTF-8 waters with *experimental* RFCs;
2616bis will be a draft standard or better if all goes well.

> You seem to know a lot more about IETF policy than me

NAK, but FWIW Harald was the USEFOR WG Chair when what is
now RFC.usefor-usefor was Last Called and approved.  He did
not insist on allowing UTF-8 in NetNews header fields, and
several years of the USEFOR WG were wasted pulling this
feature from an earlier set of Usefor drafts, because it
would have broken the complete installed base.

Of course RFC.usefor-usefor has I18N considerations, and
you can use UTF-8 in NetNews roughly in the same way (MIME)
as in mail, or roughly as in HTTP (NetNews is 8bit clean),
but NOT "as is" in header fields (NetNews uses RFC 2231, a
successor of RFC 2047 also supporting language tags, for
this purpose).  Another problem with percent-encoded UTF-8:
it offers no way to indicate a language tag (BCP 47).
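
The language tag point can be seen in a short Python sketch using the
stdlib helper `email.utils.encode_rfc2231`.  The sample value and the
tag `de` are my own assumptions.  The sketch shows only the extended
parameter value form (charset'language'octets); RFC 2231 also defines
a language suffix for encoded-words, which is not shown here.

```python
from email.utils import encode_rfc2231
from urllib.parse import quote, unquote

text = "Grüße"  # hypothetical value

# Plain percent-encoded UTF-8: no charset label, no language tag.
plain = quote(text, safe="", encoding="utf-8")

# RFC 2231 extended value: charset'language'percent-encoded-octets.
extended = encode_rfc2231(text, charset="utf-8", language="de")

# Splitting on the first two apostrophes recovers all three parts.
charset, language, encoded = extended.split("'", 2)
decoded = unquote(encoded, encoding=charset)

print(plain)     # Gr%C3%BC%C3%9Fe
print(extended)  # utf-8'de'Gr%C3%BC%C3%9Fe
```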

 Frank

Received on Thursday, 13 March 2008 15:27:18 UTC