RE: IRIs, IDNAbis, and HTTP

Julian Reschke wrote:
> Brian Smith wrote:
> > "Lack of an ability to use UTF-8 is a violation of this 
> > policy; such a violation would need a variance procedure
> > ([BCP9] section 9) with clear and solid justification
> > in the protocol specification document before being
> > entered into or advanced upon the standards track."
> > 
> > "For existing protocols or protocols that move data
> > from existing datastores, support of other charsets,
> > or even using a default other than UTF-8, may be a
> > requirement. This is acceptable, but UTF-8 support
> > MUST be possible."
> 
> All nice in theory, but it hasn't been done in RFC2616.

The purpose of HTTPbis is to fix problems with RFC2616. That is one of
the problems that needs to be fixed.

> >> HTTP is no "new" protocol, like mail or news:  2821bis and 2822upd 
> >> and FWIW RFC.usefor-usefor don't "violate" any IETF 
> >> policy.  But atom and xmpp were new, a different situation.
> > 
> > RFC 2277 applies to any updates to an existing protocol, as 
> > far as I can tell.
> 
> I don't see how it could apply to that.

Please read what I quoted above. HTTP is an existing protocol, so it can
have a default charset other than UTF-8, but "UTF-8 support MUST be
possible." 

> > I am not suggesting that HTTP 1.1 should switch from 
> > Latin-1 to UTF-8. But, I do think that HTTPbis should
> > explain how IRIs are to be used correctly in HTTP
> > (via URI-IRI conversion, IDNA, etc.). And, I think
> 
> HTTP uses URIs, not IRIs.
> 
> That being said, it may be necessary to state a few things 
> about HTTP IRIs, but that would be in a separate document. 
> Remember our charter?

How does RFC 2277 fit into the standardization process? RFC 2277 itself
says "This document is the current policies being applied by the
Internet Engineering Steering Group (IESG) towards the standardization
efforts in the Internet Engineering Task Force (IETF) in order to help
Internet protocols fulfill these requirements." I take that to mean that
the IESG will reject any specification that is not compliant with RFC
2277 as a matter of policy.

> > that HTTPbis should explain how to encode UTF-8 text in newly 
> > registered header fields. The de-facto mechanism for this, used by 
> > Atom and WebDAV, is percent-encoded UTF-8.
> 
> Note: one instance in WebDAV Delta-V, one in AtomPub.
> 
> Are you saying httpbis should recommend that for new headers? 
> I'm not against it, but it sounds like something for an 
> update to the HTTP header registry.

HTTPbis should at least standardize a mechanism for new headers to
support Unicode text. Percent-encoded UTF-8 is one possibility. Or--just
thinking off the top of my head--HTTPbis could allow new headers to
encode UTF-8 text directly in quoted-strings, by starting the quoted
string with the BOM (<EF><BB><BF>, which is "ï»¿" in Latin-1).
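
To make the two options concrete, here is a rough Python sketch. It is
only an illustration under my own assumptions: the function names are
made up, neither mechanism is defined by any HTTP specification, and the
BOM variant assumes the receiver already has the raw bytes between the
quotes of the quoted-string.

```python
from urllib.parse import quote, unquote

# UTF-8 encoding of U+FEFF; reads as "ï»¿" when decoded as Latin-1.
BOM = b"\xef\xbb\xbf"


def percent_encode_utf8(text):
    """Option 1: percent-encode the UTF-8 bytes so the value is pure ASCII."""
    # safe="" escapes everything outside the unreserved set.
    return quote(text, safe="", encoding="utf-8")


def percent_decode_utf8(value):
    """Reverse of percent_encode_utf8; raises on invalid UTF-8."""
    return unquote(value, encoding="utf-8", errors="strict")


def decode_quoted_string(raw):
    """Option 2 (hypothetical): a leading BOM marks the content as UTF-8.

    `raw` is the byte content of a quoted-string without the quotes.
    Without the BOM, fall back to the Latin-1 default.
    """
    if raw.startswith(BOM):
        return raw[len(BOM):].decode("utf-8")
    return raw.decode("latin-1")
```

With this sketch, percent_encode_utf8("Café") gives "Caf%C3%A9", and a
legacy recipient that ignores the BOM convention would display the three
extra Latin-1 characters at the start of the string instead of failing.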

But, it is totally unacceptable to add the Link header with a
non-Unicode-capable title subfield, it is unacceptable to specify any
new headers that have any human-oriented text that is not Unicode
enabled, and any existing headers that have human-oriented text should
be revised (in the most backwards-compatible way possible) to support
Unicode text.

> > You seem to know a lot more about IETF policy than me, but 
> > I don't see how it is possible to defer the
> > internationalization considerations of HTTP any further,
> > while keeping HTTP on the standards track.
> 
> I fear it's the other way around. If we want to keep it on 
> the standards track, we can't make any incompatible changes.

I agree. That is why I have not suggested any incompatible changes.

Regards,
Brian

Received on Thursday, 13 March 2008 15:01:15 UTC