Migration of HTTP to the use of IRIs from Chris Haynes on 2003-06-26 (public-iri@w3.org from June 2003)

From: Chris Haynes <chris@harvington.org.uk>
Date: Thu, 26 Jun 2003 23:21:25 +0100
To: "Martin Duerst" <duerst@w3.org>, <public-iri@w3.org>
Message-ID: <003a01c33c31$4e370ac0$0200000a@ringo>
Dear Martin / Michel,

I'm looking at draft-duerst-iri-04 from the viewpoint of a provider of
web server technology. I'm trying to understand the likely migration
path to the use of IRIs, and I'm concerned that there's a gap I don't
see being filled.

It may well be that filling the gap is outside the scope of your
Internet Draft, but unless the gap is filled, I fear there may be a
_long_ delay before IRIs are adopted where they are most needed.

Your section 7.8, Upgrading Strategy, contains some useful thoughts /
advice, which I have summarised to myself as "Don't put in an
IRI-aware server until all the resources on the site(s) you serve are
published in IRI".

However that's not the problem that concerns me.

I'm concerned about the encoding of HTTP GET query strings, typically
carrying text inserted by a user into a browser's form.

Assume below that "I" am the developer of a web server. (I'm not, but
I advise someone who is).

I want to support IRIs as soon as possible. I know that 'out there'
are many different makes and releases of browsers; I have no control
over them.

As is well known, there is no mechanism in RFCs 2396 / 2616 for
indicating the encoding associated with any %hh octet-triplets in
URIs.

Unless I've missed something, your draft implies that user agents
(browsers) may perform IRI to URI conversion, so that 'my' server sees
an RFC 2396-conformant URI.

How do I know it is was originally an IRI and that I should apply the
reverse conversions of your section 3.2 before extracting the query
name-value pairs?

The problem is not 'academic', the vast majority of browser requests
received today which have %hh triplets used encodings other than
UTF-8, and these will continue to arrive for the next 20-or-more
years.

You may well answer that the way IRIs are to be applied is to be
scheme-dependent; the problem/opportunity  is for the HTTP RFC2616++
community to address.

I would feel *far* more comfortable if I knew that they were aware of
this and if there were draft proposals visible on this list and being
checked for feasibility and for 'compatibility' with your drafts.
I've seen no evidence for this, and you don't appear to
cross-reference any related HTTP activity in your draft.


It is not beyond the bounds of possibility, for example, that the HTTP
community might conclude that they cannot provide IRI support unless
your RFC-to-be  includes some kind of marker or syntactic construction
within "URIs which were converted from IRIs" which explicitly
identifies them as such.

In other words it might be found that all IRIs MUST be mapped into
some character sequence which IS NOT a 'legal'  URI (by the current
RFC2396), so that the receiver knows that the reverse process of your
section 3.2 MUST be applied.

There are other approaches the HTTP community could take, which
_would_ be compatible with your current draft, (and I have my own
candidate solution), but surely there should at least be some kind of
'existence proof' or 'feasibility study' by which they agree that they
_can_ work with your proposals before they are finalized?

Without some kind of 'roadmap' for HTTP use of IRIs I don't see how
anyone can pass final judgement on your draft.

Please reassure me by telling me I'm an idiot for not knowing about
XXX or not reading YYY.

Regards,

Chris Haynes
Received on Thursday, 26 June 2003 18:29:07 UTC