- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 06 May 2004 16:55:55 +0900
- To: "Chris Haynes" <chris@harvington.org.uk>, <public-iri@w3.org>
Hello Chris, I have added a new paragraph to section 7.8 of the IRI spec, reading as follows: >>>>>>>> It is often possible to reduce the effort and dependencies for upgrading to IRIs by using UTF-8 rather than another encoding where there is a free choice of encodings. For example, when setting up a new file-based Web server, using UTF-8 as the encoding of the file names will make the transition to IRIs easier. Likewise, when setting up a new Web form using UTF-8 as the encoding of the form page, the returned query URIs will use UTF-8 as an encoding and will therefore be compatible with IRIs. >>>>>>>> This is to cover the issue queryclarify-16, and it also gives some more information with regards to UTF-8. When this section was originally written (which was one of the very early drafts, if I remember correctly), browsers and other clients that understood UTF-8 were not very widely available, so this advice was not appropriate, but this has clearly changed in the meantime. I have marked this issue as tentatively closed. Regards, Martin. At 01:19 03/06/28 -0400, Martin Duerst wrote: >Hello Chris, > >Many thanks for your comments on the IRI spec. > >I have noted your issue as: >http://www.w3.org/International/iri-edit#queryclarify-16 > >More explanations below. > >At 23:21 03/06/26 +0100, Chris Haynes wrote: > >>Dear Martin / Michel, >> >>I'm looking at draft-duerst-iri-04 from the viewpoint of a provider of >>web server technology. I'm trying to understand the likely migration >>path to the use of IRIs, and I'm concerned that there's a gap I don't >>see being filled. >> >>It may well be that filling the gap is outside the scope of your >>Internet Draft, but unless the gap is filled, I fear there may be a >>_long_ delay before IRIs are adopted where they are most needed. >> >>Your section 7.8, Upgrading Strategy, contains some useful thoughts / >>advice, which I have summarised to myself as "Don't put in an >>IRI-aware server until all the resources on the site(s) you serve are >>published in IRI". >> >>However that's not the problem that concerns me. >> >>I'm concerned about the encoding of HTTP GET query strings, typically >>carrying text inserted by a user into a browser's form. > >You are right that section 7.8 does not address query strings, >and that it doesn't say so clearly, and that there is otherwise >not too much about how query strings are supposed to work. >I have noted this specific aspect of your mail as an issue, >and will try to update the draft accordingly. > > >>Assume below that "I" am the developer of a web server. (I'm not, but >>I advise someone who is). >> >>I want to support IRIs as soon as possible. I know that 'out there' >>are many different makes and releases of browsers; I have no control >>over them. >> >>As is well known, there is no mechanism in RFCs 2396 / 2616 for >>indicating the encoding associated with any %hh octet-triplets in >>URIs. > >Agreed. > > >>Unless I've missed something, your draft implies that user agents >>(browsers) may perform IRI to URI conversion, so that 'my' server sees >>an RFC 2396-conformant URI. > >Well, they actually have to do this conversion, because HTTP >does not allow anything else than an URI in the request. > > >>How do I know it is was originally an IRI and that I should apply the >>reverse conversions of your section 3.2 before extracting the query >>name-value pairs? > >You don't. Equally well, you don't know whether the name/value >pairs were in iso-8859-1 (Latin-1), or shift-jis, or whatever. >HTTP does not help you there at all. > > >>The problem is not 'academic', the vast majority of browser requests >>received today which have %hh triplets used encodings other than >>UTF-8, and these will continue to arrive for the next 20-or-more >>years. > >Well, for query parts, you actually have quite some control over >what encoding you get the query part back. Already since a few years, >browsers send back the query part in the encoding that they received >the page in. This works quite well. So if you want to have any >idea of what you get back from a browser, you have to know how >you send out your pages. And if you use UTF-8 for your pages, >then you get three main benefits when compared to other encodings: >- UTF-8 can handle the widest range of characters >- UTF-8 will bring your GET request in line with IRIs >- UTF-8 can be checked with very high reliability > >For more information, please also see the Q&A page that we put up >recently: >http://www.w3.org/International/questions/qa-forms-utf-8.html > > >>You may well answer that the way IRIs are to be applied is to be >>scheme-dependent; the problem/opportunity is for the HTTP RFC2616++ >>community to address. > >Well, part of it could be addressed scheme-by-scheme. For example, >a new scheme could require that only UTF-8 be used in the query >part. It can also be addressed by other technologies, for example, >XForms, which requires the use of UTF-8 in the query part of GET >requests (see http://www.w3.org/TR/xforms/slice11.html#serialize-urlencode). > > >>I would feel *far* more comfortable if I knew that they were aware of >>this and if there were draft proposals visible on this list and being >>checked for feasibility and for 'compatibility' with your drafts. >>I've seen no evidence for this, and you don't appear to >>cross-reference any related HTTP activity in your draft. >> >> >>It is not beyond the bounds of possibility, for example, that the HTTP >>community might conclude that they cannot provide IRI support unless >>your RFC-to-be includes some kind of marker or syntactic construction >>within "URIs which were converted from IRIs" which explicitly >>identifies them as such. >> >>In other words it might be found that all IRIs MUST be mapped into >>some character sequence which IS NOT a 'legal' URI (by the current >>RFC2396), so that the receiver knows that the reverse process of your >>section 3.2 MUST be applied. > >This would mean choosing a different escape convention. >We considered this years ago, but decided against it. >Using something that is illegal in an URI would not have >worked, and would still not work, with the current infra- >structure. > > >>There are other approaches the HTTP community could take, which >>_would_ be compatible with your current draft, (and I have my own >>candidate solution), but surely there should at least be some kind of >>'existence proof' or 'feasibility study' by which they agree that they >>_can_ work with your proposals before they are finalized? > >I hope what I have explained above is enough of an 'existence proof'. > >Please tell me if you don't think so. > >Regards, Martin. > > >>Without some kind of 'roadmap' for HTTP use of IRIs I don't see how >>anyone can pass final judgement on your draft. >> >>Please reassure me by telling me I'm an idiot for not knowing about >>XXX or not reading YYY. >> >>Regards, >> >>Chris Haynes
Received on Thursday, 6 May 2004 04:47:44 UTC