Re: URIEquivalence-15: characters in RFC 2396 (was: Re: [Minutes] 27 Jan 2003 TAG teleconf (..., IRIEverywhere-27, ...)) from Stefan Eissing on 2003-02-05 (www-tag@w3.org from February 2003)

From: Stefan Eissing <stefan.eissing@greenbytes.de>
Date: Wed, 5 Feb 2003 16:28:19 +0100
To: Martin Duerst <duerst@w3.org>
Cc: www-tag@w3.org
Message-Id: <748D4D08-391E-11D7-A23E-00039384827E@greenbytes.de>

Martin,

undeterred in the least by my shallow understanding of things:

On Tuesday, 04.02.03, at 23:52 (Europe/Berlin), Martin Duerst wrote:

>
>> To come back to the one character or three question... '%7e' might be
>> viewed as 3 "URI Characters"; one "octet"; and one "original character"
>> '~' (maybe).
>
> Yes, exactly. The 'maybe' for '~' is quite appropriate.
> If somebody ran an http server on a computer where people
> still used e.g. the German version of ISO 646
> (see http://www.itscj.ipsj.or.jp/ISO-IR/021.pdf), then
> the original character would be a sharp-s.
>
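To make the three views concrete, here is a minimal Python sketch. The `DIN_66003` table and the `decode_din_66003` helper are hypothetical names introduced only for illustration; the table lists the positions where, as far as I know, the German ISO 646 variant differs from US-ASCII, and it is not taken from any library.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical table (an assumption for illustration): code points where the
# German ISO 646 variant (DIN 66003) differs from US-ASCII.
DIN_66003 = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}

def decode_din_66003(octets: bytes) -> str:
    """Interpret each octet under the German ISO 646 variant."""
    return "".join(DIN_66003.get(b, chr(b)) for b in octets)

escaped = "%7e"                       # three "URI characters"
octets = unquote_to_bytes(escaped)    # one octet, 0x7E
as_ascii = octets.decode("ascii")     # original character under US-ASCII
as_german = decode_din_66003(octets)  # original character under DIN 66003

print(len(escaped), len(octets), as_ascii, as_german)
```

The octet is the same in both cases; only the charset assumed by whoever reads it decides what the "original character" was.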

But if the "%7e" is part of the query, then:

  http://www.w3.org/TR/html4/interact/forms.html#idx-form-8

says that it is encoded as US-ASCII.

So, http URIs can be encoded from an arbitrary charset, apart from
the query part?

While HTML4 is not normative for RFC 2396, it certainly reflects a way of
thinking about http URI encoding that is quite, uh, widespread nowadays
(in heads and implementations).

If this way of thinking is broken, then I would be interested to know
how an HTTP Server/CGI Util Package/Servlet Container is supposed to
translate a GET on

http://example.org/search?q=a%3d%2561

IMHO, "undefined" is not an acceptable answer.
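For illustration, a small Python sketch of what different decoding orders make of that query. It uses only the standard library's `urllib.parse`; the variable names are mine, and nothing here claims to be what any particular server or servlet container actually does.

```python
from urllib.parse import parse_qsl, unquote

query = "q=a%3d%2561"

# Split on '&' and '=' first, then unescape each part exactly once.
pairs = parse_qsl(query)

# Unescape the whole query before splitting: the escaped '=' becomes
# indistinguishable from the name/value delimiter.
decoded_first = unquote(query)

# Unescape the value a second time: the '%61' produced by '%2561'
# collapses into a plain 'a'.
value_twice = unquote(pairs[0][1])

print(pairs)          # [('q', 'a=%61')]
print(decoded_first)  # q=a=%61
print(value_twice)    # a=a
```

Three plausible-looking procedures, three different answers for the value of q, which is why the order of splitting and unescaping needs to be defined somewhere.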

//Stefan

Received on Wednesday, 5 February 2003 10:28:43 UTC