- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Tue, 29 Apr 2008 04:54:05 +0200
- To: www-international@w3.org
Erik van der Poel wrote:

> I notice that you did not address the query part in your response.

IIRC RFC 3987 has no special rules for anything excl. <ihost>, it's always "transform legacy charset to UTF-8 and then percent-encode" to get the equivalent URI. Any magic with say iri= parameters in an <iquery> happens on the server, servers like IRI producers are supposed to know how they can handle any IRI in its URI-equivalent form. The critical part, who supports IDNA, is handled by the producer and the server, the clients and consumers can be obsolete.

> Since URIs and IRIs do not have the "accept-charset" that HTML
> forms have, the "best practice" would be to use a charset that
> can encode all of Unicode (e.g. UTF-8).

Yes. But legacy charsets do not need to be a problem. Missing characters can be given as NCRs, they are Unicode by definition in any (X)(HT)ML document.

On the KOI8-R test page I have NCRs for Greek characters. "Only" all non-ASCII octets are KOI8-R. In theory user agents can get this right when they know KOI8-R. All is lost if they send the octets to the clipboard "as is" without saying what it is, they better transform KOI8-R and NCRs to UTF-8 before talking with a clipboard.

But problems with forms, legacy charsets, and clipboards are no IRI problem, or rather I don't see where IRIs make this worse.

> The &#NNNNN; syntax has the advantage that it is consistent
> with de facto HTML form handling. (The server does not know
> whether the client started with an HTML form or an href.)

ACK, I normally prefer US-ASCII with NCRs for very limited uses of non-ASCII, but that is only because I rarely need non-ASCII, no option in most languages.

> The IRIbis author(s) may wish to make this part optional (e.g.
> a profile), so that applications other than HTML can still opt
> for the "clean" solution (query part in escaped UTF-8).

AFAIK there is no standard for query parts, name=value pairs separated by "&" are only a popular convention, not mandated in RFC 3986. The required syntax is "begins after first ?", some characters like space, "[", "<", ">", and "]" cannot occur in a query part (percent-encoded is okay, a raw "?" is also okay), and "#" or the end of the URI, e.g., indicated by ">", is the end of the query. Anything else *within* the query is free style - for some time folks tried to establish ";" instead of "&" as separator.

IMO it would be a bad idea if Martin starts to talk about issues not specified for URIs in 3987bis. The magic of RFC 3987 is that it's straightforward. Admittedly I ignore "legacy IRIs" (a few MAYs) and "IRI comparison" in RFC 3987.

All query-part problems are not IRI problems, they have to be addressed elsewhere, not in 3987bis, they already existed before.

Frank
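
For illustration only, the two rules discussed in this message could be sketched in Python roughly as follows: "transform to UTF-8 and then percent-encode" for everything outside <ihost>, and "the query begins after the first ? and ends at # or the end of the URI". The function names `percent_encode_non_ascii` and `split_query` are hypothetical, the sketch assumes the input is already a Unicode string (any legacy charset such as KOI8-R has been decoded beforehand), and it deliberately does not handle the host part, which needs IDNA instead.

```python
def percent_encode_non_ascii(iri_part: str) -> str:
    """Encode the text as UTF-8 and percent-encode every non-ASCII octet.

    ASCII characters pass through untouched, so existing escapes and
    delimiters such as "?" and "&" are left as they are.
    Assumption: iri_part is not the host component (no IDNA handling here).
    """
    out = []
    for octet in iri_part.encode("utf-8"):
        if octet < 0x80:
            out.append(chr(octet))
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)


def split_query(uri: str):
    """Return the query part without the leading "?", or None if absent.

    The query starts after the first "?" and ends at "#" or at the end
    of the URI; a raw "?" inside the query is kept as ordinary data.
    """
    before_fragment = uri.split("#", 1)[0]
    if "?" not in before_fragment:
        return None
    return before_fragment.split("?", 1)[1]


if __name__ == "__main__":
    print(percent_encode_non_ascii("/caf\u00e9?q=\u03b1"))
    print(split_query("http://example.org/p?a=1&b=2?c#frag?x"))
```

Note that `split_query` hands back the query as one opaque string; splitting it into name=value pairs on "&" (or ";") is left out on purpose, since that is only a convention.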
Received on Tuesday, 29 April 2008 02:52:15 UTC