- From: Erik van der Poel <erikv@google.com>
- Date: Tue, 29 Apr 2008 08:24:26 -0700
- To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: www-international@w3.org
On Mon, Apr 28, 2008 at 7:54 PM, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
> IIRC RFC 3987 has no special rules for anything excl. <ihost>,
> it's always "transform legacy charset to UTF-8 and then
> percent-encode" to get the equivalent URI.
>
> Any magic with say iri= parameters in an <iquery> happens on the
> server, servers like IRI producers are supposed to know how they
> can handle any IRI in its URI-equivalent form.

RFC 3987 does mention related issues. E.g., section 7.8:

"Likewise, when a new Web form is set up using UTF-8 as the character encoding of the form page, the returned query URIs will use UTF-8 as the character encoding (unless the user, for whatever reason, changes the character encoding) and will therefore be compatible with IRIs."

Section 7.7:

"Second, it may include URIs constructed based on character encodings other than UTF-8. These URIs may be produced by user agents that do not conform to this specification and that use legacy character encodings to convert non-ASCII characters to URIs."

HTML browsers are an example of "user agents that do not conform to this specification".

> > The &#NNNNN; syntax has the advantage that it is consistent
> > with de facto HTML form handling. (The server does not know
> > whether the client started with an HTML form or an href.)
>
> ACK, I normally prefer US-ASCII with NCRs for very limited uses
> of non-ASCII, but that is only because I rarely need non-ASCII,
> no option in most languages.

I'm not sure whether we are communicating here. I'm talking about URIs that are sent from the client to a server, whether that is the result of a user submitting an HTML form or clicking on an href. Currently, HTML browsers convert from Unicode to the document encoding when an HTML form is submitted or an href with a non-ASCII query part is clicked. However, the browsers use different syntax for characters outside the document's charset, depending on whether it was an HTML form or an href.

I'm saying that it would be more consistent if the browsers used NCRs for both forms *and* hrefs, since the server doesn't know which one the user was dealing with.

> The magic of RFC 3987 is
> that it's straight forward. Admittedly I ignore "legacy IRIs"
> (a few MAYs) and "IRI comparison" in RFC 3987.
>
> All query-part problems are not IRI-problems, they have to be
> addressed elsewhere, not 3987bis, they already existed before.

Maybe HTML forms and hrefs with query parts can be specified in HTML 5 instead of IRIbis.

Erik
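
For illustration only, the two encodings discussed above can be sketched in Python. This is a rough sketch, not a description of any particular browser: the function names are hypothetical, and the second function assumes the de facto form behaviour is "encode in the document charset, write characters outside that charset as decimal &#NNNNN; references, then percent-encode the result".

```python
from urllib.parse import quote


def utf8_percent_encode(value: str) -> str:
    """RFC 3987-style mapping of one query value:
    encode as UTF-8, then percent-encode every byte that is not unreserved."""
    return quote(value.encode("utf-8"), safe="")


def legacy_ncr_percent_encode(value: str, charset: str) -> str:
    """Assumed de facto form behaviour (hypothetical helper):
    encode in the document charset, replacing any character the charset
    cannot represent with a decimal NCR (&#NNNNN;), then percent-encode."""
    raw = value.encode(charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")


if __name__ == "__main__":
    text = "é€"  # é exists in ISO-8859-1, € does not
    print(utf8_percent_encode(text))
    print(legacy_ncr_percent_encode(text, "iso-8859-1"))
```

In the second case the server receives a percent-encoded "&#8364;" and cannot tell it apart from a user who literally typed those characters, nor whether the request came from a form or an href, which is why consistent handling on the client side matters.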
Received on Tuesday, 29 April 2008 15:25:09 UTC