Re: BiDi IRI deployment? from Erik van der Poel on 2008-04-29 (www-international@w3.org from April to June 2008)

From: Erik van der Poel <erikv@google.com>
Date: Tue, 29 Apr 2008 08:24:26 -0700
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: www-international@w3.org
Message-ID: <c07a32650804290824l315dd25uc19355f3b6b3862c@mail.gmail.com>

On Mon, Apr 28, 2008 at 7:54 PM, Frank Ellermann
<nobody@xyzzy.claranet.de> wrote:
>  IIRC RFC 3987 has no special rules for anything excl. <ihost>,
>  it's always "transform legacy charset to UTF-8 and then percent-
>  encode" to get the equivalent URI.
>
>  Any magic with say iri= parameters in an <iquery> happens on the
>  server, servers like IRI producers are supposed to know how they
>  can handle any IRI in its URI-equivalent form.

RFC 3987 does mention related issues. E.g., section 7.8:

   "Likewise, when a new Web form is set up using UTF-8 as the character
   encoding of the form page, the returned query URIs will use UTF-8 as
   the character encoding (unless the user, for whatever reason, changes
   the character encoding) and will therefore be compatible with IRIs."

Section 7.7:

   "Second, it may include URIs constructed based on character encodings
   other than UTF-8.  These URIs may be produced by user agents that do
   not conform to this specification and that use legacy character
   encodings to convert non-ASCII characters to URIs."

HTML browsers are an example of "user agents that do not conform to
this specification".
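
To make the difference concrete, here is a minimal sketch in Python of the two cases those sections describe. The helper name percent_encode and the sample character are assumptions chosen for illustration; nothing here is taken from RFC 3987 itself.

```python
from urllib.parse import quote

def percent_encode(text, charset):
    """Encode text in the given charset, then percent-encode every byte.

    Hypothetical helper for illustration only.
    """
    return quote(text.encode(charset), safe="")

# UTF-8 first, then percent-encoding: the form that is compatible with IRIs.
utf8_form = percent_encode("é", "utf-8")

# A legacy document encoding first: the same character becomes a different URI.
legacy_form = percent_encode("é", "iso-8859-1")

print(utf8_form)    # %C3%A9
print(legacy_form)  # %E9
```

A server that receives only the percent-encoded bytes cannot tell from the URI alone which of the two encodings the client used.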

>  > The &#NNNNN; syntax has the advantage that it is consistent
>  > with de facto HTML form handling. (The server does not know
>  > whether the client started with an HTML form or an href.)
>
>  ACK, I normally prefer US-ASCII with NCRs for very limited uses
>  of non-ASCII, but that is only because I rarely need non-ASCII,
>  no option in most languages.

I'm not sure whether we are communicating here. I'm talking about URIs
that are sent from the client to a server, whether that is the result
of a user submitting an HTML form or clicking on an href. Currently,
HTML browsers convert from Unicode to the document encoding when an
HTML form is submitted or an href with a non-ASCII query part is
clicked. However, the browsers use a different syntax for characters
outside the document's charset, depending on whether the URI came from
an HTML form or an href. I'm saying that it would be more consistent if
the browsers used NCRs for both forms *and* hrefs, since the server
doesn't know which one the user was dealing with.
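
A rough sketch of the NCR approach, in Python rather than in any browser's actual code: convert to the document encoding, replace anything that encoding cannot represent with a decimal NCR, then percent-encode the result. The function name encode_query_value is hypothetical, and ISO-8859-1 is only an assumed document charset.

```python
from urllib.parse import quote

def encode_query_value(text, document_charset):
    """Hypothetical helper: encode a query value the way the NCR proposal describes.

    Characters the document charset cannot represent become &#NNNNN;
    before the bytes are percent-encoded.
    """
    raw = text.encode(document_charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")

# "é" exists in ISO-8859-1; "日" (U+65E5, decimal 26085) does not.
encoded = encode_query_value("é日", "iso-8859-1")
print(encoded)  # %E9%26%2326085%3B
```

With this applied to both forms and hrefs, the server would see the same bytes for the same character regardless of how the request was produced.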

>  The magic of RFC 3987 is
>  that it's straightforward.  Admittedly I ignore "legacy IRIs"
>  (a few MAYs) and "IRI comparison" in RFC 3987.
>
>  All query-part problems are not IRI-problems, they have to be
>  addressed elsewhere, not 3987bis, they already existed before.

Maybe HTML forms and hrefs with query parts can be specified in HTML 5
instead of IRIbis.

Erik

Received on Tuesday, 29 April 2008 15:25:09 UTC