
Re: BiDi IRI deployment?

From: Erik van der Poel <erikv@google.com>
Date: Mon, 28 Apr 2008 10:21:40 -0700
Message-ID: <c07a32650804281021l10ee1407re5d91006dd85e488@mail.gmail.com>
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: www-international@w3.org

Frank,

I agree with you that there are still several reasons to use
URIs (not IRIs) in HTML. Time will tell whether IRIs will be used more
in the future, e.g. when old user agents are no longer widely used.

Yes, IDNAbis is working on bidi, and I hope that some of that work
will be applicable to IRIbis.

I notice that you did not address the query part in your response. I
believe this is one of the more interesting areas, in terms of trying
to get the developers to move in the same direction. Since URIs and
IRIs do not have the "accept-charset" that HTML forms have, the "best
practice" would be to use a charset that can encode all of Unicode
(e.g. UTF-8). However, when the server does not use such a charset, it
would be nice if the clients would use the same convention for
characters outside the charset. The &#NNNNN; syntax has the advantage
that it is consistent with de facto HTML form handling. (The server
does not know whether the client started with an HTML form or an
href.)
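
To make the convention concrete, here is a rough Python sketch of what a
client might do with a single query value when the target charset cannot
represent every character. The function name encode_query_value is made up
for illustration, and the exact set of characters left unescaped is an
assumption; this is not taken from any spec or browser source.

```python
from urllib.parse import quote_from_bytes


def encode_query_value(value: str, charset: str) -> str:
    """Percent-escape one query value using the given charset.

    Characters the charset cannot represent are first turned into
    decimal &#NNNNN; references (the convention seen in HTML form
    submission), and the resulting bytes are then percent-escaped.
    Hypothetical helper, for illustration only.
    """
    # "xmlcharrefreplace" substitutes &#NNNNN; for unencodable characters.
    raw = value.encode(charset, errors="xmlcharrefreplace")
    # safe="" escapes everything except unreserved characters, so the
    # "&", "#" and ";" of a reference cannot be mistaken for delimiters.
    return quote_from_bytes(raw, safe="")


if __name__ == "__main__":
    print(encode_query_value("caf\u00e9", "utf-8"))
    print(encode_query_value("caf\u00e9", "iso-8859-1"))
    print(encode_query_value("\u20ac", "iso-8859-1"))
```

With a charset such as UTF-8 the fallback never triggers, which is why a
charset covering all of Unicode is the simpler case.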

As specs evolve, some parts become descriptive (since they describe
practices that have settled down and are unlikely to change), and some
parts remain prescriptive (since experts believe that those are the
right directions). Within the IRI space, at least for HTML, I believe
we have reached the point where the current practice of converting the
query part to the encoding of the document (with some exceptions) has
settled down, and we can make that part of the spec descriptive.

The IRIbis author(s) may wish to make this part optional (e.g. a
profile), so that applications other than HTML can still opt for the
"clean" solution (query part in escaped UTF-8).
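
As a sketch of how the two options differ, the following Python converts an
IRI to a URI with the path always in escaped UTF-8 and the query in a
selectable charset. The names iri_to_uri, escape_query and QUERY_SAFE are
hypothetical, the choice of "safe" characters is an assumption, and the host
is assumed to be ASCII already (no IDNA handling). Passing "utf-8" gives the
"clean" form; passing the document's encoding approximates the HTML practice
described above.

```python
from urllib.parse import quote, quote_from_bytes, urlsplit, urlunsplit

# ASCII characters left as-is in the query (an assumed set, for illustration).
QUERY_SAFE = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    "-._~!$&'()*+,;=:@/?%"
)


def escape_query(query: str, charset: str) -> str:
    """Escape a query string character by character in the given charset."""
    out = []
    for ch in query:
        if ch in QUERY_SAFE:
            out.append(ch)
        else:
            # Unencodable characters become &#NNNNN; before escaping.
            raw = ch.encode(charset, errors="xmlcharrefreplace")
            out.append(quote_from_bytes(raw, safe=""))
    return "".join(out)


def iri_to_uri(iri: str, query_charset: str = "utf-8") -> str:
    """Path in escaped UTF-8; query in escaped query_charset."""
    parts = urlsplit(iri)
    # quote() encodes str input as UTF-8 by default; "%" is kept so that
    # existing escapes are not escaped a second time.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = escape_query(parts.query, query_charset)
    # Host and fragment are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    iri = "http://example.org/caf\u00e9?q=caf\u00e9"
    print(iri_to_uri(iri))                # "clean": query in UTF-8
    print(iri_to_uri(iri, "iso-8859-1"))  # query in the document encoding
```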

Erik

On Sat, Apr 26, 2008 at 7:08 AM, Frank Ellermann
<nobody@xyzzy.claranet.de> wrote:
>
>  Erik van der Poel wrote:
>
>
> > Do you know of any user agents that process the IRI
>  > differently, depending on the XHTML 1 claim?
>
>  No, but for some months now I use only two browsers,
>  both belonging to the "popular" class.  After the
>  IE7-XP confusion in 2007 one validator catches broken
>  URIs (including raw IRIs), for validator.w3.org it is
>  still only a reported bug.
>
>  AFAIK - I never did more than participate in public
>  beta-tests of this validator, maybe it's fixed, and
>  then I'd be curious how (with a DTD based validator).
>
>
>  >> And "raw" UTF-8 IRIs are boring, popular browsers
>  >> get this right - "raw" IRIs in legacy charsets are
>  >> more interesting.
>
>  > I agree that those are more interesting. The major
>  > browsers are slowly converging on a set of conventions
>  > in this area.
>
>  FF2 hated the simple <ipath> in this case (covered by
>  Martin's test suite), but got a KOI8-R <ihost> right.
>
>  I didn't test BiDi scripts, can't read them and would
>  miss obvious bugs, besides the IDNAbis folks are about
>  to fix various issues wrt BiDi.
>
>
>  > Host name: Content developers still use Punycode
>  > because MSIE 6 does not support IDNA.
>
>  There is a plugin for IE6, link offered on ICANN's IDN
>  Wiki, I did not test it so far.  Actually it would be
>  strange to use "raw" IRIs when this (1) is invalid for
>  relevant (X)HTML versions, and therefore by definition
>  missing the entry condition for many accessibility tests,
>  (2) it really isn't accessible with older browsers, IE6
>  is by far not the oldest browser, (3) IRIs are designed
>  to have an equivalent URI, (4) IRI producers can handle
>  raw IRIs, therefore they can as well "URIfy" them for
>  URI consumers not supporting "raw" IRIs, and (5) nobody
>  bothered to specify "XHTML I18n", it should be trivial.
>
>  For old software, some browsers, limited devices, curl,
>  wget, whatever, it will take years until this all ended
>  up in a museum, today it's still obscure to expect that
>  URI consumers - the weaker part - support raw IRIs with
>  their Unicode 3.2 IDN punycode obstacles for <ihost>.
>
>  Anything else in "raw" IRIs is straightforward, even
>  in legacy charsets, but the <ihost> part isn't trivial.
>
>
>  > Path: Firefox has agreed to convert raw paths to
>  > escaped UTF-8, starting with Firefox 3.
>
>  It should, getting the <ihost> right (in FF2), but not
>  <ipath> (for legacy charsets), must be a bug.  A bit
>  like "can do integrals, but can't do sums"... :-)
>
>   Frank
>
>
>
Received on Monday, 28 April 2008 17:22:30 GMT
