Re: BiDi IRI deployment? from Frank Ellermann on 2008-04-26 (www-international@w3.org from April to June 2008)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Sat, 26 Apr 2008 16:08:54 +0200
To: www-international@w3.org
Message-ID: <fuvcsg$vkh$1@ger.gmane.org>

Erik van der Poel wrote:

> Do you know of any user agents that process the IRI 
> differently, depending on the XHTML 1 claim?

No, but for some months now I have used only two
browsers, both belonging to the "popular" class.  After
the IE7-XP confusion in 2007 one validator catches broken
URIs (including raw IRIs); for validator.w3.org it is
still only a reported bug.

That's AFAIK: I never did more than participate in
public beta tests of this validator.  Maybe it's fixed,
and then I'd be curious how (with a DTD-based validator).

>> And "raw" UTF-8 IRIs are boring, popular browsers
>> get this right - "raw" IRIs in legacy charsets are
>> more interesting.

> I agree that those are more interesting. The major
> browsers are slowly converging on a set of conventions
> in this area.

FF2 hated the simple <ipath> in this case (covered by
Martin's test suite), but got a KOI8-R <ihost> right.

I didn't test BiDi scripts: I can't read them and would
miss obvious bugs.  Besides, the IDNAbis folks are about
to fix various issues wrt BiDi.

> Host name: Content developers still use Punycode 
> because MSIE 6 does not support IDNA.

There is a plugin for IE6 (link offered on ICANN's IDN
Wiki); I have not tested it so far.  Actually it would
be strange to use "raw" IRIs when (1) this is invalid
for relevant (X)HTML versions, and therefore by definition
misses the entry condition for many accessibility tests,
(2) it really isn't accessible with older browsers, and
IE6 is by far not the oldest browser, (3) IRIs are designed
to have an equivalent URI, (4) IRI producers can handle
raw IRIs, therefore they can as well "URIfy" them for
URI consumers not supporting "raw" IRIs, and (5) nobody
bothered to specify "XHTML I18n", which should be trivial.
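
To illustrate point (4), here is a rough sketch of what
"URIfying" could look like on the producer side.  It is
only an illustration, not taken from any browser or spec:
the function name iri_to_uri and the set of characters
left unescaped are my own assumptions, userinfo is
ignored, and Python's "idna" codec implements the older
IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumed set of characters left unescaped; "%" is kept so that
# escapes already present in the input are not escaped twice.
_SAFE = "/?:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (hypothetical helper, simplified)."""
    parts = urlsplit(iri)

    # <ihost>: convert to Punycode labels with the stdlib IDNA 2003 codec.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"

    # <ipath>, <iquery>, <ifragment>: percent-encode the UTF-8 octets.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/käse?q=ü"))
# prints http://xn--bcher-kva.example/k%C3%A4se?q=%C3%BC
```

A URI that is already pure ASCII passes through unchanged,
so a producer could apply this to every link it emits.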

For old software, some browsers, limited devices, curl,
wget, whatever, it will take years until all of this has
ended up in a museum.  Today it's still unreasonable to
expect that URI consumers - the weaker part - support raw
IRIs with their Unicode 3.2 IDN punycode obstacles for
<ihost>.

Anything else in "raw" IRIs is straightforward, even
in legacy charsets, but the <ihost> part isn't trivial.

> Path: Firefox has agreed to convert raw paths to 
> escaped UTF-8, starting with Firefox 3.

It should.  Getting the <ihost> right (in FF2), but not
the <ipath> (for legacy charsets), must be a bug.  A bit
like "can do integrals, but can't do sums"... :-)
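
For what it's worth, the <ipath> side really is the
"sums" part.  A minimal sketch, assuming the page
charset is known: decode the raw octets to Unicode,
then escape the UTF-8 form.  The function name is made
up for this example, and this is not a claim about how
Firefox implements it.

```python
from urllib.parse import quote


def legacy_path_to_escaped_utf8(raw: bytes, charset: str) -> str:
    """Hypothetical helper: raw path octets in a legacy charset -> escaped UTF-8."""
    # Step 1: interpret the octets in the document's charset (e.g. KOI8-R).
    text = raw.decode(charset)
    # Step 2: percent-encode the UTF-8 octets, keeping "/" as the separator.
    return quote(text, safe="/")


raw_path = "/тест".encode("koi8-r")  # octets as found in a KOI8-R page
print(legacy_path_to_escaped_utf8(raw_path, "koi8-r"))
# prints /%D1%82%D0%B5%D1%81%D1%82
```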

 Frank

Received on Saturday, 26 April 2008 14:07:05 UTC