
Re: IRIs in href

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 31 Oct 2007 11:37:43 +0100
To: www-validator@w3.org
Message-ID: <fg9m32$toh$1@ger.gmane.org>

olivier Thereaux wrote:

> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were  
> updated to say "IRIs" instead of "URIs", what would you do?

Maybe ditch the W3C and post the reasons in an Internet Draft.
I'd certainly consider it unethical.

RFC 3987 does not "update" 3986.  The specs should be updated
with s/2396/3986/g, s/3066/4646/g, and similar clerical tasks,
e.g. explaining why xml:lang is still forced to be an NMTOKEN
in these document types.

But for incompatible modifications we need new document types.

Not worldwide "upgrade your browser" campaigns: some users
can't upgrade, and besides it's completely unnecessary, because
all IRIs by definition have an equivalent URI that works with
"any browser".
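
To illustrate the point about equivalent URIs, here is a minimal
Python sketch of an IRI-to-URI mapping in the spirit of RFC 3987,
section 3.1.  The function name iri_to_uri is hypothetical, and the
sketch assumes a hierarchical IRI without userinfo: it converts the
host with Python's built-in "idna" codec and percent-encodes the
UTF-8 octets of all other non-ASCII characters.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%", so that
# percent-escapes already present in the input are not encoded twice.
SAFE = "/:@!$&'()*+,;=%-._~"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (hypothetical helper, no userinfo)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII ("xn--") form.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"

    # Everything else: percent-encode the UTF-8 octets of non-ASCII chars.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://example.org/caf\u00e9"))
```

The resulting string contains only ASCII, so it can be put into an
href of a document type that permits only URIs.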

> Saying that IRIs should not be used because they break in
> legacy software, is an argument I have sympathy for, but
> have trouble accepting.

I'm not surprised if folks active in the W3C don't care much
about "backwards compatibility".  But admittedly I was very
surprised when you introduced "let's not care about formally
valid" as a new concept.  A user armed with an old text mode
browser could take out the ICANN IDN test.  AFAIK "formally
invalid" is a FAIL in any accessibility test, isn't it?

> This reminds me of the situation whereby, in Japan, one  
> still can't safely use unicode in mails, because so many
> MUAs or webmails just don't support it.

Maybe they have plausible reasons why they don't need or
don't like it.  BTW, the (formally valid) IDN test page
I've created last week was the first XHTML page where I
actually needed UTF-8.  Now I'm curious what browsers do
with an IRI in a legacy charset.  RFC 3987 allows this.

>> Sooner or later validators will be fixed to validate
>> URIs, what with all those "URI exploits" we've seen in
>> the last weeks for XP after the installation of IE7.
 
> This is irrelevant to the discussion about IRIs. Please
> don't use internationalization as a scapegoat for bad
> coding.

It's relevant for the discussion of bug 4916, submitted
by you on 2007-08-07.  If that bug is fixed it might also
detect IRIs where only URIs are allowed.

Admittedly that is almost impossible for a validator based
on DTDs; maybe you end up with a clumsy hack working only
for a few very important document types.
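
As a rough idea of what "detecting IRIs where only URIs are allowed"
could mean, here is a crude Python sketch.  It is not what bug 4916
proposes and not how the W3C validator works; the names HrefChecker
and find_iri_hrefs are hypothetical.  It only flags href values that
contain non-ASCII characters, and does not check the full RFC 3986
syntax.

```python
from html.parser import HTMLParser


class HrefChecker(HTMLParser):
    """Collect href values that cannot be plain URIs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.non_uri_hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A URI is ASCII-only; any non-ASCII character means the
            # value is at best an IRI.
            if name == "href" and value is not None and not value.isascii():
                self.non_uri_hrefs.append(value)


def find_iri_hrefs(markup: str) -> list:
    checker = HrefChecker()
    checker.feed(markup)
    checker.close()
    return checker.non_uri_hrefs


sample = (
    '<a href="http://example.org/caf\u00e9">IRI</a>'
    '<a href="http://example.org/caf%C3%A9">URI</a>'
)
print(find_iri_hrefs(sample))
```

Only the first link is reported; the percent-encoded form passes.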

>> I can still tell you the day when the W3C validator
>> started to flag &#128; as invalid on a windows-1252
>> page. I was working on this page, it was stunning.
 
> There once was a bug, and IIRC it was fixed in a few
> hours.

Two days after 9/11, it's good if it only took you a few
hours.  But it took me several months to figure out why
I need octet 128 instead of NCR &#128;.
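
The difference can be shown in a few lines of Python (a sketch using
only the standard library; the helper name ncr is hypothetical).  In
windows-1252 the octet 128 encodes the euro sign, but a numeric
character reference counts Unicode code points, and code point 128
is a C1 control character.

```python
import unicodedata

EURO = "\u20ac"


def ncr(ch: str) -> str:
    """Decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


# Octet 128 on a windows-1252 page is the euro sign ...
octet_128 = b"\x80".decode("cp1252")

# ... while code point 128 (what &#128; refers to) is a control character.
code_point_128 = chr(128)

print(octet_128 == EURO)                     # the octet is the euro sign
print(unicodedata.category(code_point_128))  # "Cc" means control character
print(ncr(EURO))                             # the reference that does mean euro
```

So on a windows-1252 page the euro sign is either the raw octet 128
or the reference &#8364;, never &#128;.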

> Now, how is that relevant to the discussion at hand?

If everybody and his dog starts to use IRIs in document
types where they are not permitted, and some time later an
improved validator informs them that this was invalid,
the disturbed users will be annoyed.

> The current XHTML DTD says that URIs are CDATA

Sure, the details are specified in the prose, the DTD
only uses an entity name, %URI;, and it could also use
%FOO; or %IRI; as the name.  Likewise the RFC 2396 in
the DTD is only a comment.

 Frank
Received on Wednesday, 31 October 2007 10:40:49 GMT
