Re: link checker and IRIs

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 04 Sep 2004 16:45:03 +0200
To: Martin Duerst <duerst@w3.org>
Cc: public-qa-dev@w3.org
Message-ID: <4141c5a4.56780365@smtp.bjoern.hoehrmann.de>

* Martin Duerst wrote:
>[http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1]
>
>I'm unhappy with 'only a few user agents implement', because I don't
>think it's true. But it's probably a question of whether you think
>the glass is half empty or half full.

Setting up a proper test suite and publishing the results would help
substantiate statements like this, but I am afraid it would confirm
what I said. The picture would be worse if you extended the test suite
to other products, such as XML processors or SVG implementations, for
which the specifications suggest or require similar behavior. User
agents such as search engine robots would be particularly interesting,
since I have not tested them much and expect them to fail often.

>>But that would not conform to the IRI internet draft.
>
>Where/how/why?

I am not sure what you are asking for here. If the IRI draft requires
NFC normalization and the processor does not perform it, then the
processor is non-conforming.
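
To make concrete what such an NFC requirement asks of a processor, here
is a minimal Python sketch (assuming Python 3.8+ for
`unicodedata.is_normalized`) that checks whether an IRI string is in NFC
and normalizes it. The function names are hypothetical, and the sketch
assumes rather than quotes what the IRI draft mandates.

```python
import unicodedata


def is_nfc(iri: str) -> bool:
    """Return True if the IRI string is already in Unicode Normalization Form C."""
    # unicodedata.is_normalized is available from Python 3.8 onward.
    return unicodedata.is_normalized("NFC", iri)


def to_nfc(iri: str) -> str:
    """Return the NFC-normalized form of the IRI string."""
    return unicodedata.normalize("NFC", iri)


# "o" followed by U+0308 COMBINING DIAERESIS is canonically equivalent to
# U+00F6, but only the precomposed form is in NFC.
decomposed = "http://www.example.org/o\u0308"
precomposed = "http://www.example.org/\u00f6"
print(is_nfc(decomposed))                 # False
print(is_nfc(precomposed))                # True
print(to_nfc(decomposed) == precomposed)  # True
```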

>I think Tidy can do some really good work. It can offer an option
>to convert from e.g. Latin-1 to the corresponding %-escaping for
>those cases that can't fix their servers' serving stuff with
>Latin-1 paths. It can offer an option to downgrade to %-escaping
>using UTF-8 for use on old browsers. It can offer an option for
>converting IDNs to punycode for use on old browsers. But all
>these options should be off by default. We shouldn't make it
>more difficult than necessary for people to move in the right
>direction.

It seems we disagree about what the right direction is here. Tidy
should not emit illegal markup by default if that can be avoided. You
can use http://sourceforge.net/projects/tidy to report bugs or file
feature requests; replying to one of the mails in which I asked for
your feedback on how Tidy should handle the cases relevant to
http://lists.w3.org/Archives/Public/uri/2003May/0008.html might also
work.
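
The three conversions listed in the quoted paragraph could look roughly
like the following Python sketch. The function names are hypothetical
and do not correspond to actual Tidy options or code. The sketch uses
the standard library's `urllib.parse.quote` for %-escaping and the
built-in `idna` codec (IDNA 2003) for punycode, and takes no position on
which of them should be enabled by default.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Reserved and unreserved ASCII characters, plus '%' so that existing
# %-escapes are not escaped a second time.
SAFE = "/?#[]@!$&'()*+,;=:%~-._"


def escape_utf8(iri: str) -> str:
    """Hypothetical option: downgrade an IRI to a URI using UTF-8
    %-escapes, in the style of HTML 4.01 appendix B.2.1."""
    return quote(iri, safe=SAFE, encoding="utf-8")


def escape_latin1(iri: str) -> str:
    """Hypothetical option for servers whose paths are Latin-1 encoded.
    Raises UnicodeEncodeError for characters outside Latin-1."""
    return quote(iri, safe=SAFE, encoding="latin-1")


def host_to_punycode(iri: str) -> str:
    """Hypothetical option: convert an IDN host to its ASCII (punycode)
    form. Assumes the authority is a bare host name (no user info/port)."""
    parts = urlsplit(iri)
    ascii_host = parts.netloc.encode("idna").decode("ascii")
    return urlunsplit(parts._replace(netloc=ascii_host))


print(escape_utf8("http://www.example.org/\u00f6"))     # http://www.example.org/%C3%B6
print(escape_latin1("http://www.example.org/\u00f6"))   # http://www.example.org/%F6
print(host_to_punycode("http://b\u00fccher.example/"))  # http://xn--bcher-kva.example/
```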

>>if we update the Markup Validator later this year to do the same,
>
>I would not want that to happen. For one, these attributes are
>CDATA.

As far as I can tell, there is consensus among QaDev participants to
implement conformance checks beyond what DTDs currently allow us to
do. As far as I am concerned, there won't be consensus to base these
checks on how much we like the conformance requirements; you would
need to ask the HTML Working Group to publish normative corrections to
their documents that allow the things you don't want the Validator to
complain about.
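
A check of the kind described here, beyond DTD validation, can be
sketched in Python. The DTD declares these attributes as CDATA, so DTD
validation accepts any character, while a separate pass flags values
containing characters that may not appear unescaped in a URI. The
attribute list, helper and class are hypothetical and do not reflect
the Markup Validator's actual code.

```python
from html.parser import HTMLParser

# Attributes treated as URI-valued; an assumed, incomplete subset of HTML 4.01.
URI_ATTRS = {"href", "src", "action", "cite", "longdesc"}


def has_illegal_uri_chars(value: str) -> bool:
    """True if value contains non-ASCII, control or space characters, or one
    of the ASCII characters that may not appear unescaped in a URI."""
    return any(ord(c) <= 0x20 or ord(c) >= 0x7F or c in '"<>\\^`{|}'
               for c in value)


class UriAttrChecker(HTMLParser):
    """Record URI attribute values that pass as CDATA under the DTD but
    contain characters not allowed unescaped in a URI."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line, attribute, value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URI_ATTRS and value is not None and has_illegal_uri_chars(value):
                line, _ = self.getpos()
                self.problems.append((line, name, value))


checker = UriAttrChecker()
checker.feed('<p><a href="/\u00f6">x</a> <a href="/%C3%B6">y</a></p>')
checker.close()
print(checker.problems)  # [(1, 'href', '/ö')]
```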

>[on "tests"]

All I said is that it is inappropriate for a test to suggest that
behavior is non-conforming when it is in fact conforming.

>They don't activate the "URIs as UTF-8" option for East Asian
>versions of IE, as far as I understand.

And in all versions the query part is no longer converted to UTF-8 %xx
escapes, since doing that breaks a lot of existing web pages. For
http://www.example.org/ö?ö in an ISO-8859-1 encoded document, Internet
Explorer/Windows 5+ will request /%C3%B6?%F6.
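
The split behavior described here (path escaped as UTF-8, query escaped
in the document's encoding) can be modelled with a short Python sketch.
The function name is hypothetical, fragments are ignored, and this only
approximates the observed requests; it is not Internet Explorer's
implementation.

```python
from urllib.parse import quote

# Characters left unescaped; '%' is included so existing escapes survive.
PATH_SAFE = "/%:@!$&'()*+,;=-._~"
QUERY_SAFE = PATH_SAFE + "?"


def ie_style_request_target(path_and_query: str, doc_encoding: str) -> str:
    """Hypothetical model: %-escape the path as UTF-8 but the query in the
    encoding of the referring document. Fragments are not handled."""
    path, sep, query = path_and_query.partition("?")
    escaped_path = quote(path, safe=PATH_SAFE, encoding="utf-8")
    escaped_query = quote(query, safe=QUERY_SAFE, encoding=doc_encoding)
    return escaped_path + sep + escaped_query


print(ie_style_request_target("/\u00f6?\u00f6", "iso-8859-1"))  # /%C3%B6?%F6
```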

Received on Saturday, 4 September 2004 14:45:46 GMT
