Re: validator.nu from Frank Ellermann on 2008-02-20 (www-validator@w3.org from February 2008)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 20 Feb 2008 23:57:51 +0100
To: www-validator@w3.org
Message-ID: <fpib6g$c45$1@ger.gmane.org>

Henri Sivonen wrote:

> As Anne already pointed out to you, the disclaimers are there
> due to a request from one of the WG Chairs

Hopefully we are now far enough away from the HTML5 WG Chairs.

> I don't expect readers of www-validator to like off-topic 
> discussions about other validation services

Comparing the W3C validator with other validators is a tradition
here, and a thread mentioning three W3C validator bugs by number
should be fine.

>> In other words raw IRIs do *not* work on almost all browsers.
 
> Do they not work in the latest versions of, say, the top four
> browsers  (IE7, Firefox 3, Safari 3 and Opera 9.5)?

No idea, I don't use these browsers.  I downloaded Opera 9.5,
and maybe I'll get around to installing and testing it.  With
FF2, pages in Martin's test suite fail for <ipath> in legacy
charsets; in my test the more complex <ihost> worked.
Unfortunately IE6 is a complete failure.
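
To illustrate why the page charset matters for a raw <ipath>: the
same character percent-encodes to different octets depending on the
charset used.  A minimal Python sketch; the sample path is made up,
and it does not claim to reproduce what FF2 actually does:

```python
from urllib.parse import quote

path = "/caf\u00e9"  # hypothetical sample path, not from Martin's test suite

# Octets a client would request if it encoded the path in the page
# charset (ISO-8859-1 assumed here for illustration).
legacy = quote(path, encoding="iso-8859-1")

# Octets produced by the RFC 3987 IRI-to-URI mapping, which uses UTF-8.
utf8 = quote(path, encoding="utf-8")

print(legacy)  # /caf%E9
print(utf8)    # /caf%C3%A9
```

A server that expects one form will not find the resource under the
other, so both sides have to agree on the encoding.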
 
> I've noticed that you've used a legacy browser on a legacy
> OS until recently

Netscape 3 aka "mozilla 3" on OS/2 warp 3 connect, yes.  The
last update was for Y2K issues; clearly "raw" IRIs as defined
in RFC 3987 (published 2005) can't work with older browsers.
IE6 on W2K is not better wrt "raw" IRIs.

> I'm not interested in supporting authoring of new pages for
> the deep legacy.

<recycle> If you are not interested in validating XHTML 1 don't
 pretend that you can do it - truth in advertising. </recycle>

"Upgrade your browser" is no option for some users.  It will
take years until HTML5 is ready, and after that it will take
more years until pre-HTML5 browsers are irrelevant.  Assuming
breakneck speed.

> I think the premise of defining whether something is accessible
> should be whether a person with a given disability can actually
> access the page with reasonable effort using the kind of tools
> that are commonly used by persons with the disability.

It makes sense to limit this effort to "syntactically valid".

AFAIK that is what these definitions do.  As far as it means
"visible with any browser is irrelevant, it has to be strict"
I ignore it using "transitional".  Of course I can't claim to 
have AAA or mobile-ok pages when that's not true - the "truth 
in advertising" theme again - but I could have said "tested
with Lynx" for most of my pages, it might be more important.

> Validator.nu is not a legal tool.

Sure, I wanted to point out that, say, governmental pages cannot
simply decide that "works with Lynx" is good enough for their
own definition of "accessible".  Unfortunately I have no Lynx
at the moment, maybe it can do "raw" IRIs.  For issues related
to TLS and Unicode, Lynx used to be smarter than "mozilla 3".

>> maybe try to create a HTML5 DTD starting with the existing
>> XHTML 1.1 modules.
 
> That doesn't make sense.

Why not ?  There are good tools for DTDs, and one is the W3C
validator.  A second opinion wrt validation is often important,
if HTML5 depends on your tool alone that could be problematic.

> If IRIs are bad, should HTML5 require plain URIs?

They are cute, one of their best features is that they can be
transformed into equivalent URIs for backwards compatibility.
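
A rough Python sketch of that transformation, assuming the usual
RFC 3987 approach (ToASCII for the host, percent-encoded UTF-8 for
everything else).  The function name iri_to_uri is made up for this
example, and userinfo and IPv6 literals are deliberately not handled:

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left alone; "%" keeps existing percent-escapes intact.
SAFE = "/:@!$&'()*+,;=-._~%"

def iri_to_uri(iri):
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Python's built-in "idna" codec implements IDNA 2003 ToASCII.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += ":%d" % parts.port
    # quote() percent-encodes non-ASCII characters as UTF-8 by default.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
print(iri_to_uri("http://b\u00fccher.example/"))
```

The result contains only ASCII, so an old browser can handle it like
any other URI.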

I asked AvK to add IRIs to the "HTML5 diff" draft; I did not
propose to remove them from HTML5.  HTML5 is for new browsers
(in some years); if authors decide to use HTML5 they are free
to use all new features.

OTOH if authors decide that backwards compatibility is more
important until the last IE6 has found its place in a museum,
they can use XHTML 1 or HTML 4, and the URI-form of IRIs.

Disclaimer:  I'm not talking about 3987bis "LEIRIs", I hope
these "legacy enhancements" are moved to a separate draft
with intended status HISTORIC, where they cannot spoil IRIs.

> Should the IETF close down IRI activities or say that IRIs
> are only for typing into the browser address field?

Public-iri is a W3C list, and 3987bis is an individual draft;
the IETF cannot and does not censor what individual authors
do.  Nothing is wrong with IRIs from my POV (ignoring LEIRIs),
if voluntary deployment, where permitted, starts on the side
of the stronger parties (URI producers + servers), a bit like
MIME, *THE* showcase wrt backwards compatibility.

Take xmpp: URIs as an example: clients don't need to know what
IDNA is, the jabber server offers the required magic.  Never
break things unless you really must, and then do it as clearly
as possible.  I hate those "embrace, extend, and extinguish"
schemes, for HTML5 as 4+1 I'm not yet sure what it will be.

> So things become more or less accessible if you change human-
> readable comments in a DTD?

"Valid" is required for various reasons, of course IE6 would
also fail on an invalid version of my test page.  The issue
here is validators unable to figure out what valid is, like
your validator saying "valid" for the wrong reasons.  Or the
W3C validator saying "valid" for different wrong reasons.

> One might also say that IRIs must be supported for
> internationalization

I'd say such folks missed the most important obvious point in
the design, all IRIs have an equivalent URI.  For a dubious
analogy, what you see in IRIs is like UTF-16, what I see is
like UTF-8.  A major difference wrt backwards compatibility.
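
The backwards compatibility half of that analogy can be checked in a
few lines of Python (the helper name same_as_ascii is made up for
this example): pure ASCII text is byte-for-byte unchanged in UTF-8,
but not in UTF-16.

```python
def same_as_ascii(text, encoding):
    """True if encoding ASCII-only text yields the plain ASCII bytes."""
    return text.encode(encoding) == text.encode("ascii")

legacy_text = "http://example.org/"
print(same_as_ascii(legacy_text, "utf-8"))      # True
print(same_as_ascii(legacy_text, "utf-16-le"))  # False
```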

 [implementation and interop report using invalid test pages] 
> Just by doing it regardless of what an "XHTML document type"
> is claimed to permit?

That would fail in a "giggle test", as far as I'm concerned.
 
> <embed> is valid in HTML5, yes.

LOL, I didn't know that.  Maybe a good idea.  Be careful with
<nobr>, if that's at the moment also "valid HTML5".  

> Referring to the standard family that defines ISO RELAX NG
> (and ISO Schematron and NVDL) is not inappropriate, in my
> opinion, even if you haven't heard of the standard family
> before.

ACK, if they don't say "congruent de facto guidance", this 
triggered my <http://en.wikipedia.org/wiki/WP:AWW> alert ;->

> The warning is about actual potential interoperability issues.
> There's a very popular XML parser--expat--that supports only
> UTF-8, UTF-16, ISO-8859-1 and US-ASCII in its default 
> configuration. However, no one has shown me an XML parser
> that didn't support ISO-8859-1

Based on the XML spec. the minimum is clear, arguably (= IMO)
covering US-ASCII.  If you don't warn about Latin-1 I wonder
why similar iso-8859-* charsets are "discriminated", but it's
a matter of taste.  I have no problem with the KOI8-R warning,
I was only curious what you'd do with US-ASCII.
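
For the record, the kind of check I have in mind looks roughly like
this in Python.  The set is simply the four encodings quoted above
for expat's default configuration; the function name and the warning
text are made up, and this is not how Validator.nu actually does it:

```python
# Encodings quoted above as supported by expat in its default setup.
EXPAT_DEFAULT_ENCODINGS = {"utf-8", "utf-16", "iso-8859-1", "us-ascii"}

def portability_warning(declared):
    """Return a warning string for a declared XML encoding, or None."""
    name = declared.strip().lower()  # encoding names are case-insensitive
    if name in EXPAT_DEFAULT_ENCODINGS:
        return None
    return "encoding %r may not be supported by all XML parsers" % declared

for enc in ("US-ASCII", "ISO-8859-1", "ISO-8859-15", "KOI8-R"):
    print(enc, "->", portability_warning(enc))
```

With this list ISO-8859-15 gets a warning and ISO-8859-1 does not,
which is exactly the "matter of taste" part.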

> For text/html, ISO-8859-1 must be treated as an alias for 
> Windows-1252 in order not to Break the Web.

The set of features I'd like in HTML5 is certainly not empty.
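
For readers wondering what the alias changes in practice: the two
charsets differ only for the octets 0x80..0x9F.  A small Python
illustration with made-up sample bytes:

```python
raw = b"\x93quoted\x94"  # hypothetical sample bytes

as_latin1 = raw.decode("iso-8859-1")    # strict ISO-8859-1 reading
as_cp1252 = raw.decode("windows-1252")  # reading used for text/html

# ISO-8859-1 yields C1 control characters U+0093 and U+0094 ...
print(["U+%04X" % ord(c) for c in (as_latin1[0], as_latin1[-1])])
# ... while Windows-1252 yields curly quotes U+201C and U+201D.
print(["U+%04X" % ord(c) for c in (as_cp1252[0], as_cp1252[-1])])
```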

> The XML side of Validator.nu warns about C1 controls.

Just to be sure, also 0x85 in Latin-1, or generally u+0085 ?  
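
What I mean, as a Python sketch (find_c1_controls is a made-up name,
not something Validator.nu provides): octet 0x85 in a Latin-1
document decodes to U+0085, so a check on the decoded text for the
range U+0080..U+009F would catch both cases.

```python
def find_c1_controls(text):
    """List (index, code point) for every C1 control, U+0080..U+009F."""
    return [(i, "U+%04X" % ord(ch))
            for i, ch in enumerate(text)
            if 0x80 <= ord(ch) <= 0x9F]

# Octet 0x85 read as ISO-8859-1 becomes U+0085 (NEL).
print(find_c1_controls(b"a\x85b".decode("iso-8859-1")))  # [(1, 'U+0085')]
```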

 [content + header vs. content] 
> The check is optional but on by default.

Better make it opt in, not opt out.  XHTML is complex enough,
the fine art of HTTP is beyond what most of your users can
do or are interested in.  My crystal ball says.

> Ordinary users don't publish test suites.

Some users are vigilant when new and incompatible features
are silently introduced, "BTW, what you know as URL is now
IRI, upgrade your browser, it's your fault, read RFC 3987".

It is not my fault.  I have read RFC 3987, and it does not say
"updates 3986"; Martin confirmed it here.  ICANN fixed the
IDN test pages, the XHTML modularization folks fixed their
draft, and the validome folks announced that they will get it
right (they already support no ASCII => no URI).  AvK promised
to mention IRIs and empty language tags in the "HTML5 diff";
hopefully Mediawiki and FF2 will be (or maybe already are) also
fixed.

So much for introducing IRIs silently through the backdoor,
I'm all for doing it upfront.
 
 Frank
Received on Wednesday, 20 February 2008 22:56:56 UTC