Re: validator.nu from Henri Sivonen on 2008-02-18 (www-archive@w3.org from February 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 18 Feb 2008 11:11:17 +0200
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: www-archive <www-archive@w3.org>
Message-Id: <A4A65CAA-FD80-473E-BABA-7ED07987F4C2@iki.fi>
Disclaimer: Still not a WG response.

Changed W3C list to www-archive, because this reply isn't feedback  
about HTML 5.

On Feb 17, 2008, at 23:08, Frank Ellermann wrote:

> Henri Sivonen wrote:
>
>> Validator.nu checks the combination of the protocol
>> entity body and the Content-Type header. Pretending
>> that Content-Type didn't matter wouldn't make sense
>> when it does make a difference in terms of processing
>> in a browser.
>
> I checked if the W3C validator servers still claim that
> application/xml-external-parsed-entity is chemical/x-pdb
>
> This was either fixed, or it is an intermittent problem,
> therefore I can continue my I18N tests today.

It was fixed.

> XHTML 1 like HTML 4 wants URIs in links.

HTML 4.01 already defined IRI-compatible processing for the path and  
query parts, so now that there are actual IRIs, making Validator.nu  
complain about them doesn't seem particularly productive.

> For experiments with
> IRIs I created a homebrewed XHTML 1 i18n document type.
>
> Actually the same syntax renaming URI to IRI everywhere,
> updating RFC 2396 + 3066 to 3987 + 4646 in DTD comments,

That's a pointless exercise, because neither browsers nor validators  
ascribe meaning to DTD comments or production identifiers.

> To get some results related to the *content* of my test
> files I have to set three options explicitly:
>
> * Be "lax" about HTTP content - whatever that is, XHTML 1
>  does not really say "anything goes", but validator.nu
>  apparently considers obscure "advocacy" pages instead
>  of the official XHTML 1 specification as "normative".

Validator.nu treats HTML 5 as normative and media type-based  
dispatching in browsers as congruent de facto guidance.
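
To illustrate what media type-based dispatching means here, a minimal sketch in Python follows. It is not Validator.nu's code; the function name pick_parser and the exact list of XML media types are assumptions made for the example.

```python
def pick_parser(content_type: str) -> str:
    """Pick a parser from a Content-Type header value, roughly as a browser would.

    Hypothetical helper for illustration; the set of XML types is an assumption.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    # Anything else (e.g. chemical/x-pdb) is not processed as a Web document.
    return "unsupported"
```

The point is that the same bytes end up in a different parser, or in none, depending only on the header.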

> With those three explicitly set options it could finally
> report that my test page is "valid" XHTML 1 transitional.
>
> But it's *not*, it uses real IRIs in places where only URIs
> are allowed, a major security flaw in DTD based validators:
> <http://omniplex.blogspot.com/2007/11/broken-validators.html>

I've fixed the schema preset labeling to say "+ IRI".

> | Warning: XML processors are required to support the UTF-8
> | and UTF-16 character encodings. The encoding was KOI8-R
> | instead, which is an incompatibility risk.
>
> Untested, I hope US-ASCII wouldn't trigger this warning, as
> a mobile-ok prototype did some months ago (and maybe still
> does).

US-ASCII and ISO-8859-1 (their preferred IANA names only) don't  
trigger that warning, because I don't have evidence of XML processors  
that didn't support those two in addition to the required encodings.
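
As a rough sketch of that rule in Python (not the actual Validator.nu implementation; the names below and the case-insensitive comparison are assumptions):

```python
# Encodings every XML processor is required to support.
REQUIRED = {"utf-8", "utf-16"}
# Extra encodings tolerated without a warning, by preferred IANA name only.
TOLERATED = {"us-ascii", "iso-8859-1"}


def triggers_encoding_warning(declared: str) -> bool:
    """Return True if the declared encoding should produce the warning.

    Hypothetical helper; aliases such as "latin1" are deliberately not
    mapped to their preferred names, so they still trigger the warning.
    """
    name = declared.strip().lower()
    return name not in REQUIRED and name not in TOLERATED
```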

> Validator.nu accepts U-labels (UTF-8) in system identifiers,
> W3C validator doesn't, and I also think they aren't allowed
> in XML 1.0 (all editions).  Martin suggested they are okay,
> see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.

Validator.nu URIfies system ids using the Jena IRI library set to the  
XML system id mode.

Considering that the XML spec clearly sought to allow IRIs ahead of  
the IRI spec, would it be actually helpful to change this even if a  
pedantic reading of specs suggested that the host part should be in  
Punycode?
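
For reference, the "pedantic" mapping would look roughly like the Python sketch below. It uses only the Python standard library, not the Jena IRI library, and is not a description of what Validator.nu does. The function name iri_to_uri is hypothetical, the built-in "idna" codec implements the older IDNA 2003 rules, and userinfo is dropped for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: Punycode host, percent-encoded UTF-8 elsewhere."""
    parts = urlsplit(iri)
    # Host: convert U-labels to A-labels (Punycode) with the built-in codec.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Characters left alone; "%" is kept so existing escapes are not re-encoded.
    safe = "/:@!$&'()*+,;=-._~%"
    path = quote(parts.path, safe=safe)
    query = quote(parts.query, safe=safe + "?")
    fragment = quote(parts.fragment, safe=safe + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

An ASCII-only system id passes through unchanged, so the question above only matters for identifiers that actually contain non-ASCII characters.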

> Validator.nu rejects percent encoded UTF-8 labels in system
> identifiers, like the W3C validator.  I think that is okay,
> *unless* you believe in a non-DNS STD 66 <reg-name>, where
> it might be syntactically okay.  Hard to decide, potentially
> a bug <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5280>.

I don't believe in non-DNS host names.

> [back to the general "HTML5 considered hostile to users"]
>> What are you trying to achieve?
>
> As mentioned about ten times in this thread I typically try
> to validate content, as author of the relevant document, or
> in a position to edit (in)valid documents.

But why do you want to validate only the content when the Content-Type
matters on the Web and you seem to be hostile to the idea of fixing
how your documents are served? What good does it do to serve XHTML
with a custom DTD when real browsers don't read the DTD and don't even
parse the document as XML?

> The complete number of HTTP servers under my control at this
> second (counting servers where I can edit dot-files used as
> configuration files by a popular server) is *zero*.  That is
> a perfectly normal scenario for many authors and editors.

These days, it is also a relatively easily fixable scenario. In  
particular, if you want to be in the business of creating test suites,  
getting hosting where you can tweak the Content-Type is generally a  
good way to start.

> Of course I'm not happy if files are served as chemical/x-pdb
> or similar crap, but it is outside my sphere of influence,

Fortunately, it turned out that it was within my sphere of influence:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5446

> and not what I'm interested in when I want to know what *I* did to
> make the overall picture worse *within* documents edited by me.

A validator can't know what parts you can edit and what parts you  
can't. However, if you care about practical stuff, you shouldn't even  
enable external entity loading, since browsers don't load external  
entities from the network. (That's why the option isn't the default on  
Validator.nu.)
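
The same browser-like behavior can be approximated in other XML tooling. Below is a minimal sketch using Python's xml.sax, not Validator.nu's configuration; the helper and handler names and the example.invalid URL are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class ElementCollector(ContentHandler):
    """Record element names so the caller can see the document was parsed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data: bytes) -> list:
    """Parse XML bytes without fetching the external DTD or external entities."""
    parser = xml.sax.make_parser()
    # Explicitly disable external general and parameter entity loading.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


doc = b'<!DOCTYPE doc SYSTEM "http://example.invalid/doc.dtd"><doc><child/></doc>'
names = parse_without_external_entities(doc)
```

The system identifier is never dereferenced, so an unreachable or mislabeled DTD on the server has no effect on the result.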

>> Are you trying to check that your Web content doesn't have
>> obvious technical problems?
>
> Normally, yes.  Of course we are discussing mainly my validator
> torture test pages, intentionally *unnormal* pages.

Like I said above, I suggest getting better hosting if you want to  
host test suites.

>> Or are you just trying to game a tool to say that your page is
>> valid
>
> Rarely.  I use image-links hidden by span within pre on one page,
> at some point in time validators will tell me that this is a hack,
> no matter if it works with all browsers I've ever tested.  Sanity
> check with validator.nu:  Your tool says that this is an error.

Could you provide a URL to a demo page?

>> Why are you validating pages?
>
> To find bugs.

As far as bugs that affect practical Web usage go, all "bugs" related  
to loading external entities are irrelevant...

Thank you for the feedback.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 18 February 2008 09:11:42 UTC