Re: Schemas and validation from Joe D Williams on 2010-03-02 (public-html@w3.org from March 2010)

From: Joe D Williams <joedwil@earthlink.net>
Date: Mon, 1 Mar 2010 23:13:12 -0800
To: "Jonas Sicking" <jonas@sicking.cc>
Cc: "Maciej Stachowiak" <mjs@apple.com>, "Henri Sivonen" <hsivonen@iki.fi>, "Leonard Rosenthol" <lrosenth@adobe.com>, "Anne van Kesteren" <annevk@opera.com>, "Larry Masinter" <LMM@acm.org>, "Toby Inkster" <tai@g5n.co.uk>, "Adam Barth" <w3c@adambarth.com>, "HTML WG" <public-html@w3.org>
Message-ID: <94C7250F330A450AB967C98C8A7BDCD7@joe1446a4150a8>

----- Original Message ----- 
From: "Jonas Sicking" <jonas@sicking.cc>
To: "Joe D Williams" <joedwil@earthlink.net>
Cc: "Maciej Stachowiak" <mjs@apple.com>; "Henri Sivonen" 
<hsivonen@iki.fi>; "Leonard Rosenthol" <lrosenth@adobe.com>; "Anne van 
Kesteren" <annevk@opera.com>; "Larry Masinter" <LMM@acm.org>; "Toby 
Inkster" <tai@g5n.co.uk>; "Adam Barth" <w3c@adambarth.com>; "HTML WG" 
<public-html@w3.org>
Sent: Monday, March 01, 2010 5:30 PM
Subject: Re: Schemas and validation

> On Mon, Mar 1, 2010 at 3:10 PM, Joe D Williams
> <joedwil@earthlink.net> wrote:
>> I believe that the browser could run something just fine that would
>> not pass validation, but if valid, it should at least run.
>
Jonas > Note that schemas, and indeed validation, are a poor way to test
> if something runs "just fine" in a browser. Every document runs in a
> browser, and there is defined behavior for essentially every document
> (subject to hardware limitations, such as network speed and available
> memory). I.e. once browsers correctly implement HTML5, they should in
> general all behave the same way for a document, even if that document
> is invalid.
>
> But on the flip side, just because something validates doesn't mean
> that it'll do what you expect it to do. For example nothing about the
> code in scripts gets any testing by a validator. But many other things
> will validate fine, but not actually work the way you probably want
> them to. Consider for example:
>
> <a href="www.w3.org">W3C Home Page</a>
>
> This will not link to "http://www.w3.org" as you likely intended, but
> no HTML5 validator, or schema validator, is going to signal that as an
> error.

I think that is a good example, but I also think it depends upon how
hard you want to work at author time to make sure a URL has all its
necessary parts, so that simple author errors like leaving off the
http:// might be caught. I agree that is usually too much to invest.
But if the </a> were missing, surely a simple validation would at
least flag that.
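
As a rough illustration of what such an author-time check might look
like (a sketch only: the function name and the host-name heuristic are
my own assumptions, not something any validator is specified to do),
Python's standard urllib.parse shows both how the scheme-less link
resolves and how a lint pass might guess at the mistake:

```python
import re
from urllib.parse import urljoin, urlsplit

# Heuristic, assumed for illustration only: a scheme-less reference whose
# first path segment starts with "www." or ends in a common TLD is probably
# a host name with "http://" left off.
_HOSTLIKE = re.compile(
    r"^(www\.[^/]+|[^/:]+\.(com|org|net|edu|gov))(/|$)", re.IGNORECASE
)

def looks_like_missing_scheme(href: str) -> bool:
    """Return True if href is relative but looks like a bare host name."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return False  # already absolute or protocol-relative
    return bool(_HOSTLIKE.match(parts.path))

# The page containing the link (a made-up address).
base = "http://example.com/docs/page.html"

# The browser treats "www.w3.org" as a relative path, not a host.
print(urljoin(base, "www.w3.org"))  # http://example.com/docs/www.w3.org

print(looks_like_missing_scheme("www.w3.org"))         # True
print(looks_like_missing_scheme("http://www.w3.org"))  # False
print(looks_like_missing_scheme("about.html"))         # False
```

A check like this can only warn, since a file really could be named
www.w3.org, which is part of why it may be too much to invest.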

As Maciej says later in this thread, author-time validation is no
substitute for actual runtime testing, because schema validation can
only check structures and some content details. In practice, though, a
decent validator should tell you whether it is time to actually test,
and should help find the source of some types of runtime problems.
HTML5 as text/html is an unusual case because it has many exceptions
and special treatments, so a document that uses all the shortcuts and
relies on the expected fixups can surely only be tested by feeding it
to a browser and examining whatever pixels and interactions you get.
Runtime testing can also be very subjective: the only guidance you
have is the standard and its descriptions of what is supposed to
happen, and you have to compare various browser presentations to
decide. At author time, by contrast, a schema validation ought to be
able to tell you that structures are as defined and that checkable
attribute and content values match the model.

I'm not saying everything important can be validated using a schema.
There are too many little details, like the incomplete URL, that
can't be checked until the link is actually clicked, and it just
isn't worth the time to do a complete analysis of all syntax variants
during validation. And, of course, we can't validate output that a
script has not yet produced.

There are many important details that can be validated, and I think
the job would be only half done if we did not make a sincere attempt
to produce a standards-track XML schema, as complete as possible,
that covers the entire vocabulary, even if it could only meaningfully
be used for XHTML documents. Other sorts of validators may follow and
allow the parsing and fixup steps allowed in text/html, but to me that
is an entirely different gorilla than schema validation of
application/xhtml+xml.
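
To make the XHTML case concrete, here is a minimal sketch of the step
that comes before any schema validation: an XML well-formedness check.
It uses only Python's standard xml.etree.ElementTree; the helper name
and the sample markup are assumptions for illustration. It is not a
schema validator, but it shows the missing </a> being flagged while the
scheme-less href from Jonas's example passes untouched:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def well_formed_error(xhtml: str):
    """Return None if the text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as exc:
        return str(exc)
    return None

# Well-formed, and the link is what the author intended.
good = f'<p xmlns="{XHTML_NS}"><a href="http://www.w3.org/">W3C Home Page</a></p>'

# Missing </a>: an XML parser rejects this outright.
unclosed = f'<p xmlns="{XHTML_NS}"><a href="http://www.w3.org/">W3C Home Page</p>'

# Well-formed, but the href lacks its scheme; structure checks cannot tell.
bad_href = f'<p xmlns="{XHTML_NS}"><a href="www.w3.org">W3C Home Page</a></p>'

for name, doc in [("good", good), ("unclosed", unclosed), ("bad_href", bad_href)]:
    print(name, "->", well_formed_error(doc))
```

A real schema would go further and check which elements and attributes
are allowed where, but that requires a schema-aware tool rather than
the bare parser used here.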

Thanks and Best Regards,
Joe

Received on Tuesday, 2 March 2010 07:15:39 UTC