Re: including a schema with "HTML: The Markup Language" Clarifying TAG Re: Courtesy notification

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 16 Mar 2010 15:39:54 -0700
Cc: 'Dan Connolly' <connolly@w3.org>, "'Michael(tm) Smith'" <mike@w3.org>, noah_mendelsohn@us.ibm.com, 'Paul Cotton' <paul.cotton@microsoft.com>, 'Philippe Le Hegaret' <plh@w3.org>, 'Sam Ruby' <rubys@intertwingly.net>, www-tag@w3.org
Message-id: <B3924699-260E-46BF-99CF-BE90B860258C@apple.com>
To: Larry Masinter <LMM@acm.org>

On Mar 16, 2010, at 3:25 PM, Larry Masinter wrote:

>> none of the available schema languages is
>> expressive enough to represent all of the HTML5 document conformance
>> requirements.
>
> This seems like an odd requirement.
>
> Can you think of any non-trivial computer language for which there
> was a formalism such as a schema language or BNF or whatever that completely
> described ALL of the conformance requirements for instances of
> that language? In the history of computer languages?
>
> I can't.

Most programming languages are not specified in terms of a schema.  
They do often provide a grammar in BNF form, but this is generally  
seen as an aid to implementors in determining how to parse the  
language, not a tool for conformance checking. To use an example I am  
familiar with, C has many mandatory diagnostics which do not comprise  
part of the grammar, and I do not think it is common to check  
correctness of C programs with a tool that solely checks against the  
grammar.
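A C example would need a compiler to demonstrate. Python shows the same split and is used here only as an analogy: its grammar accepts a `return` statement at module level, but compiling that source is still rejected. This is a minimal sketch using only the standard `ast.parse` and `compile` built-ins. The helper names `grammar_accepts` and `compiler_accepts` are illustrative.

```python
import ast

def grammar_accepts(src):
    """True if the source parses, i.e. it matches the grammar."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

def compiler_accepts(src):
    """True if the source also passes the checks applied when compiling."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False

SOURCE = "return 42\n"
print(grammar_accepts(SOURCE))   # prints True: a top-level return parses
print(compiler_accepts(SOURCE))  # prints False: 'return' outside function
```

A tool that only checked `grammar_accepts` would wrongly report this program as correct.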

For markup languages, however, there is a long history of conformance  
checkers that do nothing but check against a machine readable  
formalism (DTD or schema), and then claim a document is "valid" based  
solely on these checks. Likewise, tools sometimes assume that any  
content they produce which matches the DTD or schema is valid. I think  
that's the basis for Mike's worry that providing a schema, even an  
informative one, may lead people astray.

(Personally I think the risk of that happening for validators is low;  
the developers working on the one currently available HTML5 validator  
are very much aware of this issue, and I have not seen any interest in  
building one that relies solely on a schema. For tools that generate  
content, it's hard for me to say whether any would mistakenly assume  
correctness based solely on the schema.)
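To make the worry concrete, here is a toy sketch. It assumes well-formed XHTML-style input parsed with Python's `xml.etree.ElementTree`. The `CONTENT_MODEL` table and the `schema_valid` function are hypothetical stand-ins for a schema and are not any real HTML5 schema. They accept a document in which two elements share `id="a"`, even though HTML5 requires `id` values to be unique.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content model: for each element,
# the set of child element names a "schema" would allow.
CONTENT_MODEL = {
    "html": {"head", "body"},
    "head": {"title"},
    "title": set(),
    "body": {"p", "label", "input"},
    "p": set(),
    "label": set(),
    "input": set(),
}

def schema_valid(elem):
    """Return True if every element is known and has only allowed children.

    This is the only kind of check a schema-only checker performs here.
    """
    allowed = CONTENT_MODEL.get(elem.tag)
    if allowed is None:
        return False
    return all(child.tag in allowed and schema_valid(child) for child in elem)

DOC = """<html><head><title>t</title></head>
<body><p id="a">one</p><p id="a">two</p></body></html>"""

root = ET.fromstring(DOC)
ids = [e.get("id") for e in root.iter() if e.get("id") is not None]
print(schema_valid(root))         # prints True: structure matches the model
print(len(ids) != len(set(ids)))  # prints True: but two elements share id="a"
```

A checker that stopped at `schema_valid` would report this document as "valid".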

Incidentally, although this isn't usually done as part of a standards  
document, it would certainly be possible to fully describe all machine- 
checkable conformance requirements for HTML5 in a machine-readable  
formalism if we really wanted to, by choosing or inventing a  
sufficiently powerful formalism. validator.nu can be seen as an  
attempt to do this, using RelaxNG + Schematron + Java as the  
formalism. In principle we could apply some forms of analysis more  
easily to the code of validator.nu than we could to the spec itself.
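The sketch below illustrates the layering idea only. validator.nu itself uses RELAX NG, Schematron and Java, not this code. Extra requirements can be written as Schematron-style assertions: small predicates run over the parsed tree, each paired with a message. The rules, messages and helper names are hypothetical. The `label` rule is also simplified: HTML5 requires `for` to name a labelable element, not just any element with that `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical assertion rules: each returns True when the document
# satisfies the requirement.
def ids_unique(root):
    ids = [e.get("id") for e in root.iter() if e.get("id") is not None]
    return len(ids) == len(set(ids))

def label_for_targets_exist(root):
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    return all(lbl.get("for") in ids
               for lbl in root.iter("label") if lbl.get("for") is not None)

RULES = [
    ("id attribute values must be unique", ids_unique),
    ("label/@for must reference an existing id", label_for_targets_exist),
]

def check_rules(root):
    """Return the message of every rule the document violates, in order."""
    return [msg for msg, rule in RULES if not rule(root)]

DOC = """<html><head><title>t</title></head><body>
<label for="name">Name</label><input id="nmae"/>
<p id="x">one</p><p id="x">two</p></body></html>"""

root = ET.fromstring(DOC)
print(check_rules(root))  # both rules are reported as violated
```

Because each rule is ordinary code, it can express cross-references and uniqueness constraints that a content-model check like the one above does not attempt.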

Regards,
Maciej
Received on Tuesday, 16 March 2010 22:40:57 GMT
