Re: including a schema with "HTML: The Markup Language" Clarifying TAG Re: Courtesy notification

Maciej Stachowiak wrote:
> 
> On Mar 16, 2010, at 3:25 PM, Larry Masinter wrote:
> 
>>> none of the available schema languages is
>>> expressive enough to represent all of the HTML5 document conformance
>>> requirements.
>>
>> This seems like an odd requirement.
>>
>> Can you think of any non-trivial computer language for which
>> a formalism such as a schema language or BNF or whatever completely
>> described ALL of the conformance requirements for instances of
>> that language? In the history of computer languages?
>>
>> I can't.
> 
> Most programming languages are not specified in terms of a schema. They 
> do often provide a grammar in BNF form, but this is generally seen as an 
> aid to implementors in determining how to parse the language, not a tool 
> for conformance checking. To use an example I am familiar with, C has 
> many mandatory diagnostics which do not comprise part of the grammar, 
> and I do not think it is common to check correctness of C programs with 
> a tool that solely checks against the grammar.

I find this a surprising view of the history of programming languages.  Many of 
the languages I've worked with have formal grammars that are/were commonly used 
to define the basis of parsing programs in that language.  Much of the work on 
programming language theory through the 1970s was exactly about developing 
systems to check the syntax of programming languages against formal 
specifications, and was largely successful in achieving those goals.  Later work 
on semantic conformance checking was harder, but not without some limited success.

And if parsing a programming language isn't conformance checking, I don't know 
what is.  And if a formal grammar isn't a kind of schema, then what is it?

What continues to amaze me is that later work on markup languages and other 
network protocol syntaxes seems to have completely ignored the earlier work on 
programming language parsing.  (XML is a case in point: it defies parsing 
according to established programming language compilation techniques, in part 
because its lexing is parse-context-dependent.  HTML even more so, I think, 
though I've never tried to write a parser for that.)
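
To illustrate what "parse-context-dependent lexing" means here, the following is a minimal sketch in Python, not a conforming XML tokenizer. It assumes a toy subset of XML (no comments, CDATA sections, entities, processing instructions, or single-quoted attribute values), and the names `lex` and `TAG_TOKENS` are made up for the illustration. The point is only that the same characters, x="1", become three tokens inside a tag but a single run of character data outside one, so the lexer cannot work from one fixed set of token rules.

```python
import re

# Token rules that apply ONLY inside a tag (between '<' and '>').
TAG_TOKENS = re.compile(
    r'\s*(?:(?P<name>[A-Za-z_][\w.-]*)|(?P<eq>=)|(?P<str>"[^"]*")|(?P<close>/?>))'
)

def lex(source):
    """Yield (kind, text) pairs, switching token rules by context."""
    pos, in_tag = 0, False
    while pos < len(source):
        if not in_tag:
            if source[pos] == "<":
                # Tag opener: either '<' or '</'.
                end = pos + 2 if source.startswith("</", pos) else pos + 1
                yield ("open", source[pos:end])
                pos, in_tag = end, True
            else:
                # Outside a tag, everything up to the next '<' is text.
                end = source.find("<", pos)
                end = len(source) if end == -1 else end
                yield ("text", source[pos:end])
                pos = end
        else:
            m = TAG_TOKENS.match(source, pos)
            if m is None:
                raise ValueError(f"unexpected character at offset {pos}")
            kind = m.lastgroup
            yield (kind, m.group(kind))
            pos = m.end()
            if kind == "close":
                in_tag = False

tokens = list(lex('<a x="1">x="1"</a>'))
for kind, text in tokens:
    print(kind, repr(text))
```

A conventional lexer-generator setup expects the token rules to be independent of the parser state; here the `in_tag` flag is a small piece of parse state leaking into the lexer, which is the mismatch described above.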

#g
--

> For markup languages, however, there is a long history of conformance 
> checkers that do nothing but check against a machine readable formalism 
> (DTD or schema), and then claim a document is "valid" based solely on 
> these checks. Likewise, tools sometimes assume that any content they 
> produce which matches the DTD or schema is valid. I think that's the 
> basis for Mike's worry that providing a schema, even an informative one, 
> may lead people astray.
> 
> (Personally I think the risk of that happening for validators is low; 
> the developers working on the one currently available HTML5 validator 
> are very much aware of this issue, and I have not seen any interest in 
> building one that relies solely on a schema. For tools that generate 
> content, it's hard for me to say whether any would mistakenly assume 
> correctness based solely on the schema.)
> 
> Incidentally, although this isn't usually done as part of a standards 
> document, it would certainly be possible to fully describe all 
> machine-checkable conformance requirements for HTML5 in a 
> machine-readable formalism if we really wanted to, by choosing or 
> inventing a sufficiently powerful formalism. validator.nu can be seen as 
> an attempt to do this, using RelaxNG + Schematron + Java as the 
> formalism. In principle we could apply some forms of analysis more 
> easily to the code of validator.nu than we could to the spec itself.
> 
> Regards,
> Maciej
> 
> 

Received on Wednesday, 17 March 2010 09:41:38 UTC