W3C home > Mailing lists > Public > public-html@w3.org > April 2007

Re: Formal definition of HTML5 (was Re: Version information)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 15 Apr 2007 12:45:29 +0300
Message-Id: <A22963B0-0BDF-4701-A0C6-BFA8051B9188@iki.fi>
Cc: public-html@w3.org
To: Henrik Dvergsdal <henrik.dvergsdal@hibo.no>

On Apr 15, 2007, at 11:39, Henrik Dvergsdal wrote:

> Do you think the full set of machine-checkable criteria in  XHTML5  
> could be specified by means of a RELAX NG schema with references to  
> a datatype library and a set of formal rules, specifying, for  
> instance, table integrity requirements?

The table integrity requirements, the Charmod stuff, etc., etc., are  
out of the reach of the existing XML constraint formalisms, so the  
only kind of machine-readable "formalism" that is currently supported  
by real-world tools is a computer program written in a well-specified  
programming language.

On the other hand, if you're not asking for a formalism that can be  
checked against by existing computer-based tooling (schema-based  
validators, JVMs, Python interpreters, etc.) and are instead asking  
for mathematician-readable notation that has an air of formality  
compared to English sentences, I guess it would be possible to  
substitute some English with mathematical symbols but it wouldn't  
improve (on the aggregate) the readability of the spec among those  
who need to read it.

Furthermore, chances are that the conceptual model (appropriate for  
the spec) and the sane data structures for a given application are  
different. So different that if you had a formal language for  
expressing constraints on the conceptual model, you couldn't trust an  
inference engine to compile the constraints into a program that is as  
space and time-efficient as a human-written program. For example, the  
conceptual model for tables involves a grid of slots. Yet, it would  
be an extremely bad idea for a table integrity checker to allocate  
(colspan * rowspan * pointerSize) amount of memory when encountering  
<td colspan='...' rowspan='...'>. (Why? Denial of service attacks  
based on causing huge memory allocation requests.) I reasoned that  
the HTML table slot allocation policy has certain properties and  
consequences, which helped me come up with efficient data structures  
and algorithms but I never drafted any formal proof that these  
properties and consequences hold. In general, you don't prove your  
programs correct using formal methods unless you are Knuth or unless  
you are developing a control system for a rocket.

That said, it seems that there are existing ways to map XML documents  
into Prolog terms, so trying to express the conformance requirements  
for XHTML5 as Prolog rules would be an interesting exercise (but not  
one that I'd spend my time on). Even if you were able to successfully  
complete the exercise, I seriously doubt that a Prolog system would  
produce useful English explanations (understandable and relevant to  
authors) *why* a given document failed to adhere to the rules.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 15 April 2007 09:46:24 UTC

This archive was generated by hypermail 2.3.1 : Monday, 29 September 2014 09:38:42 UTC