Re: Making the HTML language self-describing

Ian Hickson wrote:
> On Wed, 7 Jan 2009, Martin Atkins wrote:

>> 3. Write a generic parser that can be used to parse HTML markup of any 
>> version (>= 5) into a DOM.
> 
> I don't think we'll ever be able to do this. For example, there is no way 
> I could have predicted how we were going to add <ruby> parsing to the spec 
> before I added it. This would be possible if we could guarantee that for 
> all time, all new inventions would always be done in a regular way, but 
> history has shown that we would be naive to assume this.
> 

A CSS3 working draft used to have a display-model[1] property:

display-model: inline-inside | block-inside | table | ruby

Although it cannot be used in the CSS associated with a particular page,
it can be used in so-called default or master style sheets to define
the rendering and parsing model of a generic HTML-like grammar.

It is possible to define parsers for HTML versions 3.2, 4 and 5 by using 
the following properties:

display-model: inline-inside | block-inside | table | ruby;
parsing-model: empty | mixed | pre;
can-contain: <list of element types>;
cannot-contain: <list of element types>;

that can be used by a generic HTML parser (one accepting a subset of SGML).

E.g.

img
{
   display: inline-block;
   parsing-model: empty;
   foreground-image: attr(src);
}

select
{
   display: inline-block;
   display-model: block-inside;
   parsing-model: closed;
   can-contain: option optgroup;
   /* ... other primordial styles for the element ... */
   background-image: url(system-shape:select);
   ...
}

option
{
   display: block;
   parsing-model: mixed;
   cannot-contain: *; /* cannot contain any sub elements - only text */
   ...
   color: windowtext;
}

option:selected { ... }

etc.
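
To make the idea concrete, here is a minimal sketch in Python of a tree
builder driven by such a table. It is only an illustration under stated
assumptions, not my actual parser: the names Rule, RULES, parse and
outline are hypothetical, the table covers just the three elements
above, and the recovery policy (close open elements until the new one
is allowed) is one simple choice among many.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    """Parsing-related declarations for one element type (hypothetical)."""
    parsing_model: str = "mixed"                # e.g. "empty", "mixed", "pre"
    can_contain: Optional[frozenset] = None     # None: no whitelist declared
    cannot_contain: frozenset = frozenset()     # "*" forbids all child elements


# Assumed table mirroring the img / select / option declarations above.
RULES = {
    "img": Rule(parsing_model="empty"),
    "select": Rule(can_contain=frozenset({"option", "optgroup"})),
    "option": Rule(cannot_contain=frozenset({"*"})),
}
DEFAULT = Rule()


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


# Very small tokenizer: start tags, end tags and text runs only.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def allows(parent_tag, child_tag):
    """Return True if the table lets parent_tag contain child_tag."""
    rule = RULES.get(parent_tag, DEFAULT)
    if "*" in rule.cannot_contain or child_tag in rule.cannot_contain:
        return False
    return rule.can_contain is None or child_tag in rule.can_contain


def parse(markup):
    root = Node("#root")
    stack = [root]
    for closing, tag, text in TOKEN.findall(markup):
        if text:
            if text.strip():
                stack[-1].children.append(text.strip())
            continue
        tag = tag.lower()
        if closing:
            # Pop up to the matching open element; ignore stray end tags.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == tag:
                    del stack[i:]
                    break
            continue
        # Implicitly close open elements that may not contain this one.
        while len(stack) > 1 and not allows(stack[-1].tag, tag):
            stack.pop()
        node = Node(tag)
        stack[-1].children.append(node)
        if RULES.get(tag, DEFAULT).parsing_model != "empty":
            stack.append(node)
    return root


def outline(node):
    """Nested-list view of the tree, handy for inspection."""
    return [node.tag] + [
        c if isinstance(c, str) else outline(c) for c in node.children
    ]
```

With this table, outline(parse("<select><option>One<option>Two</select>"))
gives ['#root', ['select', ['option', 'One'], ['option', 'Two']]]: the
second <option> closes the first one only because the table says option
cannot contain elements, not because the parser knows anything about
option itself.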

I use a parser that is based on a similar table of declarations, and I
have a strong feeling that it is possible to define an HTML5 parser in
these terms, that is, to have a table-driven declaration (something
close to a DTD) that defines HTML 4, 5, etc.
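
As a rough illustration of "table-driven", the declarations could be
read from a master style sheet into a plain lookup table, one table per
HTML version. The Python sketch below assumes a simplified syntax
(flat rules, no nested blocks, no ';' inside values); the function name
load_declarations is hypothetical.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def load_declarations(sheet):
    """Turn 'selector { name: value; ... }' rules into {selector: {name: value}}."""
    table = {}
    for selector, body in RULE.findall(COMMENT.sub("", sheet)):
        props = table.setdefault(selector.strip(), {})
        for decl in body.split(";"):
            # Split on the first ':' only, so values such as url(a:b) survive.
            name, sep, value = decl.partition(":")
            if sep:
                props[name.strip()] = value.strip()
    return table
```

A generic parser would then take such a table as its only
version-specific input.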

I would even give up some of HTML5's overly smart error handling in
order to be able to define various validating, processing and data
mining tools on top of generic configurable parsers.
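
For example, a validator could reuse the very same table. The Python
sketch below is an assumption-laden illustration: the tree format
(tag, [children]) with strings for text, the TABLE contents and the
function name validate are all hypothetical.

```python
# Assumed declaration table, keyed by element type.
TABLE = {
    "img": {"parsing-model": "empty"},
    "select": {"can-contain": {"option", "optgroup"}},
    "option": {"cannot-contain": {"*"}},
}


def validate(tree, table, path=""):
    """Return a list of messages for children that the table does not allow."""
    tag, children = tree
    here = path + "/" + tag
    decl = table.get(tag, {})
    errors = []
    if decl.get("parsing-model") == "empty" and children:
        errors.append(here + ": empty element has content")
    allowed = decl.get("can-contain")           # None: no whitelist declared
    banned = decl.get("cannot-contain", set())  # "*" forbids all child elements
    for child in children:
        if isinstance(child, str):
            continue  # text is not checked in this sketch
        child_tag = child[0]
        if "*" in banned or child_tag in banned:
            errors.append(here + ": cannot contain <" + child_tag + ">")
        elif allowed is not None and child_tag not in allowed:
            errors.append(here + ": <" + child_tag + "> not in can-contain")
        errors.extend(validate(child, table, here))
    return errors
```

Swapping in a different table would validate a different HTML version
without touching the code.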

[1] http://www.w3.org/TR/2002/WD-css3-box-20021024/#L706

-- 
Andrew Fedoniouk.

http://terrainformatica.com

Received on Thursday, 8 January 2009 05:05:09 UTC