Re: An HTML language specification vs. a browser specification

From: Robert J Burns <rob@robburns.com>
Date: Fri, 14 Nov 2008 17:05:30 -0600
Cc: public-html@w3.org
Message-Id: <FCDE4219-11B3-4FE9-A6B9-11C1FF0EF7A8@robburns.com>
To: Boris Zbarsky <bzbarsky@MIT.EDU>

Hi Boris,

I'm not really clear on what in my previous message your questions
are directed at, but here's an attempt to address them.

On Nov 14, 2008, at 2:40 PM, Boris Zbarsky wrote:

>
> Robert J Burns wrote:
>> 1) a markup parsing and serialization specification  with  
>> thoroughly specified error handling  that could apply as much to  
>> SGML (if DTD support was added back into it) as it applies to HTML
>
> So here's what I don't understand.  What do we mean by "parsing" in  
> this context?  Typically a parser either constructs some data  
> structure directly or provides a series of callbacks, right?
>
> So would this specification specify what the callbacks are for this  
> markup:
>
>  <div>
>    <table>
>      <tr>
>        <span>text</span>
>      <tr>
>    <table>
>  </div>
>
> ?  I would hope so.  If it does, how is that different from the  
> existing parsing specification?  Note that if I replace that <span>  
> by <td> I would expect different behavior from the parser, so the  
> parsing specification needs to be aware of specific elements of the  
> language and how they behave while parsing (heck, that's true for  
> HTML4, what with implied tags in some cases, etc).

Certainly, that would need to be part of a parsing spec. With SGML we
had DTDs. With HTML5 we have prose along with specific error handling
for ill-formed/invalid markup. What I'm suggesting is that this part
of the HTML5 spec suffers from not having some specialized expertise
applied to it. Ideally I think we could have a parsing specification
that applied to HTML and SGML equally, but with the possibility of
specifying error handling for other DTD-specified SGML. Think of it as
an SGML parser with a built-in HTML5 DTD. It's just that the other
DTDs wouldn't have all that fancy 'repair' of the tree by moving
elements out of tables and the like (but would get ill-formedness
error recovery).
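
To make the "series of callbacks" view concrete, here is a minimal
sketch using Python's standard html.parser module on the markup quoted
above. It is only an illustration: html.parser reports tokens in
source order and does not implement the HTML5 tree-construction or
table 'repair' rules, so it shows the raw callback stream a
language-neutral parsing layer would see before any element-specific
handling. The names EventLogger and callbacks_for are made up for this
example.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records tokenizer callbacks in order; performs no tree repair."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip the whitespace-only runs between tags.
        if data.strip():
            self.events.append(("text", data.strip()))


MARKUP = """<div>
  <table>
    <tr>
      <span>text</span>
    <tr>
  <table>
</div>"""


def callbacks_for(markup):
    """Return the ordered list of callbacks produced for the markup."""
    logger = EventLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


if __name__ == "__main__":
    for event in callbacks_for(MARKUP):
        print(event)
```

Swapping <span> for <td> changes only the tag names in this stream.
Any difference in behavior between the two cases has to come from a
layer that knows the vocabulary, which is the part under discussion.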

>> 2) modified HTML language and DOM specification
>
> Of course the parsing specification depends on the former...  It's  
> possible to describe the parser in SAX-like terms without talking  
> about a DOM, I guess.  I'm not sure the added complexity of  
> description is necessarily warranted, though: the sequences of  
> callbacks parsing HTML with error handling produces is more complex  
> than SAX.

Parsing only depends on the HTML language with respect to the schema
handling. Valid well-formed markup can be specified by the language
schema, leaving error-handling specifications to the parsing
algorithm. Perhaps it would be better to say this is the specification
of the HTML vocabulary (elements, attributes, and content models) and
DOM as opposed to the HTML 'language' and DOM.
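
As a rough sketch of that split, the vocabulary can be written down as
plain data that says which children each element allows, with a
checker that only reports violations and says nothing about how to
recover from them. The content models below are a hypothetical, heavily
simplified subset invented for this example, not the real HTML ones,
and the names CONTENT_MODEL and violations are likewise assumptions.

```python
# Hypothetical, heavily simplified content models (NOT the real HTML ones).
CONTENT_MODEL = {
    "div": {"div", "span", "table"},
    "table": {"tr"},
    "tr": {"td", "th"},
    "td": {"div", "span", "table"},
    "th": {"div", "span", "table"},
    "span": {"span"},
}


def violations(node):
    """Return (parent, child) pairs that the schema does not allow.

    A node is a (tag, children) tuple. The function only validates;
    what to do about a violation is left to the parsing algorithm.
    """
    tag, children = node
    allowed = CONTENT_MODEL.get(tag, set())
    found = []
    for child in children:
        if child[0] not in allowed:
            found.append((tag, child[0]))
        found.extend(violations(child))
    return found


if __name__ == "__main__":
    bad = ("div", [("table", [("tr", [("span", [])])])])
    good = ("div", [("table", [("tr", [("td", [])])])])
    print(violations(bad))
    print(violations(good))
```

The schema flags the <span> inside <tr> as invalid but is silent on
whether the parser should move it, drop it, or leave it in place.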

>> 3) a web browser behavior specification (as Roy called it)  
>> including the thorough specification of DOM method and attribute  
>> processing algorithms
>
> Note that in practice parsing might need to depend on attribute  
> values....

Could you give an example where parsing depends on attribute values?

> None of this even starts to touch the impact that script, if it's  
> being executed, has on parsing, of course.  That would need to be  
> covered too, and I'm not sure which of your three parts you envision  
> handling that.


Still there's an independence. We can allow scripts to call the parser  
and we can have parsers produce scripts while still keeping the  
definition separate.

The point of my post (and what I read Roy Fielding saying) is that the  
current HTML5 specification's strength is in its web browser behavior  
specification. The parsing algorithm and the HTML vocabulary parts of  
the spec suffer because we don't have spec editors who sufficiently  
understand those parts.

Take care,
Rob
Received on Friday, 14 November 2008 23:06:08 UTC