- From: Robert J Burns <rob@robburns.com>
- Date: Fri, 14 Nov 2008 17:05:30 -0600
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: public-html@w3.org
Hi Boris,

I'm not really clear what your questions are directed at in my
previous message. But here's an attempt to address the question you
pose.

On Nov 14, 2008, at 2:40 PM, Boris Zbarsky wrote:

> Robert J Burns wrote:
>> 1) a markup parsing and serialization specification -- with
>> thoroughly specified error handling -- that could apply as much to
>> SGML (if DTD support was added back into it) as it applies to HTML
>
> So here's what I don't understand. What do we mean by "parsing" in
> this context? Typically a parser either constructs some data
> structure directly or provides a series of callbacks, right?
>
> So would this specification specify what the callbacks are for this
> markup:
>
> <div>
> <table>
> <tr>
> <span>text</span>
> <tr>
> <table>
> </div>
>
> ? I would hope so. If it does, how is that different from the
> existing parsing specification? Note that if I replace that <span>
> by <td> I would expect different behavior from the parser, so the
> parsing specification needs to be aware of specific elements of the
> language and how they behave while parsing (heck, that's true for
> HTML4, what with implied tags in some cases, etc).

Certainly, that would need to be part of the parsing spec. With SGML
we had DTDs. With HTML5 we have prose along with specific error
handling for ill-formed/invalid markup. What I'm suggesting is that
this part of the HTML5 spec suffers from not having specialized
expertise applied to it. Ideally, I think we could have a parsing
specification that applied to HTML and SGML equally, but with the
possibility of specifying error handling for other DTD-specified SGML.
Think of it as an SGML parser with a built-in HTML5 DTD. It's just
that the other DTDs wouldn't have all that fancy 'repair' of the tree
by moving elements out of tables and the like (but would get
ill-formedness error recovery).

>> 2) modified HTML language and DOM specification
>
> Of course the parsing specification depends on the former... It's
> possible to describe the parser in SAX-like terms without talking
> about a DOM, I guess. I'm not sure the added complexity of
> description is necessarily warranted, though: the sequences of
> callbacks parsing HTML with error handling produces is more complex
> than SAX.

Parsing only depends on the HTML language with respect to the schema
handling. Valid, well-formed markup can be specified by the language
schema, leaving error-handling specifications to the parsing
algorithm. Perhaps it would be better to say this is the specification
of the HTML vocabulary (elements, attributes, and content models) and
DOM, as opposed to the HTML 'language' and DOM.

>> 3) a web browser behavior specification (as Roy called it)
>> including the thorough specification of DOM method and attribute
>> processing algorithms
>
> Note that in practice parsing might need to depend on attribute
> values....

Could you give an example where parsing depends on attribute values?

> None of this even starts to touch the impact that script, if it's
> being executed, has on parsing, of course. That would need to be
> covered too, and I'm not sure which of your three parts you envision
> handling that.

Still, there's an independence. We can allow scripts to call the
parser, and we can have parsers produce scripts, while still keeping
the definitions separate.

The point of my post (and what I read Roy Fielding as saying) is that
the current HTML5 specification's strength is in its web browser
behavior specification. The parsing algorithm and the HTML vocabulary
parts of the spec suffer because we don't have spec editors who
sufficiently understand those parts.

Take care,
Rob
Received on Friday, 14 November 2008 23:06:08 UTC