- From: Mikko Rantalainen <mira@cc.jyu.fi>
- Date: Fri, 30 Jul 2004 12:07:19 +0300
- To: www-html@w3.org
Ian Hickson / 2004-07-30 01:50:
> On Fri, 30 Jul 2004, Trejkaz Xaoza wrote:
>
>>>But the XHTML spec doesn't require this -- it only requires
>>>wellformedness checking.
>>
>>Oh great. So regardless of what gets specified, we will still get
>>people using random tag soup instead of valid XHTML, thanks to the
>>browsers following a spec which says they're allowed to render invalid
>>documents.
>
> The primary problem with Tag Soup is not that documents are invalid, it's
> that documents are ambiguous.
>
> What does:
>
>    <strong> A <em> B </strong> C </em>
>
> ...translate to, as far as the DOM and CSS goes? No spec defines this.

Well, let's just define that and the problem is gone. How about we say
that opening tags override closing tags in case there are syntax errors
(or the other way around), and that the parser should keep a stack of
open elements so it can automatically close elements with incorrect
markup.

For example, if the markup is "<a>1<b>2</a>3</b>" then the parser should
generate the tree a>b (up to "2" now), and then it's expecting either
data, an opening tag or a closing 'b'. It gets "</a>" instead. Now we
have two choices:

1) "</a>" gets ignored because it shouldn't be there. The parser closes
the 'b' element when it sees the matching "</b>". Nothing matches the
"</a>" in between, but the parser could automatically generate missing
closing tags in the correct order if the tree isn't complete when the
input ends[1]. In this case it still has the 'a' element open when the
input is done, so it should close that. This method may cause an
incorrectly closed element to grow up to the end of the document and
swallow all content in the process, but it should make locating the
error pretty easy.

2) The parser expects that the author has missed one closing tag, and it
should automatically close open elements from the top of the stack until
"</a>" can close the 'currently' open element. So we have "a>b" and the
next tag is "</a>". We close 'b'. We have "a" and the next tag is
"</a>". Okay, problem solved. This method is worse than 1) because the
parser could generate a closing tag for the root element, and the rest
of the input would then have to be thrown away.

Let's just specify how an incorrect tree should be fixed (and keep it
simple!). Somebody else can write more specific language for this.

[1] This step could be defined as undoable, so that we always have a
well-formed document for incremental rendering (make it behave as if the
input ends here, so the parser must close all open elements in the
correct order).

-- 
Mikko
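
The two recovery choices described in the message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not language from any spec: the `repair` function, its `strategy` parameter and the simplified `TOKEN` pattern (bare tags without attributes, no comments or entities) are hypothetical names introduced here. Strategy 1 ignores a closing tag that does not match the innermost open element; strategy 2 auto-closes open elements from the top of the stack until the closing tag matches. In both cases, elements still open at the end of the input are closed in the correct order. For visibility, the sketch keeps content that arrives after the root element has been closed instead of throwing it away.

```python
import re

# Simplified tokenizer (an assumption for this sketch): bare tags such as
# <a> or </a>, plus runs of character data. No attributes, comments, etc.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>|([^<]+)")


def repair(markup, strategy):
    """Re-serialize markup after fixing mis-nested closing tags.

    strategy 1: ignore a closing tag that does not match the innermost
                open element.
    strategy 2: auto-close open elements from the top of the stack until
                the closing tag matches.
    """
    stack = []  # names of currently open elements, innermost last
    out = []    # pieces of the repaired serialization

    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if text is not None:
            out.append(text)
        elif not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif stack and stack[-1] == name:
            # Correctly nested closing tag.
            stack.pop()
            out.append(f"</{name}>")
        elif strategy == 2 and name in stack:
            # Assume the author forgot closing tags: close inner elements
            # until the named element is the innermost one, then close it.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(f"</{name}>")
        # Otherwise the stray closing tag is ignored.

    # End of input: close whatever is still open, innermost first.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(repair("<a>1<b>2</a>3</b>", 1))  # <a>1<b>23</b></a>
print(repair("<a>1<b>2</a>3</b>", 2))  # <a>1<b>2</b></a>3
```

With strategy 1 the 'b' element swallows the "3"; with strategy 2 the root 'a' is closed early and the "3" ends up outside it, which is the drawback the message points out.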
Received on Friday, 30 July 2004 05:07:06 UTC