Re: [XHTML2] CITELANG, TITLELANG attributes from Mikko Rantalainen on 2004-07-30 (www-html@w3.org from July 2004)

From: Mikko Rantalainen <mira@cc.jyu.fi>
Date: Fri, 30 Jul 2004 12:07:19 +0300
To: www-html@w3.org
Message-ID: <410A0FC7.1030309@cc.jyu.fi>

Ian Hickson / 2004-07-30 01:50:

> On Fri, 30 Jul 2004, Trejkaz Xaoza wrote:
> 
>>>But the XHTML spec doesn't require this -- it only requires 
>>>wellformedness checking.
>>
>>Oh great.  So regardless of what gets specified, we will still get 
>>people using random tag soup instead of valid XHTML, thanks to the 
>>browsers following a spec which says they're allowed to render invalid 
>>documents.
> 
> The primary problem with Tag Soup is not that documents are invalid, it's 
> that documents are ambiguous.
> 
> What does:
> 
>     <strong> A <em> B </strong> C </em>
> 
> ...translate to, as far as the DOM and CSS goes? No spec defines this.

Well, let's just define that and the problem is gone. How about we 
say that opening tags override closing tags when there are syntax 
errors (or the other way around), and that the parser should keep a 
stack of open elements so it can automatically close elements that 
have incorrect markup.

For example, if the markup is "<a>1<b>2</a>3</b>", the parser 
should build the tree a>b (having consumed input up to "2"); it then 
expects either data, an opening tag or a closing 'b'. It gets "</a>" 
instead. Now we have two choices:

1) "</a>" gets ignored because it shouldn't be there. The parser 
closes the 'b' element when it sees the matching "</b>"; the "</a>" 
in between matches nothing. The parser could still generate any 
missing closing tags, in the correct order, if the tree isn't 
complete when the input ends[1]. In this case the 'a' element is 
still open when the input is done, so the parser should close it. 
This method may cause an incorrectly closed element to grow up to 
the end of the document and swallow all content in the process, but 
it should make locating the error pretty easy.

2) The parser assumes that the author has missed one closing tag, and 
it automatically closes open elements from the top of the stack 
until "</a>" can close the 'currently' open element. So we have "a>b" 
and the next tag is "</a>". We close 'b'. Now we have "a" and the 
next tag is "</a>". Okay, problem solved. This method is worse than 
1) because the parser could generate a closing tag for the root 
element, and the rest of the input would then have to be thrown away.
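The two recovery choices can be sketched in Python roughly as follows. This is a minimal sketch under assumptions that are not in the message: a toy tokenizer that only understands bare tags without attributes, a synthetic `#root` container so that text after a closed top-level element still has a parent, and stray end tags that match no open element being ignored under both strategies. The names `tokenize`, `Element` and `parse` are hypothetical.

```python
import re

# Hypothetical toy tokenizer: bare start/end tags (no attributes) and text.
# Characters that fit neither pattern (e.g. a lone "<") are silently skipped.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def tokenize(markup):
    for m in TOKEN_RE.finditer(markup):
        slash, name, text = m.groups()
        if name is not None:
            yield ("end" if slash else "start", name)
        else:
            yield ("text", text)


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []  # mix of str (text) and Element

    def serialize(self):
        # Always emits the end tag, so any element still open when the
        # input runs out is closed here, innermost first.
        inner = "".join(c if isinstance(c, str) else c.serialize()
                        for c in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


def parse(markup, strategy):
    """Return well-formed markup; strategy is 1 (ignore) or 2 (auto-close)."""
    root = Element("#root")  # synthetic container, never serialized itself
    stack = [root]           # stack of open elements; stack[-1] is current
    for kind, value in tokenize(markup):
        if kind == "text":
            stack[-1].children.append(value)
        elif kind == "start":
            element = Element(value)
            stack[-1].children.append(element)
            stack.append(element)
        elif value == stack[-1].name:
            stack.pop()  # normal case: end tag matches the current element
        elif strategy == 2 and value in (e.name for e in stack[1:]):
            # Strategy 2: pop elements until the matching one is closed.
            while stack[-1].name != value:
                stack.pop()
            stack.pop()
        # Otherwise (strategy 1, or no such element open): ignore the tag.
    return "".join(c if isinstance(c, str) else c.serialize()
                   for c in root.children)


print(parse("<a>1<b>2</a>3</b>", 1))  # <a>1<b>23</b></a>
print(parse("<a>1<b>2</a>3</b>", 2))  # <a>1<b>2</b></a>3
print(parse("<strong> A <em> B </strong> C </em>", 1))
print(parse("<strong> A <em> B </strong> C </em>", 2))
```

With strategy 1 the stray "</a>" is dropped and "3" ends up inside 'b'; with strategy 2 'b' and 'a' are closed early and the later "</b>" matches nothing. Because of the synthetic container, this sketch does not reproduce the "throw the rest away" failure of strategy 2; with a real root element, an end tag for the root would leave all later content without a valid parent.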

Let's just specify how an incorrect tree should be fixed (and keep it 
simple!). Somebody else can write more specific language for this.

[1] This step could be defined as undoable, so that we always have a 
well-formed document for incremental rendering (make the parser 
behave as if the input ended here, closing all open elements in the 
correct order).
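The undoable closing step in footnote [1] might look like this minimal Python sketch (the class and method names are hypothetical, the input is assumed to be already tokenized, and mismatched end tags follow the ignore rule of option 1). The parser keeps the emitted markup and the stack of open element names separately; a snapshot appends the missing closing tags only to the string it returns, so continuing to feed tokens "undoes" the pretend end of input automatically.

```python
class IncrementalNormalizer:
    """Hypothetical sketch: feed tokens one by one, snapshot at any time."""

    def __init__(self):
        self.out = []    # markup emitted so far
        self.stack = []  # names of currently open elements, innermost last

    def feed(self, kind, value):
        if kind == "text":
            self.out.append(value)
        elif kind == "start":
            self.out.append(f"<{value}>")
            self.stack.append(value)
        elif self.stack and self.stack[-1] == value:
            self.out.append(f"</{value}>")
            self.stack.pop()
        # Any other end tag is ignored, as in option 1.

    def snapshot(self):
        # Behave as if the input ended here: close open elements innermost
        # first. Only the returned string gets the closing tags; self.out and
        # self.stack are untouched, so this step is trivially undoable.
        closers = "".join(f"</{name}>" for name in reversed(self.stack))
        return "".join(self.out) + closers


p = IncrementalNormalizer()
for token in [("start", "a"), ("text", "1"), ("start", "b"), ("text", "2")]:
    p.feed(*token)
print(p.snapshot())  # <a>1<b>2</b></a>
p.feed("end", "b")
p.feed("text", "3")
print(p.snapshot())  # <a>1<b>2</b>3</a>
```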

-- 
Mikko

Received on Friday, 30 July 2004 05:07:06 UTC