Re: Error recovery spec

James Clark scripsit:

> Given
> 
>  <a/><b/><c/>
> 
> do you correct to
> 
>   <a><b/><c/></a>

Yes.  I shouldn't have said "recursively"; only "a" has its end-tag
ignored.

> What do you do about
> 
> - text before any start-tag
> - completely empty document

Character data (which may be empty) can have a PreferredParent.
In TagSoup proper, it always does; in the HTML schema for TagSoup, its
PreferredParent is "body", whose own PreferredParent is "html".  So an
empty document turns into <html><body></body></html>.

So to implement the use of #doc here, simply let the PreferredParent
of character data be "#doc" by default.
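
A rough sketch of that rule, in Python rather than TagSoup's own Java.
Everything here is hypothetical and for illustration only: the "#text"
key standing for character data, the table, and both function names are
assumptions, not TagSoup's actual API.

```python
# Hypothetical table: child name -> PreferredParent name.
# "#text" stands for character data; the names are illustrative only.
HTML_PREFERRED_PARENT = {"#text": "body", "body": "html"}


def ancestor_chain(name, preferred_parent, default=None):
    """Return the elements to open, outermost first, before `name` may appear.

    `default` is used only when `name` itself has no entry in the table.
    """
    chain = []
    current = preferred_parent.get(name, default)
    # Follow PreferredParent links upward; stop at the top or on a cycle.
    while current is not None and current not in chain:
        chain.append(current)
        current = preferred_parent.get(current)
    chain.reverse()
    return chain


def wrap_text(text, preferred_parent, default=None):
    """Wrap character data (possibly empty) in its chain of preferred parents."""
    chain = ancestor_chain("#text", preferred_parent, default)
    opens = "".join(f"<{n}>" for n in chain)
    closes = "".join(f"</{n}>" for n in reversed(chain))
    return opens + text + closes
```

With the HTML-like table, empty character data comes out wrapped in
"html" and "body"; with an empty table and default="#doc", any text is
wrapped in <#doc> instead.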

> I handle both these by wrapping them in <#doc>. But once one does that,
> it seems very natural to handle the multiple top-level element case
> in a similar way.

The advantage of not doing so is that TagSoup can remain a streaming parser.
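
To illustrate the streaming point, here is a minimal sketch (Python, not
TagSoup's code) of ignoring only the root's end-tag, so that later
top-level elements become its children. The event tuples and the
function name recover are assumptions made for this example, and the
input is assumed to be otherwise well nested. Text before any start-tag
is passed through untouched here; that case is the PreferredParent
rule's job.

```python
def recover(events):
    """Yield repaired events one at a time, never looking ahead.

    `events` is an iterable of ("start", name), ("end", name) or
    ("text", data) tuples (a hypothetical event model).
    """
    root = None
    depth = 0
    for kind, value in events:
        if kind == "start":
            if root is None:
                root = value  # the first element becomes the root
            depth += 1
            yield kind, value
        elif kind == "end":
            if depth == 1:
                # This end-tag would close the root: ignore it, so that
                # whatever follows is nested inside the root instead.
                continue
            depth -= 1
            yield kind, value
        else:
            yield kind, value
    if root is not None:
        # Close the root only once the input is exhausted.
        yield "end", root
```

Each output event is emitted as soon as its input event is seen; nothing
is buffered while waiting to learn whether a second top-level element
follows.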

-- 
John Cowan  cowan@ccil.org  http://ccil.org/~cowan
Linguistics is arguably the most hotly contested property in the academic
realm. It is soaked with the blood of poets, theologians, philosophers,
philologists, psychologists, biologists and neurologists, along with
whatever blood can be got out of grammarians. - Russ Rymer

Received on Tuesday, 18 December 2012 10:06:06 UTC