Re: Error recovery spec

On Tue, Dec 18, 2012 at 5:05 PM, John Cowan <cowan@mercury.ccil.org> wrote:

>
> > What do you do about
> >
> > - text before any start-tag
> > - completely empty document
>
> Character data (which may be empty) can have a PreferredParent.
>

You won't get a character data token from an empty document.


> In TagSoup proper, it always does; in the HTML schema for TagSoup, the
> PreferredParent is "body", whose PreferredParent is "html".  So an empty
> document turns into <html><body></body></html>.
>
> So to implement the use of #doc here, simply let the PreferredParent
> of character data be "#doc" by default.
>

I agree that handles the first of my cases. But I don't think it handles
the second.
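
A minimal sketch of the distinction, assuming a hypothetical token stream of
(kind, value) pairs. None of these names come from TagSoup, and the
PREFERRED_PARENT table is only an illustration. A rule triggered by a
character data token has nothing to fire on when the input yields no tokens,
so the empty document needs a separate end-of-input check:

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical token shape: kind is "start", "end", or "text".
Token = Tuple[str, str]

# Illustrative table only: parent supplied for a token seen at top level.
PREFERRED_PARENT = {"#text": "#doc"}


def recover(tokens: Iterable[Token]) -> Iterator[Token]:
    """Wrap top-level text in <#doc>, driven purely by the tokens seen."""
    depth = 0
    wrapper_open = False
    for kind, value in tokens:
        if kind == "text" and depth == 0 and not wrapper_open:
            # Text before any start-tag: supply its preferred parent.
            yield ("start", PREFERRED_PARENT["#text"])
            wrapper_open = True
        if kind == "start":
            depth += 1
        elif kind == "end":
            depth -= 1
        yield (kind, value)
    if wrapper_open:
        yield ("end", PREFERRED_PARENT["#text"])


def recover_with_eof(tokens: Iterable[Token]) -> Iterator[Token]:
    """Same as recover(), plus an explicit check for input with no tokens."""
    seen = False
    for tok in recover(tokens):
        seen = True
        yield tok
    if not seen:
        # Empty document: no token ever arrived, so emit the wrapper here.
        yield ("start", "#doc")
        yield ("end", "#doc")
```

With this sketch, text before any start-tag is wrapped by recover(), but an
empty input produces nothing at all unless the end-of-input check in
recover_with_eof() is added.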


> > I handle both these by wrapping them in <#doc>. But once one does that,
> > it seems very natural to handle the multiple top-level element case
> > in a similar way.
>
> The advantage of not doing so is that TagSoup can be streaming.
>

I agree that this is a significant advantage.

James

Received on Tuesday, 18 December 2012 10:14:26 UTC