Re: Error recovery spec from James Clark on 2012-12-18 (public-microxml@w3.org from December 2012)

From: James Clark <jjc@jclark.com>
Date: Tue, 18 Dec 2012 17:13:35 +0700
To: John Cowan <cowan@mercury.ccil.org>
Cc: public-microxml@w3.org
Message-ID: <CANz3_EZ3Ypwreme5aqV=d-0SRKuHz-Er18-xOkQy_pEBdAQhqQ@mail.gmail.com>

On Tue, Dec 18, 2012 at 5:05 PM, John Cowan <cowan@mercury.ccil.org> wrote:

>
> > What do you do about
> >
> > - text before any start-tag
> > - completely empty document
>
> Character data (which may be empty) can have a PreferredParent.
>

You won't get a character data token from an empty document.


> In TagSoup proper, it always does; in the HTML schema for TagSoup, the
> PreferredParent is "body", whose PreferredParent is "html".  So an empty
> document turns into <html><body></body></html>.
>
> So to implement the use of #doc here, simply let the PreferredParent
> of character data be "#doc" by default.
>

I agree that handles the first of my cases. But I don't think it handles
the second.
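
A minimal sketch of the gap, using hypothetical names (recover,
PREFERRED_PARENT, and the (kind, value) token shape are assumptions for
illustration, not TagSoup's actual API). Top-level character data gets
wrapped in its preferred parent, but an empty document produces no tokens,
so the rule never fires:

```python
# Hypothetical sketch, not TagSoup's real code.
# Tokens are (kind, value) pairs with kind in {"start", "end", "text"}.
PREFERRED_PARENT = {"#text": "#doc"}  # assumed default for character data


def recover(tokens):
    """Wrap character data that appears outside any element in its preferred parent."""
    out, stack = [], []
    for kind, value in tokens:
        if kind == "text" and not stack:
            # Text before any start-tag: open the preferred parent first.
            parent = PREFERRED_PARENT["#text"]
            out.append(("start", parent))
            stack.append(parent)
        elif kind == "start":
            stack.append(value)
        elif kind == "end" and stack:
            stack.pop()
        out.append((kind, value))
    # Close whatever is still open at end of input.
    while stack:
        out.append(("end", stack.pop()))
    return out


text_only = recover([("text", "hi")])  # wrapped in #doc
empty = recover([])                    # no token arrives, so nothing is wrapped
```

With this rule alone, the empty document comes out as an empty token list
rather than as <#doc></#doc>; it needs a separate end-of-input check.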


> > I handle both these by wrapping them in <#doc>. But once one does that,
> > it seems very natural to handle the multiple top-level element case
> > in a similar way.
>
> The advantage of not doing so is that TagSoup can be streaming.
>

I agree that this is a significant advantage.
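
One way to read the streaming point, sketched under assumptions (the
function name and token shape are hypothetical): if <#doc> is added only
when there is more than one top-level element, the decision cannot be made
until the first element has been read in full, whereas a streaming
processor wants to emit the first start-tag as soon as it sees it.

```python
# Hypothetical sketch: how many tokens must be read before we know
# whether a <#doc> wrapper is needed for multiple top-level elements.
def tokens_read_before_decision(tokens):
    depth = 0      # current element nesting depth
    roots = 0      # number of top-level elements seen so far
    consumed = 0   # tokens read so far
    for kind, _ in tokens:
        consumed += 1
        if kind == "start":
            if depth == 0:
                roots += 1
                if roots == 2:
                    return consumed  # second root seen: wrapper is needed
            depth += 1
        elif kind == "end" and depth > 0:
            depth -= 1
    return consumed  # end of input: only now is it known that no wrapper is needed


two_roots = [("start", "a"), ("text", "x"), ("end", "a"),
             ("start", "b"), ("end", "b")]
one_root = [("start", "a"), ("end", "a")]
```

In both inputs the answer arrives only after the first element has closed,
so the first start-tag would have to be buffered rather than streamed.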

James

Received on Tuesday, 18 December 2012 10:14:26 UTC