- From: James Clark <jjc@jclark.com>
- Date: Tue, 18 Dec 2012 17:13:35 +0700
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: public-microxml@w3.org
Received on Tuesday, 18 December 2012 10:14:26 UTC
On Tue, Dec 18, 2012 at 5:05 PM, John Cowan <cowan@mercury.ccil.org> wrote:

>
> > What do you do about
> >
> > - text before any start-tag
> > - completely empty document
>
> Character data (which may be empty) can have a PreferredParent.
> You won't get a character data token from an empty document.
> In TagSoup proper, it always does; in the HTML schema for TagSoup, the
> PreferredParent is "body", whose PreferredParent is "html". So an empty
> document turns into <html><body></body></html>.
>
> So to implement the use of #doc here, simply let the PreferredParent
> of character data be "#doc" by default.
>

I agree that handles the first of my cases. But I don't think it handles the second.

> > I handle both these by wrapping them in <#doc>. But once one does that,
> > it seems very natural to handle the multiple top-level element case
> > in a similar way.
>
> The advantage of not doing so is that TagSoup can be streaming.
>

I agree that this is a significant advantage.

James
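A toy sketch of the PreferredParent mechanism discussed above may make the two cases easier to compare. It is not TagSoup's API: the schema dictionaries, the `#pcdata` key for character data, and the function names are all hypothetical. It only shows that when a character-data token arrives with nothing open, the builder opens the chain of preferred parents, so an HTML-like schema yields `<html><body>...</body></html>` and a `#doc` default yields `<#doc>...</#doc>`. It also shows why this alone does not cover the empty document: if the tokenizer emits no character-data token, nothing is wrapped.

```python
from typing import Dict, List, Optional

# Hypothetical schema: maps a token/element kind to its PreferredParent.
# "#pcdata" stands for character data; a missing entry means "no parent".
Schema = Dict[str, Optional[str]]


def parent_chain(kind: str, schema: Schema) -> List[str]:
    """Return the preferred ancestors of `kind`, outermost first."""
    chain: List[str] = []
    parent = schema.get(kind)
    while parent is not None:
        chain.append(parent)
        parent = schema.get(parent)
    return list(reversed(chain))


def serialize_text_only(text_tokens: List[str], schema: Schema) -> str:
    """Toy streaming builder for a document made only of character data.

    On the first text token with an empty stack, open every preferred
    ancestor; at end of input, close whatever is still open.
    """
    out: List[str] = []
    open_stack: List[str] = []
    for text in text_tokens:
        if not open_stack:
            for name in parent_chain("#pcdata", schema):
                out.append(f"<{name}>")
                open_stack.append(name)
        out.append(text)
    while open_stack:
        out.append(f"</{open_stack.pop()}>")
    return "".join(out)


HTML_LIKE: Schema = {"#pcdata": "body", "body": "html"}
DOC_DEFAULT: Schema = {"#pcdata": "#doc"}
```

Usage: `serialize_text_only([""], HTML_LIKE)` gives `<html><body></body></html>` (a tokenizer that always emits a possibly empty text token), `serialize_text_only(["hi"], DOC_DEFAULT)` gives `<#doc>hi</#doc>`, and `serialize_text_only([], DOC_DEFAULT)` gives an empty string, which is the unhandled empty-document case.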