- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 18 Dec 2012 05:05:43 -0500
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:

> Given
>
> <a/><b/><c/>
>
> do you correct to
>
> <a><b/><c/></a>

Yes. I shouldn't have said "recursively"; only "a" has its end-tag ignored.

> What do you do about
>
> - text before any start-tag
> - completely empty document

Character data (which may be empty) can have a PreferredParent. In TagSoup
proper, it always does; in the HTML schema for TagSoup, the PreferredParent
is "body", whose PreferredParent is "html". So an empty document turns
into <html><body></body></html>.

So to implement the use of #doc here, simply let the PreferredParent of
character data be "#doc" by default.

> I handle both these by wrapping them in <#doc>. But once one does that,
> it seems very natural to handle the multiple top-level element case
> in a similar way.

The advantage of not doing so is that TagSoup can be streaming.

-- 
John Cowan cowan@ccil.org http://ccil.org/~cowan
Linguistics is arguably the most hotly contested property in the academic
realm. It is soaked with the blood of poets, theologians, philosophers,
philologists, psychologists, biologists and neurologists, along with
whatever blood can be got out of grammarians. - Russ Rymer
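
The PreferredParent lookup described above can be sketched roughly as follows. This is only an illustration in Python, not TagSoup's actual API: the table layout, the "#pcdata" key, and the function names parent_chain and wrap_text are all assumptions made for the example, and the sketch does no escaping of the text.

```python
# Hypothetical model of the PreferredParent idea. None of these names come
# from TagSoup itself; they exist only for this illustration.
PCDATA = "#pcdata"  # assumed key standing for character data

# Table matching the HTML-schema behaviour described in the message:
# character data prefers "body", and "body" prefers "html".
HTML_LIKE = {PCDATA: "body", "body": "html"}

# Table matching the suggested default: character data prefers "#doc".
DOC_DEFAULT = {PCDATA: "#doc"}


def parent_chain(name, preferred_parent):
    """Return the elements to open, outermost first, before `name` may appear."""
    chain = []
    seen = {name}
    current = preferred_parent.get(name)
    while current is not None:
        if current in seen:
            raise ValueError("cycle in preferred-parent table at %r" % current)
        seen.add(current)
        chain.append(current)
        current = preferred_parent.get(current)
    chain.reverse()  # collected innermost first; callers want outermost first
    return chain


def wrap_text(text, preferred_parent):
    """Serialize character data (possibly empty) inside its implied ancestors."""
    chain = parent_chain(PCDATA, preferred_parent)
    opening = "".join("<%s>" % n for n in chain)
    closing = "".join("</%s>" % n for n in reversed(chain))
    return opening + text + closing


if __name__ == "__main__":
    print(wrap_text("", HTML_LIKE))      # empty document, HTML-like table
    print(wrap_text("hi", DOC_DEFAULT))  # leading text, "#doc" default
```

Because each implied ancestor is found by a table lookup at the moment the character data arrives, nothing in this sketch needs to look ahead in the input, which is in keeping with the streaming point made above.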
Received on Tuesday, 18 December 2012 10:06:06 UTC