- From: Simon Pieters <simonp@opera.com>
- Date: Thu, 19 Nov 2009 09:50:47 +0100
- To: "John Cowan" <cowan@ccil.org>
- Cc: "Lachlan Hunt" <lachlan.hunt@lachy.id.au>, "Liam Quin" <liam@w3.org>, public-html@w3.org, public-xml-core-wg@w3.org
On Thu, 19 Nov 2009 09:21:14 +0100, John Cowan <cowan@ccil.org> wrote:

> Simon Pieters scripsit:
>
>> Why would one need to reverse engineer an XML parser? It is defined in
>> XML 1.0 what is an error, so one can just read the XML 1.0 spec and
>> modify the XML5 algorithm accordingly.
>
> Sure, it's possible, but it's about equivalent in complexity to writing
> a parser, which has already been done repeatedly.

Yes.

> Wake me up when
> it's finished.

Ok.

>> It's not clear to me that that is a goal. It would be possible by making
>> up a bogus root element, but that seems just bogus. :-)
>
> Fair enough, but then there needs to be some kind of restriction on what
> documents can and cannot be repaired.
>
>> I see "DOCTYPE internal subset state" and in total 38 tokenizer states
>> dedicated to handling the internal subset in
>> http://xml5.googlecode.com/svn/trunk/specification/Overview.html
>
> Yes, it skips the internal subset all right, but there's no indication
> that it uses the information to, for example, correctly implement
> attribute value normalization. Whitespace characters are added to
> attribute values just like any other characters.

It seems handling entities is covered but handling attribute declarations
is not done yet, but is intended to be covered since it defines a "list of
attribute declarations" and the relevant tokenizer states have issue
markers.

-- 
Simon Pieters
Opera Software
Received on Thursday, 19 November 2009 08:51:49 UTC