- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 16 Aug 2011 10:09:00 -0400
- To: Anne van Kesteren <annevk@opera.com>
- Cc: Noah Mendelsohn <nrm@arcanedomain.com>, "public-html-xml@w3.org" <public-html-xml@w3.org>, Larry Masinter <LMM@acm.org>
Anne van Kesteren scripsit:

> Your principle is wrong. HTML is not repaired; processing just does
> not stop.

As the author of TagSoup, I'm well aware of that. However, it's tangential to my point.

> Doing the same for XML is fairly trivial.

There are many proposals in the literature for repairing XML (or, if you like, specifying the processing of non-well-formed XML), ranging from XML5 and a 2004 proposal by Siefkes <http://conferences.idealliance.org/extreme/html/2004/Siefkes01/EML2004Siefkes01.html>, which are independent of any schema, to TagSoup, which depends on a specially written schema in its own schema language, to Blažević's 2010 implementation, which employs a RELAX NG schema and hints in the form of PIs.

The problem is that there is no compelling reason to prefer one approach to any other. Without such a justification, all we end up doing is complicating the description of XML further: instead of being able to say "report a fatal error", we must specify in detail exactly what infoset to produce for violations of each of the 83 productions, 12 well-formedness constraints, and 8 miscellaneous fatal-error specifications in XML 1.0 (Fifth Edition).

-- 
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
The native charset of SMS messages supports English, French, mainland Scandinavian languages, German, Italian, Spanish with no accents, and GREEK SHOUTING. Everything else has to be Unicode, which means you get only 70 16-bit characters in a text instead of 160 7-bit characters.
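
To illustrate the distinction drawn in the message above between "report a fatal error" and processing that "just does not stop", here is a minimal sketch using only the Python standard library. It is not taken from the thread: the sample input, `parse_as_xml`, `EventCollector`, and `parse_as_html` are hypothetical names chosen for illustration. Note also that `html.parser` is a lenient tokenizer, not an implementation of the HTML parsing algorithm, of XML5, or of TagSoup; it stands in here only to show input being processed without stopping.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <b> is opened inside <p> but never closed.
SAMPLE = "<p>one<b>two</p>"


def parse_as_xml(text):
    """Parse as XML; return 'ok' or a description of the fatal error."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        # A conforming XML parser stops here; no tree is produced.
        return f"fatal error: {err}"


class EventCollector(HTMLParser):
    """Record tokenizer events instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_as_html(text):
    """Tokenize leniently; mis-nesting does not stop processing."""
    collector = EventCollector()
    collector.feed(text)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print(parse_as_xml(SAMPLE))
    print(parse_as_html(SAMPLE))
```

The XML path yields a single outcome (an error and no tree), which is what keeps its specification short. The lenient path yields events, but turning them into one agreed tree still requires a rule for every kind of violation, which is the cost the message describes.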
Received on Tuesday, 16 August 2011 14:09:27 UTC