- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 26 Jun 2008 12:52:55 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5808

           Summary: Define a way to coerce HTML5 parser output to an XML 1.0
                    4th ed. + Namespaces 1.0 infoset
           Product: HTML WG
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Spec proposals
        AssignedTo: dave.null@w3.org
        ReportedBy: hsivonen@iki.fi
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org

There's now a canned answer for anyone who argues that XHTML works better with
the 'XML toolchain' than HTML5: "Just put an HTML5 parser at the start of your
XML pipeline."

There's a slight problem though: The HTML5 parser algorithm can output a
document tree that is not an XML 1.0 4th ed. + Namespaces 1.0 infoset. This
poses a problem if a processing pipeline serializes to XML and expects a later
stage to reparse using a conforming XML 1.0 4th ed. + Namespaces 1.0 parser, or
if a component in the pipeline (e.g. the XOM library) performs early checks.

Therefore, every HTML5 parser writer who wishes to provide a full-featured
general-purpose HTML5 parser needs to come up with a coercion from an HTML5 DOM
onto an XML 1.0 4th ed. + Namespaces 1.0 infoset.

I suggest documenting a mapping. Here's a list of problems with proposed
solutions:

 * The document mode isn't part of the infoset: Optionally communicate it as
   out-of-infoset-band data. Instruct apps to use the standards mode when it
   is not communicated.

 * The form pointer isn't part of the infoset: Make communicating the form
   pointer optional. Allow communicating it as out-of-infoset-band data. When
   the form element is not an ancestor of the form control, allow a UUID id
   attribute to be generated on the form element and allow a form attribute to
   be generated on the form control.

 * Some XML APIs treat the doctype as syntactic sugar: Make representing the
   document type information item optional.

 * Attributes with the local name "xmlns" or a local name starting with
   "xmlns:" are not permitted attribute information items: Drop on the floor.

 * Namespace declarations are not attribute information items: Drop on the
   floor. (Optionally synthesize namespace information items for XLink and SVG
   or MathML on <svg> and <math> nodes, respectively, and XHTML namespace
   information items on HTML elements (including the root) that do not have an
   HTML element as the parent.)

 * Form feed is not an XML character (either literally or as a character
   reference expansion): Turn into a space.

 * The input stream contains a literal non-XML character other than form feed:
   Turn into a REPLACEMENT CHARACTER.

 * A comment contains "--": Replace with "- -".

 * A name is not an NCName: Use the original name on the tree builder stack
   for matching, but use an escaped name in the output. The escaping function
   must escape each non-NCName to a unique NCName, and the result must have at
   least one upper case ASCII character but must not match any known SVG
   camelCase name.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
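To make the character-level and name-level items in the list above concrete, here is a rough Python sketch of what such a coercion could look like. It is only an illustration under stated assumptions, not a proposed normative mapping: all function names are hypothetical, the "U" plus six hex digits escaping scheme is just one possible choice, and the NCName test is a deliberately conservative ASCII-only approximation (it escapes some characters that a real NCName would allow, which still yields a valid NCName). Uniqueness of the escaped names relies on the assumption that the tokenizer has already lowercased ASCII letters, so an unescaped name never contains "U" followed by six hex digits.

```python
import re

def is_xml_char(ch: str) -> bool:
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def coerce_text(text: str) -> str:
    """Form feed becomes a space; other non-XML characters become U+FFFD."""
    out = []
    for ch in text:
        if ch == "\f":
            out.append(" ")
        elif is_xml_char(ch):
            out.append(ch)
        else:
            out.append("\uFFFD")
    return "".join(out)

def coerce_comment(data: str) -> str:
    """Replace "--" with "- -" until no "--" remains."""
    while "--" in data:
        data = data.replace("--", "- -")
    return data

def drop_xmlns_attrs(attrs: dict) -> dict:
    """Drop attributes named "xmlns" or starting with "xmlns:"."""
    return {
        name: value
        for name, value in attrs.items()
        if name != "xmlns" and not name.startswith("xmlns:")
    }

# ASCII-only approximation of NCName start / continuation characters
# (an assumption made for brevity; real NCNames allow many more characters).
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")

def escape_name(name: str) -> str:
    """Escape each disallowed character as "U" + six uppercase hex digits."""
    out = []
    for i, ch in enumerate(name):
        allowed = _NAME_START if i == 0 else _NAME_CHAR
        if allowed.fullmatch(ch):
            out.append(ch)
        else:
            out.append("U%06X" % ord(ch))
    return "".join(out)
```

With this sketch, a tree builder would keep matching on the original name (say "a:b") and only hand the escaped form to the infoset consumer. Because every escape contains an upper case "U" followed by six hex digits, the result cannot coincide with a lowercased HTML name, and no SVG camelCase name is expected to contain that pattern.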
Received on Thursday, 26 June 2008 12:53:30 UTC