- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 26 Jun 2008 13:12:51 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5809

Summary: Mitigate data loss when conforming documents are coerced to XML 1.0
Product: HTML WG
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Spec proposals
AssignedTo: dave.null@w3.org
ReportedBy: hsivonen@iki.fi
QAContact: public-html-bugzilla@w3.org
CC: ian@hixie.ch, mike@w3.org, public-html@w3.org

Over in bug 5808 I suggested a way to coerce the output of the HTML5 parsing algorithm into XML. It's theoretically impure for conforming documents to trigger coercions that aren't mostly harmless. I therefore suggest narrowing the conformance definition accordingly.

* The document mode isn't part of the infoset: Optionally communicate it as out-of-infoset-band data. Instruct applications to use standards mode when it is not communicated. Mostly harmless.

* The form pointer isn't part of the infoset: Make communicating the form pointer optional. Allow communicating it as out-of-infoset-band data. When the form element is not an ancestor of the form control, allow a UUID id attribute to be generated on the form element and a form attribute to be generated on the form control. Mostly harmless.

* Some XML APIs treat the doctype as syntactic sugar: Make representing the document type declaration information item optional. Mostly harmless.

* Attributes with the local name "xmlns" or a local name starting with "xmlns:" are not permitted attribute information items: Drop them on the floor. Mostly harmless. However, in the case of <embed>, this theoretically loses conforming data. These attributes could be excluded from what is permitted on <embed> as plug-in parameters.

* Namespace declarations are not attribute information items: Drop them on the floor. (Optionally synthesize namespace information items for XLink and SVG or MathML on <svg> and <math> nodes, respectively, and XHTML namespace information items on HTML elements (including the root) that do not have an HTML element as their parent.) Mostly harmless.

* Form feed is not an XML character (either literally or as the expansion of a character reference): Turn it into a space. Mostly harmless.

* The input stream contains a literal non-XML character other than form feed: Turn it into U+FFFD REPLACEMENT CHARACTER. Mostly harmless, but these might as well be defined as non-conforming.

* A comment contains "--": Replace it with "- -". Mostly harmless.

* A name is not an NCName: Use the original name on the tree builder stack for matching, but use an escaped name in the output. The escaping function must map each non-NCName to a unique NCName, and the result must contain at least one uppercase ASCII character but must not match any known SVG camelCase name. This is data loss in theory even if not in probable practice. Attributes that are actually used on <embed> are NCNames anyway, so forbidding non-NCNames wouldn't break anything. Forbidding data-* attribute names that aren't NCNames would still leave a countably infinite space of names, and authors are likely to use printable ASCII anyway.
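The form-pointer rule in the list above could look roughly like this. This is only a sketch: xml.etree.ElementTree stands in for whatever tree the parser really builds, the function name materialize_form_pointer is hypothetical, and reusing an id the form already has (instead of always generating a fresh UUID) is an assumption the bug text doesn't state.

```python
import uuid
import xml.etree.ElementTree as ET


def materialize_form_pointer(root: ET.Element, form: ET.Element, control: ET.Element) -> None:
    """Hypothetical helper: turn a parser form pointer into id/form attributes.

    If `form` is an ancestor of `control`, the association is already implied
    by the tree, so nothing is written. Otherwise the form gets an id (a fresh
    UUID unless it already has one, which is an assumption) and the control
    gets a form attribute pointing at that id.
    """
    # ElementTree has no parent links, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    node = parent_of.get(control)
    while node is not None:
        if node is form:
            return  # form is an ancestor; no attributes needed
        node = parent_of.get(node)

    form_id = form.get("id")
    if form_id is None:
        form_id = str(uuid.uuid4())
        form.set("id", form_id)
    control.set("form", form_id)
```

The parent map is built once per call here for simplicity; a real serializer would more likely track ancestors while walking the tree.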
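Dropping attributes named "xmlns" or starting with "xmlns:" is a plain filter. The sketch assumes attributes arrive as a dict keyed by local name, which is a hypothetical representation; drop_xmlns_attributes is not an existing API.

```python
from __future__ import annotations


def drop_xmlns_attributes(attributes: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: remove attributes that XML 1.0 + Namespaces
    would read as namespace declarations ("xmlns" or "xmlns:*")."""
    return {
        name: value
        for name, value in attributes.items()
        if name != "xmlns" and not name.startswith("xmlns:")
    }
```

Note that a name such as "xmlnsfoo" is kept, since only the exact name "xmlns" and the "xmlns:" prefix are affected.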
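The two character rules (form feed becomes a space, any other non-XML character becomes U+FFFD) amount to a per-character check against the XML 1.0 Char production. A minimal sketch, assuming the text is already a decoded Python str; coerce_text_to_xml10 is a hypothetical name:

```python
FORM_FEED = "\u000c"
REPLACEMENT_CHARACTER = "\ufffd"


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def coerce_text_to_xml10(text: str) -> str:
    """Map form feed to a space and any other non-XML character to U+FFFD."""
    out = []
    for ch in text:
        if ch == FORM_FEED:
            out.append(" ")
        elif is_xml10_char(ch):
            out.append(ch)
        else:
            # Covers C0 controls, lone surrogates, U+FFFE and U+FFFF.
            out.append(REPLACEMENT_CHARACTER)
    return "".join(out)
```

The same function would apply to text nodes and attribute values alike, since character reference expansion has already happened by the time the tree is built.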
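For the comment rule, a single left-to-right replacement of "--" with "- -" isn't quite enough: "---" becomes "- --", which still contains "--". The sketch below repeats the replacement until no "--" remains. It also appends a space when the comment data ends in "-", because XML 1.0 doesn't allow a "-" immediately before the closing "-->"; that extra step goes beyond what the bug states and is an assumption. The function name is hypothetical.

```python
def coerce_comment_to_xml10(data: str) -> str:
    """Hypothetical helper: make comment data legal inside <!-- ... -->."""
    # Rule from the bug: replace "--" with "- -". Repeat, because one
    # non-overlapping str.replace pass turns "---" into "- --".
    while "--" in data:
        data = data.replace("--", "- -")
    # Assumption beyond the bug text: data ending in "-" would serialize
    # as "--->", which XML 1.0 forbids, so pad it with a space.
    if data.endswith("-"):
        data += " "
    return data
```

Each pass strictly reduces the number of adjacent dash pairs, so the loop terminates.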
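One possible escaping function for the non-NCName rule, offered only as an illustration: escape_name and its "U" plus six hex digits scheme are hypothetical, and the name character ranges are the simpler XML 1.0 Fifth Edition productions, chosen here for brevity. Each disallowed character, and every literal "U", is replaced by "U" followed by its code point as six uppercase hex digits, so the mapping is reversible and distinct inputs give distinct outputs. Since the HTML tokenizer lowercases ASCII letters in tag and attribute names, parser output contains no uppercase ASCII apart from the camelCase names restored for foreign content (viewBox, foreignObject, and so on), none of which contain "U" followed by six hex digits; an escaped name therefore shouldn't collide with any unescaped one. Escaping would happen only at output time; the tree builder stack keeps the original name.

```python
def _is_name_start_char(ch: str) -> bool:
    # NameStartChar from XML 1.0 (Fifth Edition), minus ':' because
    # NCNames never contain a colon.
    cp = ord(ch)
    return (
        ch == "_" or "A" <= ch <= "Z" or "a" <= ch <= "z"
        or 0xC0 <= cp <= 0xD6 or 0xD8 <= cp <= 0xF6 or 0xF8 <= cp <= 0x2FF
        or 0x370 <= cp <= 0x37D or 0x37F <= cp <= 0x1FFF
        or 0x200C <= cp <= 0x200D or 0x2070 <= cp <= 0x218F
        or 0x2C00 <= cp <= 0x2FEF or 0x3001 <= cp <= 0xD7FF
        or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0xEFFFF
    )


def _is_name_char(ch: str) -> bool:
    # NameChar: NameStartChar plus digits, '-', '.', and a few extra ranges.
    cp = ord(ch)
    return (
        _is_name_start_char(ch) or ch in "-." or "0" <= ch <= "9"
        or cp == 0xB7 or 0x300 <= cp <= 0x36F or 0x203F <= cp <= 0x2040
    )


def is_ncname(name: str) -> bool:
    return (
        bool(name)
        and _is_name_start_char(name[0])
        and all(_is_name_char(c) for c in name[1:])
    )


def escape_name(name: str) -> str:
    """Hypothetical escaping: NCNames pass through; otherwise every
    disallowed character and every literal 'U' becomes U + 6 hex digits."""
    if not name:
        raise ValueError("the parser never produces empty names")
    if is_ncname(name):
        return name
    out = []
    for i, ch in enumerate(name):
        allowed = _is_name_start_char(ch) if i == 0 else _is_name_char(ch)
        if allowed and ch != "U":
            out.append(ch)
        else:
            out.append(f"U{ord(ch):06X}")
    return "".join(out)
```

Escaping the literal "U" as well is what makes decoding unambiguous: a reader scanning left to right knows every "U" starts a six-digit escape.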
Received on Thursday, 26 June 2008 13:13:25 UTC