- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 6 Jan 2010 07:21:03 +0000 (UTC)
- To: Joe D Williams <joedwil@earthlink.net>
- Cc: public-html@w3.org
On Tue, 5 Jan 2010, Joe D Williams wrote:
> >
> > XML has a different syntax than text/html HTML.
>
> Are there such differences that expression by xml schema is only
> possible for HTML5 in XHTML form?

As far as I can tell (which admittedly is not especially far), XML Schema is defined in terms of the XML Infoset, not in terms of the XML syntax. Therefore, anything that can be expressed in an XML Infoset can be syntax-checked by XML Schema.

HTML5 defines how to coerce the output of an HTML parser (namely, a DOM) into an Infoset for toolchains that do not support features beyond those defined by the XML and XML Infoset specifications:

   http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#coercing-an-html-dom-into-an-infoset

It's worth noting that the XML Infoset cannot distinguish every possible difference between XML documents; for instance, one could not express the number of spaces between XML element attributes in the Infoset. These two documents therefore have the same Infoset despite having different XML serialisations:

   <test a="" b=""/>
   <test a=""     b=""/>

The same applies to text/html HTML5, as I described in my last e-mail.

There are also semantically relevant aspects of text/html that cannot be expressed in an XML Infoset, such as whether the document is in quirks mode, or what form element associations might exist that are not represented in the DOM. These are aspects that are mentioned by the coercion section cited above.

Furthermore, there are semantically relevant aspects of text/html that cannot be expressed even by the DOM data structures, such as the functionality of <noscript> in the presence of script or in the absence of script. For validation purposes, these are handled in relatively complicated ways by the spec.

As far as I can tell, there is no way to make straight XML Schema fully handle these features, as the information simply wouldn't be present in the Infoset.
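For readers who have not looked at the coercion section linked above: one of its rules covers element and attribute names that the HTML parser happily produces but that an XML API may reject. Below is a rough Python sketch of that single rule, based on the current WHATWG text (the January 2010 draft may have worded it differently): each unsupported character becomes `U` followed by the six uppercase hex digits of its code point. It assumes that "unsupported" means "not allowed in an XML 1.0 NCName"; a particular XML API may accept more or fewer characters, and the function names are hypothetical.

```python
# Sketch of one name-coercion rule. ASSUMPTION: "supported" means
# "allowed in an XML 1.0 (Fifth Edition) NCName", i.e. Name without ':'.

# NameStartChar ranges from XML 1.0, minus ':' (NCName forbids colons).
_NAME_START_RANGES = [
    (ord("A"), ord("Z")), (ord("_"), ord("_")), (ord("a"), ord("z")),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra characters allowed after the first position (NameChar).
_NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ncname_char(ch, first):
    """True if ch may appear at this position of an NCName."""
    cp = ord(ch)
    if _in_ranges(cp, _NAME_START_RANGES):
        return True
    return not first and _in_ranges(cp, _NAME_EXTRA_RANGES)


def coerce_local_name(name):
    """Replace each unsupported character with 'U' + six uppercase hex digits."""
    out = []
    for i, ch in enumerate(name):
        if is_ncname_char(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append("U%06X" % ord(ch))
    return "".join(out)


print(coerce_local_name("foo<bar"))     # fooU00003Cbar
print(coerce_local_name("xlink:href"))  # xlinkU00003Ahref
```

The second case is an attribute written as `xlink:href` on an HTML element, where the parser leaves the colon inside the local name; after coercion it is a well-formed XML name, though still not a conforming HTML one.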
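The attribute-whitespace point can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, whose data model serves only as a stand-in for the Infoset (it is not an Infoset implementation), and assumes the two example documents differ only in the spaces before `b`. Attributes are sorted because the Infoset treats them as an unordered set.

```python
import xml.etree.ElementTree as ET

# Two serialisations that differ only in the whitespace between attributes.
doc_one = '<test a="" b=""/>'
doc_two = '<test a=""     b=""/>'


def summary(xml_text):
    """Return (tag, sorted attribute pairs): what the parsed tree retains."""
    root = ET.fromstring(xml_text)
    return root.tag, sorted(root.attrib.items())


print(doc_one == doc_two)                    # False: the serialisations differ
print(summary(doc_one) == summary(doc_two))  # True: the parsed views match
```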
> Is there structure, content models, or combinations of HTML5 that cannot
> be modelled by xml schema?

Insofar as there are structures that cannot be modeled by the XML Infoset, yes.

There may also be conformance requirements that cannot be fully expressed by XML Schema itself, but I'm not familiar enough with XML Schema to say whether this is the case or not. Henri might know. (It is the case that SGML DTDs, XML DTDs, RelaxNG, and Schematron all cannot fully express all the machine-checkable conformance requirements of HTML, so I would be surprised if it wasn't also the case for XML Schema.)

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 6 January 2010 07:21:32 UTC