- From: Sam Ruby <rubys@intertwingly.net>
- Date: Mon, 26 Jan 2009 05:27:21 -0500
- To: Ian Hickson <ian@hixie.ch>
- CC: HTML WG <public-html@w3.org>
Ian Hickson wrote:
> On Sun, 25 Jan 2009, Sam Ruby wrote:
>> Again, it is worth repeating that Venus produces a file. Whether that
>> file is later served as text/html or as application/xhtml+xml is
>> something the person who uses Venus decides.
>
> XML and text/html have differences that go far beyond mere syntax. When
> you produce XML or text/html, you need to know which it is so that you can
> output the right markup. The way nodes are exposed in the DOM, CSS rules
> around the <body> and <tbody> elements, features like <noscript>, all
> depend whether the document is XML or text/html.
>
> It's possible to output a polyglot document that is valid both as XHTML5
> in XML and HTML5 in text/html, but it requires care and discipline. (If
> anything, this should be considered a third language and API set, stricter
> than either of the other two.) One of the rules for making polyglot
> documents is that one must output <!DOCTYPE HTML>, which is allowed in
> both. (Other rules include being careful about using the /> form, being
> careful about namespace declarations, being careful about xml:lang/lang,
> being careful with script and CSS, etc.)

EXACTLY!(*) THANK YOU!

While there are BIG PROBLEMS in theory and in general, when you limit the scope to things that (a) pass through a sanitizer and (b) are the subset of things one would reasonably expect to appear within an <article>, the problems are considerably more manageable.

I would like to stress that the use case is an application like Venus, which produces files that are to be served later. By the definition of HTML 5 (note the space), these files are neither XHTML5 nor HTML5; which one they are depends on how they are served over HTTP.
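To make the restricted case above a little more concrete, here is a minimal Python sketch of how a Venus-like tool might wrap already-sanitized article markup in a skeleton intended to be served as either text/html or application/xhtml+xml. The function name `polyglot_page` is hypothetical, and the sketch assumes its input has already passed through a sanitizer and is well-formed XML. It covers only a few of the rules Ian lists (doctype, namespace declaration, lang/xml:lang), uses the lowercase doctype argued for below, and deliberately omits <meta charset>, leaving the encoding to the Content-Type the publisher sets.

```python
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def polyglot_page(title: str, article_xhtml: str, lang: str = "en") -> str:
    """Wrap sanitized article markup in a skeleton meant to be served as
    either text/html or application/xhtml+xml.

    Assumption (hypothetical helper): `article_xhtml` is already sanitized,
    well-formed XML that uses only elements one would expect inside
    <article>, uses the /> form only on void elements such as <br/>, and
    includes an explicit <tbody> in any table.
    """
    lang_attr = escape(lang, {'"': "&quot;"})
    return (
        # Lowercase doctype: accepted by HTML parsers and by XML parsers.
        "<!DOCTYPE html>\n"
        # XML parsers need xmlns to put elements in the XHTML namespace;
        # lang serves text/html consumers, xml:lang serves XML consumers.
        f'<html xmlns="{XHTML_NS}" lang="{lang_attr}" xml:lang="{lang_attr}">\n'
        "<head>\n"
        # No <meta charset> here: it was not permitted in XHTML5, so the
        # encoding has to come from the Content-Type header.
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<article>{article_xhtml}</article>\n"
        "</body>\n"
        "</html>\n"
    )
```

Keeping the skeleton fixed and pushing all variability into the sanitized article body is what keeps the polyglot discipline manageable here: only the sanitizer's output has to obey the stricter rules.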
And I'd like to repeat the point I made earlier: the one remaining thing that would make this use case less difficult to implement is permitting <meta charset> to appear in XHTML5, making it clear that user agents are to ignore it, and making it non-conforming to specify a charset that differs from the one an XML processor would associate with the document. All too often I get bug reports about occasional 'funny characters' in the output, which result from people not setting their Content-Type correctly.

- Sam Ruby

(*) OK, not exactly. I would argue for a lowercase 'html'. Given that this is likely to be a point of confusion, I prefer the way the WHATWG FAQ explains this to the way the current editor's draft does: the FAQ's example itself uses a lowercase html. People *do* tend to copy/paste examples, often without reading the surrounding text adequately, or even at all.
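The <meta charset> proposal above (user agents ignore it in XHTML5, but a mismatch is non-conforming) implies a check that a conformance checker, or a tool like Venus, could run on its own output. Below is a minimal Python sketch under stated assumptions: the document is in an ASCII-compatible encoding, no HTTP Content-Type charset is considered (for an XML processor such a parameter would take precedence, which this sketch ignores), and regular expressions stand in for a real parser. The names `xml_encoding` and `meta_charset_consistent` are hypothetical.

```python
import codecs
import re

# Approximations only; a real checker would use proper XML/HTML parsing.
_XML_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
_META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def xml_encoding(data: bytes) -> str:
    """Approximate the encoding an XML processor would infer with no
    external (HTTP) information: byte order mark, then the encoding in
    the XML declaration, else the UTF-8 default."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _XML_DECL.match(data)  # the declaration must be at the very start
    return match.group(1).decode("ascii") if match else "utf-8"


def meta_charset_consistent(data: bytes) -> bool:
    """True if there is no <meta charset>, or if it names the same encoding
    an XML processor would use; unknown encoding names count as a mismatch."""
    meta = _META_CHARSET.search(data)
    if meta is None:
        return True  # nothing declared, nothing to contradict
    try:
        # codecs.lookup normalizes aliases such as "UTF-8" and "utf8".
        declared = codecs.lookup(meta.group(1).decode("ascii")).name
        inferred = codecs.lookup(xml_encoding(data)).name
    except LookupError:
        return False
    return declared == inferred
```

Under the proposal, a False result would be reported as non-conforming, while an XHTML5 user agent would simply ignore the element.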
Received on Monday, 26 January 2009 10:27:54 UTC