- From: David Sheets <kosmo.zb@gmail.com>
- Date: Wed, 23 Jan 2013 15:13:23 -0800
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Cc: Henri Sivonen <hsivonen@iki.fi>, Daniel Glazman <daniel@glazman.org>, Sam Ruby <rubys@intertwingly.net>, Noah Mendelsohn <nrm@arcanedomain.com>, "www-tag@w3.org List" <www-tag@w3.org>, "public-html@w3.org" <public-html@w3.org>
On Wed, Jan 23, 2013 at 1:11 AM, Leif Halvard Silli
<xn--mlform-iua@xn--mlform-iua.no> wrote:
> David Sheets, Tue, 22 Jan 2013 21:18:00 -0800:
>
>> What is the reason that
>> <http://dev.w3.org/html5/html-xhtml-author-guide/#content-type> says
>>
>> <blockquote>
>> The HTTP Content-Type: header has no extra rules or restrictions,
>> whereas polyglot markup does not use the http-equiv="Content-Type"
>> declaration on the meta element.
>> </blockquote>
>
> The Polyglot Markup spec limits itself to define a subset of the HTML5
> spec, which permits meta@charset=UTF-8 in both XHTML code and HTML
> code, whereas the HTML5 spec only permits meta@http-equiv in HTML code.

Are you referring to
<http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-http-equiv-content-type>?

See below for the operational details that make these prescriptive
statements pointless.

>> This suggests to me that putting something like
>>
>> <meta http-equiv="Content-Type" content="application/xhtml+xml" />
>
> A case could be made for allowing 'text/html;charset=UTF-8' in XHTML5
> since meta@charset has somewhat limited support outside the GUI browser
> world. For instance, Microsoft Word and Open Office don't support
> <meta charset="UTF-8"/>. Which, I have to admit, feels like a pain in
> polyglot’s robustness principle ass. ;-) But then again: If you
> export/download a Google Docs document (from Google Drive) as HTML, you
> will find that it contains no encoding declaration (and no DOCTYPE for
> that matter) - all the non-ASCII is converted to numerical character
> entities.
>
>> is a potential way to indicate to text/html consumers that this
>> representation is also parseable by an XML parser and interpretable by
>> an XHTML renderer.
>>
>> Is this ill-advised for some reason? Is there a pitfall here of which
>> I am ignorant?
>>
>> It would be nice to embed useful metadata indicating that the present
>> representation is intended to have identical semantics under different
>> media types' interpretations. This would give multi-modal consumers a
>> means to leverage both HTML and XML processing on the document if so
>> instructed.
>
> If you meant that one could include two meta based encoding declaration
> elements in the same document, then HTML5 forbids that as well.
> http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#charset

This would not be an encoding or charset declaration. This would be a
piece of embedded metadata stating that the author's intent is that
the containing representation can be interpreted identically under
text/html and application/xhtml+xml.

The HTML5 spec says
<http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-http-equiv-content-type>:

<blockquote>
The Encoding declaration state is just an alternative form of setting
the charset attribute: it is a character encoding declaration. This
state's user agent requirements are all handled by the parsing section
of the specification.
</blockquote>

which, I believe, refers to
<http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#algorithm-for-extracting-a-character-encoding-from-a-meta-element>:

<blockquote>
Loop: Find the first seven characters in s after position that are an
ASCII case-insensitive match for the word "charset". If no such match
is found, return nothing and abort these steps.
</blockquote>

which indicates to me that

<meta http-equiv="Content-Type" content="application/xhtml+xml" />

would be put in the DOM under both HTML5 and XHTML5, would not
interfere with charset detection, and would be benign.

Non-HTTP HTML consumers can interpret the representation as text/html
and non-HTTP XHTML consumers can interpret the representation as
XHTML.
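
To make the charset point concrete, here is a rough Python sketch of the quoted "Loop" step plus the value extraction that follows it. It is a simplification, not the spec's full algorithm: the function name extract_charset_label is made up for illustration, and it returns the raw label rather than resolving it to an encoding.

```python
import re
from typing import Optional

# Whitespace characters skipped around "=" (assumed: tab, LF, FF, CR, space).
WHITESPACE = " \t\n\x0c\r"

# re.ASCII keeps the case-insensitive match limited to ASCII letters.
CHARSET = re.compile("charset", re.IGNORECASE | re.ASCII)


def extract_charset_label(content: str) -> Optional[str]:
    """Return the charset label found in a meta content value, or None."""
    position = 0
    while True:
        # Loop: find the next case-insensitive "charset"; give up if absent.
        match = CHARSET.search(content, position)
        if match is None:
            return None
        position = match.end()
        # Skip whitespace, then require "="; otherwise keep searching.
        while position < len(content) and content[position] in WHITESPACE:
            position += 1
        if position >= len(content) or content[position] != "=":
            continue
        position += 1
        while position < len(content) and content[position] in WHITESPACE:
            position += 1
        if position >= len(content):
            return None
        quote = content[position]
        if quote in "\"'":
            # Quoted value: needs a matching closing quote.
            end = content.find(quote, position + 1)
            return None if end == -1 else content[position + 1:end]
        # Unquoted value: runs up to whitespace, ";" or end of string.
        end = position
        while end < len(content) and content[end] not in WHITESPACE + ";":
            end += 1
        return content[position:end]


print(extract_charset_label("application/xhtml+xml"))
print(extract_charset_label("text/html; charset=UTF-8"))
```

Under this sketch, a content value of "application/xhtml+xml" contains no "charset" match at all, so nothing is returned and encoding detection is left untouched.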

When this representation is served, the server may extract this
embedded metadata to decide how to serve the document.

Do you know of any specific subsystems that fail if this is done? Do
the HTML and XML DOMs diverge?

Despite what the "normative" prose in HTML5 says, the algorithms
contained in the spec don't appear to care about meta/@http-equiv
which does not specify a 'charset' media type parameter. This tag
seems to be the most appropriate for expressing the polyglot-ness of
an (X)HTML document.

Maybe there is another way to declare this authorial intent, however.

<!DOCTYPE html> implies text/html conformance

<meta http-equiv="Content-Type" content="application/xhtml+xml" />
implies application/xhtml+xml conformance

Thoughts?

David

Received on Wednesday, 23 January 2013 23:53:44 UTC