Re: Notes on the draft polyglot document Polyglot document from Leif Halvard Silli on 2010-06-11 (public-html@w3.org from June 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 11 Jun 2010 19:46:07 +0200
To: "Simpson, Grant Leyton" <glsimpso@indiana.edu>
Cc: Julian Reschke <julian.reschke@gmx.de>, Daniel Glazman <daniel.glazman@disruptive-innovations.com>, HTML WG <public-html@w3.org>
Message-ID: <20100611194607008691.41527edd@xn--mlform-iua.no>

Simpson, Grant Leyton, Fri, 11 Jun 2010 09:47:28 -0400:
> This proposed title seems much better to me. "Polyglot markup" is 
> much clearer. As someone who had not participated in the discussions 
> surrounding the document, the original title was quite confusing to 
> me.

What about the last part of the title? See below.

> On Jun 11, 2010, at 9:43 AM, Julian Reschke wrote:
> 
>> Sounds good to me.
>> 
>> How about:
>> 
>>   "Polyglot markup: XML Compatible HTML Documents"
>> 
>> ...because I think that reflects better why people are interested in 
>> this (publishing HTML that can be processed as XML).

Regarding

	'XML Compatible HTML Documents' (your proposal)
vs	'HTML Compatible XML Documents' (original proposal)

The choice of wording in the original proposal is deliberate. 
Consider the following valid HTML5 document, which could also be 
parsed as a well-formed XML document and thus arguably qualifies as an 
"XML Compatible HTML Document":

	<!DOCTYPE html>
	<html xmlns="http://www.w3.org/1999/xhtml" >
		<title></title>
	</html>

However, the goal of the polyglot spec at hand is more than validity 
within each syntax paradigm. Our polyglot spec seeks to define a 
mark-up that parses to a DOM that is as equivalent as possible 
regardless of XML vs HTML parsing. For example, the reason why polyglot 
mark-up *requires* the body and head elements to be present in the 
*code* is that leaving them out causes them to be auto-generated on 
the HTML side, whereas nothing happens on the XML side. Since we are 
the HTML working group, our task is XHTML and HTML. However, it is 
important to remember that the *degree* of DOM equivalence depends on 
the goal of the polyglot spec.
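
To illustrate the XML side of this point, here is a minimal sketch in 
Python, using only the standard library's xml.etree.ElementTree. The 
names MINIMAL, EXPLICIT and xml_children are made up for illustration 
and are not taken from any spec. The sketch only shows what an XML 
parser builds. Python's standard library has no HTML5 tree builder, so 
the HTML side, where head and body get auto-generated, is not executed 
here:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The minimal document from this message: head and body are left out.
MINIMAL = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" >
	<title></title>
</html>"""

# Hypothetical variant with head and body written out in the code,
# as polyglot mark-up requires.
EXPLICIT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
	<head><title></title></head>
	<body></body>
</html>"""


def xml_children(source):
    """Return the local names of the root's child elements, as an XML parser sees them."""
    root = ET.fromstring(source)
    prefix = "{%s}" % XHTML_NS
    # ElementTree reports namespaced tags as "{namespace}localname".
    return [child.tag.replace(prefix, "") for child in root]


print(xml_children(MINIMAL))   # ['title']
print(xml_children(EXPLICIT))  # ['head', 'body']
```

In the first case the XML tree has title as a direct child of html, 
with no head or body anywhere, which is exactly the difference from the 
HTML-parsed DOM described above. In the second case nothing is left for 
the HTML parser to generate, so both sides can end up with the same 
element structure.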

The latter (my) variant incorporates the perspective that (a) the XML 
rules are impossible to change, and that (b) it is *HTML* that sets 
some extra restrictions on what kind of XML it is permitted to produce. 
From text/html's point of view, the requirements of the XML syntax are 
only "croutons" with zero or ignorable DOM effects (<meta/>, <img/>, 
xml:lang="*", xmlns* etc.) - the degree to which they are ignorable 
within HTML is defined by the rules for 'text/html'.

We are the HTML Working Group. Our task is to decide what is compatible 
with 'text/html'. That is also what Appendix C sought to define. I 
therefore continue to believe that "HTML Compatible XML Documents" is 
more accurate and that it conveys a more useful message than the 
alternative.

By the way, what do you think about saying "XHTML" instead of "XML"?

	'Polyglot markup: HTML Compatible XHTML Documents'

-- 
leif halvard silli

Received on Friday, 11 June 2010 17:46:46 UTC