RE: Notes on the draft polyglot document Polyglot document

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Sun, 20 Jun 2010 08:30:04 +0200
To: Eliot Graff <eliotgra@microsoft.com>
Cc: Tim Berners-Lee <timbl@w3.org>, HTML WG <public-html@w3.org>, TAG List <www-tag@w3.org>
Message-ID: <20100620083004171554.4f664490@xn--mlform-iua.no>
Eliot Graff, Thu, 17 Jun 2010 18:39:56 +0000:

>> 2. Suggested title:  XML/HTML "Polyglot" Documents
> 
> Awaiting verification from the WG. 
> It looks like consensus points to "Polyglot Markup: HTML-compatible 
> XHTML Documents." Will make that change once I verify consensus.

As the draft now stands, the title says "Polyglot Markup". But the next 
time the word "polyglot" occurs (in the Abstract), and throughout the 
rest of the document, the text says "polyglot document" over and over. 
A nice title which isn't reflected in the text is suboptimal. 

At the bottom of this letter, I present a concrete amendment of TBL's 
abstract in which I try to show what I mean by incorporating the 
title's terminology. (Also, so you don't think that I am trying to 
sneak it in: I also make use of the term 'HTML polyglot', because I 
think it is a useful term, even if it may be too complicated for the 
title.)

>> 3. The abstract should be an abstract of the document not information about
>> it.
>> Suggested abstract:
>> 
>> Abstract:
>> 
>> A polyglot document is an HTML5 document which is at the same time an
>> XML document and an HTML document, and meets a well defined set of
>> constraints. Polyglot documents meeting these constraints are interpreted
>> compatibly regardless of whether they are processed as HTML or as XHTML,
>> per the HTML5 specification. Polyglot documents use a specific doctype,
>> namespace declarations, and a specific case, normally lower case but
>> occasionally camel case, for element and attribute names. They use lower
>> case for certain attribute values. Further constraints include those on
>> empty elements, names entity references, and to the use of scripts and
>> style.

> 3 & 4 done.

The abstract says that "Polyglot documents use a specific doctype". But 
what is meant by this? Is it meant that polyglot documents have a 
single DOCTYPE and that the doctype is exactly <!DOCTYPE html>? Not even 
HTML5 itself is that strict: HTML5 allows the 'about:legacy-compat' 
doctype as well. HTML5 also mentions some obsolete doctypes that are 
obsolete only because "they are unnecessarily long". Two of those 
doctypes are XHTML 1.0 Strict and XHTML 1.1. Any XHTML-compatible 
DOCTYPE which works in HTML5 should in principle also work in a 
polyglot spec, shouldn't it? (See bug 9958 [1] for more on this.) 

My view: The polyglot spec needs to describe DOCTYPEs more generally, 
as HTML5 itself does. Filed as bug 9958.[1]
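
As a rough illustration of what "describing DOCTYPEs more generally" 
could look like, here is a minimal Python sketch. The names 
CANDIDATE_DOCTYPES, normalize and classify_doctype are hypothetical, and 
the list is an assumption: it holds doctype strings that the HTML5 draft 
permits and that are also well-formed XML doctype declarations. It is 
not a list taken from the polyglot draft.

```python
import re

# Assumed candidates: permitted by HTML5 and well-formed in XML.
# Illustrative only; the polyglot draft does not define this list.
CANDIDATE_DOCTYPES = {
    "html5": '<!DOCTYPE html>',
    "legacy-compat": '<!DOCTYPE html SYSTEM "about:legacy-compat">',
    "xhtml10-strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
    "xhtml11": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    ),
}


def normalize(doctype: str) -> str:
    """Collapse runs of whitespace so line breaks in a doctype do not matter."""
    return re.sub(r"\s+", " ", doctype.strip())


def classify_doctype(doctype: str):
    """Return the label of the matching candidate doctype, or None.

    The comparison is deliberately case-sensitive: XML requires the
    upper-case keyword DOCTYPE, and the name must match the lower-case
    root element 'html', even though the HTML syntax is more lenient.
    """
    wanted = normalize(doctype)
    for label, candidate in CANDIDATE_DOCTYPES.items():
        if wanted == candidate:
            return label
    return None
```

With this sketch, classify_doctype('<!DOCTYPE html>') gives the label 
'html5', while a lower-case '<!doctype html>', which the HTML syntax 
accepts but XML does not, gives None.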

Finally: a proposed amendment of the Abstract, as an example of how to 
incorporate the terminology - also filed as bug 9959: [2]

]]
This specification defines Polyglot Markup, an HTML-compatible XHTML 
document format. Documents of this kind are also known as HTML 
polyglots. An HTML polyglot is an HTML5 document which is at the same 
time an XML document and an HTML document. HTML polyglots that meet 
these constraints are, per the HTML5 specification, interpreted as 
compatible, regardless of whether they are processed as HTML or as 
XHTML. An HTML polyglot is obligated to have an XML- and 
HTML5-compatible doctype, to use namespace declarations, and to use a 
specific case—normally lower case but occasionally camel case—for 
element and attribute names. HTML polyglots use lower case for certain 
attribute values. Further constraints include those on empty elements, 
named entity references, and the use of scripts and style.
[[

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9958
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9959
-- 
leif halvard silli
Received on Sunday, 20 June 2010 06:30:46 UTC
