Re: Notes on the draft polyglot document

Tim Berners-Lee, Sun, 20 Jun 2010 09:47:13 -0400:
>> The abstract says that "Polyglot documents use a specific doctype". But 
>> what is meant by this? Is it meant that polyglot documents have a 
>> single DOCTYPE and that the doctype is exactly <!DOCTYPE html>? Not even 
>> HTML5 itself is that strict: HTML5 allows the 'about:legacy-compat' 
>> doctype as well. And also, HTML5 mentions some obsolete doctypes that 
>> are obsolete only because "they are unnecessarily long". Two of those 
>> doctypes are XHTML 1.0 Strict and XHTML 1.1. Any XHTML-compatible 
>> DOCTYPE which works in HTML5 should in principle also work in a 
>> polyglot spec, shouldn't it? (See bug 9958 for more on this.) 
> The goal of an abstract is to summarize the document so that someone
> reading it gets the gist of the document. What it says should be true,
> of course, but the reader can be expected to delve into the document 
> for details.
> It should help the reader decide whether she will read the document itself.
> One should resolve the technical questions in the spec, and then make 
> sure the abstract is correct
> and a good summary, with an even balance of detail.
> So maybe "Polyglot documents use a specific doctype" will become
> "Polyglot documents use specific doctypes".

"Specific doctypes" sounds perfect. But I hope the phrase 
'polyglot documents' is not used in the Abstract, at least not without 
being explained first.
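To make "specific doctypes" concrete, here is a minimal Python sketch of the set this thread argues for: the HTML5 doctype, the legacy-compat doctype, and the two XHTML doctypes that HTML5 lists as obsolete but permitted. The names `POLYGLOT_CANDIDATE_DOCTYPES` and `is_candidate_polyglot_doctype` are hypothetical, and the list reflects the proposal in bug 9958 rather than anything the polyglot draft currently mandates. Matching is deliberately exact and case-sensitive, in line with the XML-compatibility point; only runs of whitespace are collapsed.

```python
# Hypothetical set of doctypes a polyglot spec could permit, per this thread:
# the HTML5 doctype, the legacy-compat doctype, and the two XHTML doctypes
# that HTML5 lists among its obsolete permitted doctypes.
POLYGLOT_CANDIDATE_DOCTYPES = {
    "html5": '<!DOCTYPE html>',
    "legacy-compat": '<!DOCTYPE html SYSTEM "about:legacy-compat">',
    "xhtml1-strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
    "xhtml11": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    ),
}


def is_candidate_polyglot_doctype(doctype: str) -> bool:
    """Return True if `doctype` exactly matches one of the candidates.

    Runs of whitespace (including newlines) are collapsed to single spaces;
    everything else, including the case of keywords and FPIs, must match.
    """
    normalized = " ".join(doctype.split())
    return normalized in POLYGLOT_CANDIDATE_DOCTYPES.values()


if __name__ == "__main__":
    # An XHTML 1.0 Strict doctype split over two lines still matches.
    print(is_candidate_polyglot_doctype(
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ))
```

A whitelist of exact strings is the simplest way to express "specific doctypes"; a looser rule (for example "any doctype HTML5 accepts, written in XML-compatible case") would need a small parser instead.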
> (A good abstract should not be just introduction or just conclusion either!)

Good point. I feel that I amended the abstract you suggested only 
slightly, but that doesn't mean it wasn't too much. :-)

>> My view: The polyglot spec needs to describe DOCTYPEs more generally, 
>> as HTML5 itself does. Filed as bug 9958.[1]
>> Finally: Proposed amendment of the Abstract, as an example of how to 
>> incorporate the terminology (also filed as bug 9959 [2]):
>> ]]
>> This specification defines Polyglot Markup, an HTML-compatible XHTML 
>> document format. Documents of this kind are also known as HTML 
>> polyglots. An HTML polyglot is an HTML5 document which is at the same 
>> time an XML document and an HTML document. HTML polyglots that meet 
>> these constraints are, per the HTML5 specification, interpreted as 
>> compatible, regardless of whether they are processed as HTML or as 
>> XHTML. An HTML polyglot is obligated to have an XML- and 
>> HTML5-compatible doctype, to use namespace declarations, and to use a 
>> specific case (normally lower case but occasionally camel case) for 
>> element and attribute names. HTML polyglots use lower case for certain 
>> attribute values. Further constraints include those on empty elements, 
>> named entity references, and the use of scripts and style.
>> [[
> As a matter of style, to reinforce the abstract as an abstract of the 
> document, not a description of the document, it should not start
> with "This specification defines" but with "Polyglot markup is ...".

I picked "This specification defines" from the Abstract of HTML5. But 
"Polyglot Markup is ..." sounds like a good beginning. 

> (What is an XML-compatible doctype? All doctypes are XML-compatible,
> as DOCTYPE is a feature of XML, no?)

The polyglot spec in many places talks about XML vs HTML instead of XHTML 
vs HTML. Both perspectives are useful. Instead of 'XML-compatible 
DOCTYPE' one could be even more specific and say 'XHTML-compatible 
DOCTYPE'.

What I meant by 'XML-compatible DOCTYPE' is explained in bug 9958 [1]. 
In an XML-compatible doctype, the keywords "DOCTYPE", "SYSTEM" and 
"PUBLIC", as well as any FPI, are matched case-sensitively. HTML5 
considers e.g. <!doctype html> valid, but that is not an 
XML/XHTML-compatible DOCTYPE. (HTML4 also defines 'DOCTYPE', 'SYSTEM' 
and 'PUBLIC' in uppercase, but validating an HTML4 document does not 
produce an error if you use another case. Nor does it produce an error 
if you change the casing of the FPI in an HTML4 document, whereas 
changing the casing of the FPI in an XHTML document does.)
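The keyword part of this can be seen with Python's standard library: `html.parser` recognises the DOCTYPE keyword case-insensitively, while the expat-based `xml.etree.ElementTree` rejects anything but the exact uppercase keywords. This is a minimal sketch; the helper names are hypothetical, and it checks XML well-formedness only. The FPI-casing complaint mentioned above comes from validation against a DTD, which this sketch does not perform.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class DoctypeCollector(HTMLParser):
    """Records the declaration text that Python's HTML parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.decls: list[str] = []

    def handle_decl(self, decl: str) -> None:
        # Called with e.g. "doctype html" or "DOCTYPE html".
        self.decls.append(decl)


def html_accepts_doctype(doctype: str) -> bool:
    """True if html.parser treats `doctype` as a DOCTYPE declaration.

    html.parser compares the keyword case-insensitively, as HTML does.
    """
    collector = DoctypeCollector()
    collector.feed(doctype + "<html></html>")
    collector.close()
    return any(d.lower().startswith("doctype") for d in collector.decls)


def xml_accepts_doctype(doctype: str) -> bool:
    """True if an XML parser accepts `doctype` in a minimal document.

    XML keywords are case-sensitive, so only "DOCTYPE", "SYSTEM" and
    "PUBLIC" in uppercase are well-formed.
    """
    try:
        ET.fromstring(doctype + "<html></html>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for dt in ("<!DOCTYPE html>", "<!doctype html>"):
        print(dt, html_accepts_doctype(dt), xml_accepts_doctype(dt))
```

Run as a script, this prints `True True` for `<!DOCTYPE html>` and `True False` for `<!doctype html>`: both spellings work when the document is processed as HTML, but only the uppercase one survives processing as XML, which is exactly the distinction a polyglot doctype has to respect.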

leif halvard silli

Received on Sunday, 20 June 2010 15:35:38 UTC