<!DOCTYPE html> vs (polyglot) spec

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 26 Apr 2010 18:09:56 +0200
To: HTMLwg <public-html@w3.org>
Cc: Eliot Graff <eliotgra@microsoft.com>
Message-ID: <20100426180956124976.7c9afa8a@xn--mlform-iua.no>

Chairs, editors, members,

I suggest the following HTML5 spec changes:
* make all HTML5 examples use UPPERCASE syntax for <!DOCTYPE HTML> [2]
* say that editors (i.e. editing tools) MAY (or SHOULD) use an
  XHTML-compatible DOCTYPE, in combination with a valid namespace
  string, as an XHTML syntax trigger

A) Should I file bugs for this?
B) Can we have a polyglot product to file bugs against?

Background info:

The HTML5 doctype is defined as an "ASCII case-insensitive match" for 
the uppercase string <!DOCTYPE HTML>.  [1]  However, the HTML5 spec 
appears silent on the fact that for XHTML5, which is case-sensitive 
because it is XML, the 'DOCTYPE' string must be uppercase, while the 
'html' part must be lowercase. This has implications for polyglot 
editing tools.
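To make the case-sensitivity difference concrete, here is a minimal Python sketch contrasting the two rules. The regexes and function names are hypothetical, and the text/html side is simplified: it ignores the longer doctype variants the spec also permits.

```python
import re

# text/html rule (simplified): "<!DOCTYPE", whitespace, "HTML", optional
# whitespace, ">", compared ASCII case-insensitively. re.ASCII stops
# re.IGNORECASE from applying Unicode case folding.
HTML5_DOCTYPE = re.compile(
    r"<!DOCTYPE[ \t\n\f\r]+HTML[ \t\n\f\r]*>", re.IGNORECASE | re.ASCII
)

# Polyglot rule (per the text above): case-sensitive, uppercase "DOCTYPE"
# keyword and lowercase "html". Whitespace is limited to XML's S production
# (space, tab, CR, LF; no form feed).
POLYGLOT_DOCTYPE = re.compile(r"<!DOCTYPE[ \t\n\r]+html[ \t\n\r]*>")


def is_html5_doctype(s: str) -> bool:
    """True if s is an acceptable HTML5 doctype when parsed as text/html."""
    return HTML5_DOCTYPE.fullmatch(s) is not None


def is_polyglot_doctype(s: str) -> bool:
    """True if s is also acceptable as an XHTML5 (polyglot) doctype."""
    return POLYGLOT_DOCTYPE.fullmatch(s) is not None


if __name__ == "__main__":
    for d in ("<!DOCTYPE HTML>", "<!doctype html>", "<!DOCTYPE html>"):
        print(d, is_html5_doctype(d), is_polyglot_doctype(d))
```

Run as a script, this prints `True False` for the first two spellings and `True True` only for `<!DOCTYPE html>`, which is exactly the spelling a polyglot-aware tool has to produce.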

Now, consider how KompoZer behaves:

Leif Halvard Silli, Sat, 24 Apr 2010 21:09:58 +0200:
> For a 'file.html' with an XHTML1 doctype, KompoZer does 
> NOT "normalize" "/>" to ">". But if the doctype is 
> <!DOCTYPE html>, then it *does* do that.

* Given an XHTML 1.0 DOCTYPE, KompoZer creates, preserves and 
  auto-corrects the code as XHTML 1.0 Appendix C syntax. (It 
  auto-transforms <img> as well as <img></img> into <img/>.) 
* Given an HTML 4.01 doctype, KompoZer creates, preserves and 
  auto-corrects the code as HTML 4.01 syntax.
* Given an HTML5 doctype, however, it currently creates, preserves and 
  auto-corrects polyglot XHTML into HTML 4.01-looking syntax (though it 
  doesn't remove xmlns strings or xmlns prefixes; it behaves the same 
  way with the HTML 4.01 doctypes).

Hence the question: For (X)HTML5, how can KompoZer know whether to 
operate with (polyglot) XHTML5 syntax or "pure" HTML5 syntax? [3] Note 
that KompoZer is currently only able to edit XHTML documents in 
'text/html' mode. [4]

Possible options for KompoZer in 'text/html' mode:

1. Use a case-sensitive match of <!DOCTYPE html> 
            as a polyglot XHTML syntax trigger
2. Use a case-sensitive match of <!DOCTYPE html>
            in combination with the XHTML namespace string
            (xmlns="http://www.w3.org/1999/xhtml")
            as a polyglot syntax trigger
3. ALWAYS create polyglot XHTML5 syntax. 
   That is: -never create HTML4 compatible HTML5 syntax;
            -always convert <!DOCTYPE HTML> to <!DOCTYPE html>;
            -always convert <img> to <img/> etc;
            -always auto-insert namespace string, if lacking;
4. ALWAYS create HTML 4.01-looking HTML5 syntax.
   That is: -always change <img/> to <img>;
            -always convert <!DOCTYPE html> to <!DOCTYPE HTML>;
            -always remove the namespace string;
5. Don't care about XHTML5 syntax at all in text/html mode.
   That is: -don't bother about what the HTML5 doctype looks like;
            -never bother to remove or add the namespace string;
            -until application/xhtml+xml mode editing is ready, 
             tell authors who want to create polyglot XHTML syntax
             to use KompoZer to create XHTML 1.0 based documents.
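The trigger in option 2 and the normalizations in options 3 and 4 can be sketched as plain text transformations. What follows is a crude, hypothetical Python illustration, not KompoZer's implementation: the function names and the shortened void-element list are invented for the example, and regular expressions are no substitute for a real parser (the sketch assumes lowercase tags and no ">" inside attribute values). Option 1 corresponds to the doctype half of `wants_polyglot`; option 5 needs no code.

```python
import re

# The XHTML namespace URI (the "namespace string" discussed above).
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical subset of HTML void elements; a real tool needs the full list.
VOID = "area|br|col|hr|img|input|link|meta"

# Any ASCII-case spelling of the HTML5 doctype (text/html rules, simplified).
ANY_DOCTYPE = re.compile(r"<!DOCTYPE[ \t\n\f\r]+HTML[ \t\n\f\r]*>", re.I | re.A)
# A lowercase void start tag written as <x>, <x/> or <x /> plus an optional
# redundant end tag, e.g. <img></img>.
VOID_TAG = re.compile(r"<(%s)\b([^>]*?)\s*/?>(?:</\1>)?" % VOID)
# An xmlns="..." (or '...') attribute declaring the XHTML namespace.
XMLNS_ATTR = re.compile(r"\s+xmlns\s*=\s*([\"'])%s\1" % re.escape(XHTML_NS))
# The root <html> start tag; group 1 holds its attributes.
HTML_START = re.compile(r"<html\b([^>]*)>")


def wants_polyglot(src: str) -> bool:
    """Option 2: exact-case doctype AND the namespace on the root element."""
    if not src.lstrip().startswith("<!DOCTYPE html>"):
        return False
    root = HTML_START.search(src)
    return root is not None and XMLNS_ATTR.search(root.group(0)) is not None


def to_polyglot(src: str) -> str:
    """Option 3: normalize to polyglot XHTML5 syntax."""
    src = ANY_DOCTYPE.sub("<!DOCTYPE html>", src, count=1)
    src = VOID_TAG.sub(r"<\1\2/>", src)
    root = HTML_START.search(src)
    if root is not None and XMLNS_ATTR.search(root.group(0)) is None:
        at = root.end(1)  # just before the ">" of <html ...>
        src = src[:at] + ' xmlns="%s"' % XHTML_NS + src[at:]
    return src


def to_html4_looking(src: str) -> str:
    """Option 4: normalize to HTML 4.01-looking HTML5 syntax."""
    src = ANY_DOCTYPE.sub("<!DOCTYPE HTML>", src, count=1)
    src = VOID_TAG.sub(r"<\1\2>", src)
    # Drop the first XHTML namespace declaration (normally the root's).
    return XMLNS_ATTR.sub("", src, count=1)
```

For example, `to_polyglot` turns `<!DOCTYPE HTML>`, `<html lang="en">` and `<img src="a.png">` into `<!DOCTYPE html>`, `<html lang="en" xmlns="http://www.w3.org/1999/xhtml">` and `<img src="a.png"/>`, and `to_html4_looking` reverses those three changes. Requiring both signals in `wants_polyglot` is what distinguishes option 2 from option 1.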

Short evaluation: 

Option 1: Looking only at the doctype seems like too little.
Option 2: DOCTYPE + namespace seems quite feasible to me. 
          This is my preferred solution. [5] Though option 3
          could also work fine.
Option 3: Not good if authors *want* to create HTML 4.01-looking 
          HTML syntax. (But why would it be important to offer authors 
          that option, given that the polyglot syntax is valid in 
          text/html?)
Options 4 and 5: KompoZer version 0.8 beta 3 currently transforms 
<!DOCTYPE HTMl>, but otherwise behaves more or less like Option 4. 
Thus, it is currently not possible to create XHTML5 syntax with 
[1] http://dev.w3.org/html5/spec/syntax#the-doctype





leif halvard silli
Received on Monday, 26 April 2010 16:10:33 UTC
