Re: Polyglot Markup/XML encoding declaration

On Sun, 01 Aug 2010 08:55:28 +0100, Leif Halvard Silli  
<xn--mlform-iua@målform.no> wrote:

>> A polyglot may be served as HTML too.  HTML5 does consider the XML
>> declaration to be non-conformant, and including it is unnecessary
>> pollution.
>
> This touches the question of whether Polyglot Markup is a specification
> or an authoring guide. The TAG, via Tim Berners-Lee, has suggested that
> it is to be a specification. Of course, even as a spec, it does not need
> to include the XML declaration. But if it is a spec, then it could
> include it.

The fact that it could include it doesn't mean it's a good idea.

I think the polyglot spec/guide should specifically try to minimize the
amount of markup that doesn't behave the same way in XML and HTML.

Since the XML declaration is nearly useless in XHTML, and HTML doesn't make
use of it, I see no reason to allow it.
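
For illustration (my own minimal example, not anything from the Polyglot
draft): in a document like the one below, the XML declaration tells an XML
parser nothing it wouldn't assume anyway for UTF-8, while an HTML parser
tokenizes <?xml ... ?> as a bogus comment and skips it.

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE html>
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head><title>Example</title></head>
   <body><p>Same bytes, two parsers; the first line is dead weight.</p></body>
   </html>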

>> Authors learn by copying and pasting. If they see lots of markup in
>> the wild using the XML declaration in HTML and that it appears to
>> declare the encoding, they will copy it into their own and not
>> understand that it doesn't do what they think.
>
> Some will undoubtedly come to believe that meta@charset defines
> the encoding in XHTML. So you want to forbid it for that reason?

That could work. Authors serving XML must know how to set the MIME type
header, so we could expect them to set the charset declaration at the HTTP
level too.
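
For instance, a single Content-Type header covers both the MIME type and
the encoding (the value shown is just an example):

   Content-Type: application/xhtml+xml; charset=UTF-8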

However, some authors might want to test polyglot documents opened from
disk (the file:// protocol), and in that case only XML would work well
(using UTF-8 or UTF-16, with no declaration necessary). An HTML parser
would require <meta> to use UTF-8.

This is why I'm more inclined to allow <meta charset="UTF-8"/> (only) than  
to allow <?xml?>.
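
Something along these lines is what I have in mind (a sketch only; the
exact requirements belong in the Polyglot draft):

   <!DOCTYPE html>
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta charset="UTF-8"/>
     <title>Polyglot example</title>
   </head>
   <body>
     <p>Opened from file://, an XML parser defaults to UTF-8 and an HTML
     parser picks the encoding up from the meta element.</p>
   </body>
   </html>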

>> We've seen this scenario before when XHTML 1.0 started becoming
>> popular and lots of documents were unnecessarily copying the XML
>> declaration from each other, with many people falsely thinking that
>> it either meant XML parsing would be used by browsers that supported
>> it or that it declared the encoding.  The practice only died out
>> after people started realising it triggered quirks mode in IE6.  We
>> have no reason to start introducing it again.
>
> We have several reasons to introduce it. But I agree that there are also
> reasons not to introduce it.

I'm not aware of reasons other than the ability to declare legacy encodings
in XML. What are the other reasons for it?
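
(To be concrete, the one capability I mean is a declaration like the
following, which is the only in-band way to tell an XML parser about a
non-UTF encoding; meta@charset cannot do that on the XML side.)

   <?xml version="1.0" encoding="ISO-8859-1"?>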

>>   "For compatibility with these types of legacy browsers, you may want
>>    to avoid using processing instructions and XML declarations"
>>
>> But Appendix C contains no normative requirements either, and so it
>> can't permit or deny anything.  It only provides recommendations.
>
> But the text/html MIME registration is very normative. And it states
> that XHTML 1.0 defines a profile which can be served as text/html.
> Thus, you are wrong. The XML declaration is officially permitted.

The situation with XHTML and text/html was messy. I think we should
re-think those decisions rather than use them as justification.

>>> The XML declaration would not be generally permitted in HTML - it would
>>> only be permitted in polyglot markup.
>>
>> There is no way to make some syntax conforming for polyglot documents  
>> only.
>
> Just make a validator which does.

The primary consumers of polyglot documents are going to be UAs (including
ones already deployed), which won't have a special parsing mode for
polyglot documents.

Validators are just a tool, not a goal. Differences between what validators
accept and what UAs support only make validators less useful.

The goal should be to define syntax that works with both parsers, is as
simple and robust as possible, and requires the least amount of effort from
authors. I don't see the XML declaration helping that goal.

-- 
regards, Kornel Lesiński

Received on Sunday, 1 August 2010 14:55:14 UTC