Re: Polyglot Markup Formal Objection Rationale

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 6 Nov 2012 14:43:16 +0100
To: Smylers <Smylers@stripey.com>
Cc: public-html@w3.org
Message-ID: <20121106144316962669.dbe47564@xn--mlform-iua.no>
Smylers, Tue, 6 Nov 2012 12:50:03 +0000:
> Jirka Kosek writes:
> 
>> Polyglot spec in fact defines what would be called HTML5 profile in
>> ISO, or subset between mortal people.
> 
> The current Polyglot spec draft contradicts itself. As such, it isn't
> clear what it's supposed to be doing.

Those who have contributed have had different ideas about that, such as:

* Simplicity as in "not having to care about HTML vs XHTML"
* Simplicity as in "while HTML allows you to skip tags,
  XML does not allow that - and that is good".
* Nailing down how HTML and XML differ

I may agree that it could be a good idea to state more positively and 
explicitly what the spec seeks to do.

>>>> No, Polyglot has to explicitly define that only allowed encoding
>>>> is UTF-8 because we want polyglot to use only UTF-8
>>> 
>>> Who's the "we" that wants that?
>> 
>> Working Group, or even wider Web community -- I don't expect that
>> anyone sensible would promote UTF-16 as a recommended encoding.
> 
> Indeed I wouldn't promote UTF-16 as an encoding.
> 
> But there's a difference between what is promoted and what is permitted.
> 
> The HTML spec itself says "Authors are encouraged to use UTF-8" and
> "using non-UTF-8 encodings can have unexpected results", but allows
> UTF-16 -- to that extent "encouraged HTML" is a subset of conforming
> HTML.

I believe that some would argue that polyglot markup (but for the "/>" 
cruft) happens to be equal to "encouraged HTML". I believe this is 
where people like Henri get nervous. :-) The fear is that <br/> 
would be seen as somehow better than <br>.
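
To make the <br/> versus <br> point concrete, here is a rough Python 
sketch of my own (nothing in it comes from the spec). It uses the 
standard library's xml.etree.ElementTree as a stand-in for an XML 
parser and html.parser as a stand-in for an HTML tokenizer. The latter 
is not a conforming HTML5 parser, and the helper names are made up for 
illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Collect start-tag names as seen by Python's lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<br/>": the default handle_startendtag
        # forwards to handle_starttag and handle_endtag.
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


polyglot_style = "<p>one<br/>two</p>"  # void element written with "/>"
html_style = "<p>one<br>two</p>"       # void element written the HTML way

print(is_well_formed_xml(polyglot_style), is_well_formed_xml(html_style))
print(html_start_tags(polyglot_style), html_start_tags(html_style))
```

On the HTML side both spellings yield the same tags. Only the XML side 
cares about the slash, which is why "/>" is a necessity for polyglot 
markup rather than a sign of better HTML.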

> Your argument against UTF-16 surely applies equally to 'normal' HTML as
> to polyglot HTML? The discouragement of UTF-16 doesn't seem to be
> polyglot-specific. As such, I don't understand why it is a requirement
> of the polyglot spec.

I just explained: 
http://www.w3.org/mid/20121106133019548309.34dabd36@xn--mlform-iua.no

> There are all sorts of HTML 'best practices' one could encourage (indeed
> a 'best practices' profile could be defined). But these are orthogonal
> to whether one chooses to write normal or polyglot HTML. 
> 
> The Polyglot spec can be either of these:
> 
>   A simply the common parts of text/html and XHTML
> 
>   B the common parts of text/html and XHTML plus some additional
>     mandatory best practices
> 
> Has the working group decreed that it wants B? Apologies if I missed a
> decision on this somewhere. From the document's own introduction I'd
> been presuming it was A.
> 
> If the term "polyglot HTML" refers to B, that 'uses up' the term, and we
> perhaps need some other term to refer to A.

The Polyglot spec says why it requires UTF-8: "Polyglot markup uses the 
UTF-8 character encoding, the only character encoding for which both 
HTML and XML require support."
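
A checker could enforce that encoding rule roughly as follows. This is 
only a sketch under my own assumptions: the function name is 
hypothetical, and a strict decode cannot catch everything (BOM-less 
UTF-16 that contains only ASCII characters is, byte for byte, also 
valid UTF-8), so a real checker would also have to look at the 
declared encoding.

```python
import codecs

# Byte order marks that identify UTF-16 input.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def is_plausibly_utf8(data: bytes) -> bool:
    """Heuristic check that a byte string is UTF-8 (hypothetical helper)."""
    if data.startswith(UTF16_BOMS):
        # Explicit for readability; these bytes are invalid UTF-8 anyway.
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


sample = "<p>bl\u00e5b\u00e6r</p>"  # contains non-ASCII characters

print(is_plausibly_utf8(sample.encode("utf-8")))
print(is_plausibly_utf8(sample.encode("utf-16")))
print(is_plausibly_utf8(sample.encode("latin-1")))
```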

>>> If there are to be additional restrictions then yes, indeed they have to
>>> be normative. I'd also suggest they need to be clearly distinguished
>>> from the requirements that are implied by being the intersection of XML
>>> and text/html, so that it is clear to anybody reading the spec that
>>> these are additional things they need to do.
>> 
>> If anyone wants to do this detective excercise to figure out whether
>> some restriction comes from HTML5 spec, XML spec, legacy browsers
>> behaviour, ... then he is certainly free to do it
> 
> Why would this be detective work? Surely the contributors to the
> Polyglot spec are aware of when they've decided to impose an additional
> restriction on documents?

If by "additional restriction" you mean "other than those you get when 
you combine XHTML and HTML", then I'd argue that there are none. 
That said: It would be possible to define more than one kind of 
polyglot. E.g. David just suggested that Polyglot Markup should have 
had a stricter syntactic angle: 
http://www.w3.org/mid/509900C9.7070007@nag.co.uk

>> and it would be interesting to have this in the spec. But I doubt that
>> average reader of polyglot spec would be interested in this -- he/she
>> needs to know rules to follow and that's it.
> 
> When writing a polyglot HTML document, it's useful for somebody already
> familiar with HTML and XML to see what else is required.

You mean, so that, by referring to the principles, a half-decent author 
could understand what it means without reading the spec through? 
Maybe you are right - maybe it could be done better. OTOH, the spec is 
pretty short.

> When checking a polyglot HTML document for conformance, it's necessary
> to see what additional requirements need to be checked in addition to
> checking that it is valid text/html and valid XHTML. For instance, see
> this mail from Mike Smith earlier today, where his "sorta" covers that
> there may be additional requirements but at the moment it's impossible
> for the "average reader" of the spec to find them:
> http://lists.whatwg.org/pipermail/help-whatwg.org/2012-November/001101.html

I think his "sorta" refers to validation, not to reading the Polyglot 
Markup spec.
-- 
leif halvard silli
Received on Tuesday, 6 November 2012 13:43:48 UTC
