Re: Request for Volunteers: Polyglot spec

On 04/21/2010 06:15 PM, Eliot Graff wrote:
> Today, I uploaded an EARLY draft version of a polyglot spec,
> "HTML/XHTML Compatibility Authoring Guidelines." [1]

A few QUICK comments:

> If a polyglot document uses an encoding other than UTF8 or UTF16

UTF-16 is not valid for HTML5.  I would recommend being more
prescriptive: simply recommend (or even require) UTF-8, as it is the only
encoding guaranteed to be supported by all HTML and XML parsers.

> You must specify attribute values as lowercase.

This needs to be made more specific.  A few lines after this, you
provide a counter-example whose alt value is not lowercase:
<img src="karen.jpg" alt="Karen" />

> You should use only the following named entity references

This should either become a MUST, or this document needs to cover what
DOCTYPEs are acceptable.  I would recommend going with MUST.

> The named character reference &apos; (the apostrophe, U+0027) was
> introduced in XML 1.0 but does not appear in HTML.

&apos; is in HTML5.
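
A minimal sketch of both points about named references, using only the
Python standard library.  It assumes xml.etree.ElementTree is
representative of an XML parser that reads no DTD, and it uses the
html.entities.html5 table as a stand-in for the HTML5 named character
reference list.  The helper name xml_accepts is made up for this example.

```python
import html
from html.entities import html5
import xml.etree.ElementTree as ET


def xml_accepts(reference: str) -> bool:
    """Return True if an XML parser with no DTD resolves the reference."""
    try:
        ET.fromstring(f"<p>{reference}</p>")
        return True
    except ET.ParseError:
        # Undefined entity: without a DTD, only the five predefined
        # XML entities are known to the parser.
        return False


# The five references predefined by XML 1.0.
predefined = ["&amp;", "&lt;", "&gt;", "&quot;", "&apos;"]
print([xml_accepts(r) for r in predefined])

# Common in HTML, but undefined in XML unless a DTD declares it.
print(xml_accepts("&nbsp;"))

# &apos; is present in the HTML5 named character reference table.
print("apos;" in html5, html.unescape("&apos;") == "'")
```

Any reference outside the five predefined ones depends on a DTD being
read on the XML side, which is why the choice between SHOULD and MUST
is tied to the DOCTYPE question.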

> You should include a space before the trailing / and > of empty
> elements, e.g. <br />, <hr />

I haven't found this to be necessary.

> Also, you should use the minimized tag syntax for empty elements,
> e.g. <br />. The alternative syntax <br></br> allowed by XML gives
> uncertain results in many existing user agents.

I would recommend that this be a MUST.  The specific example you cite
will produce different DOMs with HTML5 and XML 1.0 parsers.
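
The XML half of this is easy to check.  A small sketch, assuming
Python's xml.etree.ElementTree as the XML parser (the child_tags helper
is only for illustration): both spellings yield a single br element in
XML.  The HTML5 side is described in comments only, since the standard
library does not ship an HTML5 tree builder.

```python
import xml.etree.ElementTree as ET


def child_tags(markup: str) -> list[str]:
    """Parse markup as XML and list the tag names of the root's children."""
    return [child.tag for child in ET.fromstring(markup)]


minimized = child_tags("<div><br /></div>")
expanded = child_tags("<div><br></br></div>")

# In XML the two forms are equivalent: one br element each.
print(minimized, expanded, minimized == expanded)

# An HTML5 parser reads <br></br> as a <br> start tag followed by a stray
# </br> end tag.  The parsing algorithm handles that end tag as if it were
# another <br> start tag, so the resulting DOM has two br elements.
```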

> Given an empty instance of an element whose content model is not
> EMPTY (for example, an empty title or paragraph) do not use the
> minimized form (e.g. use <p> </p> and not <p />).

I would suggest using RFC 2119 language (MUST NOT), and changing the
example to <script src="..."> as this is a case that is particularly
problematic.
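
A rule like this lends itself to a mechanical check.  The following is
a rough sketch, not a real validator: the VOID_ELEMENTS set is an
assumed list of HTML elements with an empty content model, the function
name misused_minimized_tags is hypothetical, and "a.js" is a placeholder
file name.  The regular expression deliberately ignores comments, CDATA
sections, and ">" characters inside attribute values.

```python
import re

# Assumed list of HTML void elements (those whose content model is empty).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Matches tags written in the minimized form, e.g. <br /> or <p />.
MINIMIZED_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def misused_minimized_tags(markup: str) -> list[str]:
    """Return names of non-void elements written as <name ... />."""
    return [
        name
        for name in MINIMIZED_TAG.findall(markup)
        if name.lower() not in VOID_ELEMENTS
    ]


sample = (
    "<p />\n"
    '<script src="a.js" />\n'
    "<br />\n"
    '<img src="karen.jpg" alt="Karen" />'
)
print(misused_minimized_tags(sample))
```

The script case matters most because an HTML parser ignores the
trailing slash on a non-void element, so the script element stays open
and swallows the markup that follows it.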

- Sam Ruby

Received on Wednesday, 21 April 2010 23:15:17 UTC