
Re: Polyglot Markup/XML encoding declaration

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 01 Aug 2010 21:30:48 -0700
Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-id: <5E3B4DC5-E4D6-499A-8D3D-E3326E61537F@apple.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>

On Aug 1, 2010, at 6:57 PM, Martin J. Dürst wrote:

> 
> 
> On 2010/08/02 9:05, Maciej Stachowiak wrote:
> 
>> On Aug 1, 2010, at 12:55 AM, Leif Halvard Silli wrote:
>> 
>>> Lachlan Hunt, Thu, 29 Jul 2010 15:30:02 +0200:
> 
>>> Just make a validator which does.
>> 
>> The original premise of the polyglot spec was to describe a type of document that is valid as both HTML5 and XHTML5, and works sufficiently the same both ways. Thus, it does not match the original goals to have a construct that is valid in polyglot documents, but invalid in at least one of HTML5 or XHTML5. Indeed, Lachlan already pointed this out:
>> 
>>> 
>>>> Such a requirement is unenforceable because the conforming
>>>> polyglot document syntax is and should remain only the intersection
>>>> of HTML and XHTML syntax.
> 
> So by definition, a validator for polyglot documents would validate with an XHTML5 validator and with an HTML5 validator, and the result (assuming true means "pass") would be the intersection of the two results.

The group's current thinking, if I understand correctly, is to actually make polyglot a subset of this intersection. Specifically, the subset would be limited to documents that not only are valid both ways, but also mean approximately the same thing, in the sense of producing the same DOM. For example, the following document is valid as both HTML5 and XHTML5:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Test</title>
</head>
<body>
<table>
<tr><td>Cell</td></tr>
</table>
</body></html>

However, it produces a different DOM depending on which parser handles it: the HTML parser automatically inserts a <tbody> element inside the <table>, but the XML parser does not. This difference is observable from CSS and JavaScript; for example, the selector "table > tr" matches the row only in the XML case. Adding an explicit <tbody> removes the difference. So this document is an example of an intersection-valid document that might still not be valid per the polyglot spec.
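
The XML half of that comparison can be checked with a minimal sketch, here using Python's standard-library xml.etree.ElementTree (my choice for illustration; nothing in this thread depends on it). The helper names local_name, table_children and has_bare_rows are hypothetical. The sketch only shows what an XML parser builds; the HTML half would need an HTML5-conforming parser, which the Python standard library does not provide, so it is not run here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The example document from the message, unchanged.
DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Test</title>
</head>
<body>
<table>
<tr><td>Cell</td></tr>
</table>
</body></html>"""

# The same document with an explicit <tbody> around the row.
FIXED = DOC.replace(
    "<tr><td>Cell</td></tr>",
    "<tbody><tr><td>Cell</td></tr></tbody>",
)


def local_name(element):
    # ElementTree spells namespaced tags as "{namespace}local".
    return element.tag.rsplit("}", 1)[-1]


def table_children(source):
    # Parse as XML and list the element children of every XHTML <table>.
    root = ET.fromstring(source)
    return [
        [local_name(child) for child in table]
        for table in root.iter(f"{{{XHTML_NS}}}table")
    ]


def has_bare_rows(source):
    # True if any <table> has a <tr> as a direct child in the XML tree,
    # which is the case where the HTML parser would build a different DOM.
    return any("tr" in children for children in table_children(source))


print(table_children(DOC))    # [['tr']]
print(table_children(FIXED))  # [['tbody']]
```

In the XML tree the original document has the <tr> directly under the <table>, while the variant with an explicit <tbody> has the same shape an HTML parser would produce. A check like has_bare_rows covers only this one divergence; it is not a polyglot validator.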

> 
> I'm not sure how difficult it would be to construct such a validator, it depends on the availability of validators for HTML5 and XHTML5, and on their interfaces, but in principle, it shouldn't be too difficult.

In fact, validator.nu already has a "Polyglot" preset which checks both ways.

Regards,
Maciej
Received on Monday, 2 August 2010 04:31:25 GMT
