Re: Request for Volunteers: Polyglot spec from Jonas Sicking on 2010-04-01 (public-html@w3.org from April 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Thu, 1 Apr 2010 11:49:43 -0700
To: Philip Taylor <pjt47@cam.ac.uk>
Cc: Sam Ruby <rubys@intertwingly.net>, HTML WG <public-html@w3.org>, Technical Architecture Group <tag@w3.org>
Message-ID: <u2v63df84f1004011149t1cd75943j2c77d528122bf14f@mail.gmail.com>

On Thu, Apr 1, 2010 at 11:41 AM, Philip Taylor <pjt47@cam.ac.uk> wrote:
> Jonas Sicking wrote:
>>
>> On Fri, Mar 26, 2010 at 1:52 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>>>
>>> I took an action item from the TAG yesterday to convey the following
>>> request:
>>>
>>>   The W3C TAG requests there should be in TR space a document
>>>   which specifies how one can create a set of bits which can
>>>   be served EITHER as text/html OR as application/xhtml+xml,
>>>   which will work identically in a browser in both cases.
>>>   (As Sam does on his web site.)
>>>
>>> This request requires a lot of explanation.  To start, it is recognized
>>> up front that this will be a subset of the set of possible documents
>>> that can be expressed as HTML5.  This is entirely OK.  For example, if
>>> it were to be the case that such a subset were to entirely disallow
>>> scripts of any kind, that would be acceptable as there exists a
>>> substantial class of documents which do not require scripting of any
>>> kind.
>>
>> Out of curiosity, what does "work identically" encompass? Do they have
>> to have the same DOM? Or just render the same when the default UA
>> stylesheet is applied? Or just be semantically equivalent?
>> [...]
>> If DOMs aren't important, only rendering is, I assume that this
>> document won't qualify:
>>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>>  <head>
>>    <style> tbody { background: green } </style>
>>    <title>example document</title>
>>  </head>
>>  <body>
>>    Integer values for true/false.
>>    <table>
>>      <tr><td>true</td><td>1</td></tr>
>>      <tr><td>false</td><td>0</td></tr>
>>    </table>
>>  </body>
>> </html>
>
> This one would also render differently:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>  <head><title>example document</title></head>
>  <body>
>    <pre>
> Arbitrary example text</pre>
>  </body>
> </html>
>
> and this one will also cause data corruption depending on the content-type:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>  <head><title>example document</title></head>
>  <body>
>    <form>
>      Edit your comment:
>      <textarea name="comment">
> Your previous text</textarea>
>    </form>
>  </body>
> </html>
>
> (because the text/html parser strips a leading newline character in
> pre/textarea/listing elements), which seem like more serious issues than the
> <tbody>, since (unless I'm missing something) it's impossible to safely use
> these elements in polyglot documents, unless you do
>
>  <pre><!---->
>  text
>  </pre>
>
> which is a horrid hack and won't work for textarea anyway. So I think a true
> polyglot subset would have to exclude the textarea element, which limits its
> usefulness further. (Maybe the remaining subset is still large enough to be
> worth specifying in detail.)

Wouldn't any textarea or pre whose content starts with anything but
whitespace also be OK? Textareas are often empty anyway.
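
To make that concrete, here is a rough sketch of the check I have in
mind. The helper names are made up for illustration, and html_text() is
only a model of the single text/html rule Philip describes (one newline
right after the start tag is dropped), not a real HTML parser. It also
assumes the content is plain text with no markup or entities.

```python
import xml.etree.ElementTree as ET

# Elements whose leading newline the text/html parser drops (per Philip's note).
NEWLINE_STRIPPING = {"pre", "textarea", "listing"}


def xml_text(tag, content):
    """Text an XML parser reports for <tag>content</tag>; nothing is stripped."""
    return ET.fromstring(f"<{tag}>{content}</{tag}>").text or ""


def html_text(tag, content):
    """Model of the text/html rule only: drop one newline after the start tag."""
    if tag in NEWLINE_STRIPPING and content.startswith("\n"):
        return content[1:]
    return content


def same_in_both(tag, content):
    """True if both parsing modes would yield the same element text."""
    return xml_text(tag, content) == html_text(tag, content)
```

Under this model an empty textarea and a pre starting with ordinary text
both pass, while Philip's two examples (content starting with a newline)
fail. A leading space or tab appears to be harmless as well, so "anything
but whitespace" is the conservative version of the rule.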

/ Jonas

Received on Thursday, 1 April 2010 18:50:36 UTC