Re: Prevalence of ill-formed XHTML from Robert Burns on 2007-09-01 (public-html@w3.org from September 2007)

From: Robert Burns <rob@robburns.com>
Date: Sat, 1 Sep 2007 14:47:46 -0500
To: Philip Taylor <philip@zaynar.demon.co.uk>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <225E0A67-F582-4DDB-BFCC-EA5EFADABC18@robburns.com>
Hi Philip,

On Sep 1, 2007, at 8:32 AM, Philip Taylor wrote:

>
> Robert Burns wrote:
>> What problems would an author face with actual browsers if they  
>> authored valid and well-formed XHTML 1.0 that also adhered to the  
>> appendix C guidelines and then delivered that content as
>> text/html? I cannot think of any and I've yet to hear any issues
>> presented (Note that adhering to appendix C means there's no CData  
>> sections and <script> is always closed with </script>).
>
> http://www.w3.org/TR/xhtml1/guidelines.html doesn't say anything  
> about not using CDATA sections.

Yes, it is true that Appendix C does not say not to use CDATA
sections; however, in best-practice circles that is how it is
commonly interpreted. That is, authors keep both scripts and
stylesheets external and therefore have no reason to use CDATA sections.
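To illustrate why external scripts sidestep the question, here is a small Python sketch. It is only an illustration, using the standard-library XML parser as a stand-in for an XHTML user agent. An inline script containing `<` is not well-formed XML unless it is escaped or wrapped in a CDATA section, while a text/html parser treats script content as raw text, so the CDATA markers themselves would reach the script engine. The `html_script_text` helper is hypothetical: a crude model of raw-text handling, not a real HTML parser.

```python
from typing import Optional
import xml.etree.ElementTree as ET

INLINE = "<script>if (a < b) run();</script>"
WRAPPED = "<script><![CDATA[if (a < b) run();]]></script>"


def xml_script_text(markup: str) -> Optional[str]:
    """Return the script text an XML parser sees, or None if not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


def html_script_text(markup: str) -> str:
    """Hypothetical model of text/html: script content is raw text, taken verbatim."""
    start = markup.index(">") + 1        # end of the <script> start tag
    end = markup.rindex("</script>")     # start of the end tag
    return markup[start:end]


print(xml_script_text(INLINE))    # None (a bare '<' breaks well-formedness)
print(xml_script_text(WRAPPED))   # if (a < b) run();
print(html_script_text(WRAPPED))  # <![CDATA[if (a < b) run();]]>
```

This mismatch is why the familiar `//<![CDATA[ ... //]]>` comment trick exists; keeping the script in an external file avoids the problem in both serializations.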

> [...]
>
> XHTML code like:
>   <textarea>
>   Text</textarea>
> in Firefox results in "Text" on the second line of the text area.  
> (Opera and Safari disagree. I think XHTML5 agrees with Firefox).  
> When you send that as text/html, the leading newline will be  
> ignored, so you will get data loss when submitting the form.

If I understand you correctly, this is an issue with XHTML
interoperability and not an issue with serving XHTML as text/html.
The data loss depends on whether different browsers return
different results when the page is served as application/xhtml+xml
(that is, only on whether the newline is included or not). That is a
minor interoperability problem with application/xhtml+xml. Or are you
referring to something else?
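A minimal Python sketch of the textarea difference Philip describes, assuming LF-only line endings. The XML parser keeps the newline after `<textarea>`; the text/html parsing rule drops a single line feed immediately after the start tag. `html_textarea_value` is a hypothetical one-line model of that rule, not a real HTML parser.

```python
import xml.etree.ElementTree as ET

MARKUP = "<textarea>\nText</textarea>"


def html_textarea_value(raw: str) -> str:
    """Hypothetical model of the text/html rule: drop one LF right after the start tag."""
    return raw[1:] if raw.startswith("\n") else raw


xml_value = ET.fromstring(MARKUP).text  # the XML parser keeps every character
raw_content = MARKUP[len("<textarea>"):-len("</textarea>")]
html_value = html_textarea_value(raw_content)

print(repr(xml_value))   # '\nText'
print(repr(html_value))  # 'Text'
```

Because only one line feed is dropped, content that really begins with a newline survives text/html only if a second newline is written, but the same markup parsed as XML would then contain two. That is the data-loss case for a single document served both ways.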

> The checked, disabled, readonly, etc attributes can't be used at  
> all in a document that follows Appendix C's advice to work in old UAs.

I'm not aware of any browsers that do not support unminimized boolean  
attributes. How widespread is that issue?
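Appendix C implies writing boolean attributes in their unminimized form. The Python sketch below (helper names are hypothetical) renders both forms and checks each with the standard-library XML parser: only the unminimized form is well-formed XML, while HTML parsers are expected to accept both. The open question here, how many old user agents reject the unminimized form, is not something this sketch can model.

```python
from html import escape
from typing import Dict, Union
import xml.etree.ElementTree as ET


def render_input(attrs: Dict[str, Union[str, bool]], minimize: bool) -> str:
    """Serialize an <input> tag; True/False values mark boolean attributes."""
    parts = ["<input"]
    for name, value in attrs.items():
        if value is False:
            continue  # a boolean attribute is false when it is absent
        if value is True:
            # Minimized form (HTML only) vs. unminimized form (XHTML, Appendix C).
            parts.append(f" {name}" if minimize else f' {name}="{name}"')
        else:
            parts.append(f' {name}="{escape(value, quote=True)}"')
    parts.append(">" if minimize else " />")
    return "".join(parts)


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


attrs = {"type": "checkbox", "checked": True, "disabled": False}
for minimize in (True, False):
    tag = render_input(attrs, minimize)
    print(tag, is_well_formed(tag))
# <input type="checkbox" checked> False
# <input type="checkbox" checked="checked" /> True
```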

> (I expect there are plenty of other issues - it seems it would be  
> hard to write something like Appendix C that is actually correct.)

If authors stick to external scripts and stylesheets and follow the  
other guidelines in appendix C, then the only problem we've found so  
far is the inability to target browsers that do not support  
unminimized boolean attributes.  Depending on how widespread that  
issue is, it may be an issue many authors are willing to live with.  
These few minor differences hardly warrant the enormous amount of  
attention this issue has gotten and keeps getting.

I'm not sure this relates much to HTML5. I think our allowance
of xml:lang, self-closing tags for void elements, and the
deprecation of 'name' attributes, even in the text/html serialization,
sufficiently deal with the same issues as Appendix C.
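As a rough illustration of the void-element rule, here is a Python sketch of a serializer whose output is meant to parse the same way as XML and as text/html: void elements get the `<br />` form, and every other element gets an explicit end tag, because `<p/>` would be read as an unclosed start tag in text/html. The `VOID_ELEMENTS` set is an assumption taken from the current WHATWG list rather than the 2007 draft, and `serialize` is a hypothetical helper.

```python
from html import escape
from typing import Dict
import xml.etree.ElementTree as ET

# Assumption: the current WHATWG void-element list.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def serialize(tag: str, attrs: Dict[str, str], text: str = "") -> str:
    """Emit markup meant to parse the same way as XML and as text/html."""
    attr_str = "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    if tag in VOID_ELEMENTS:
        return f"<{tag}{attr_str} />"  # self-closing only for void elements
    # Non-void elements always get an explicit end tag, even when empty.
    return f"<{tag}{attr_str}>{escape(text, quote=False)}</{tag}>"


print(serialize("br", {}))  # <br />
print(serialize("p", {}))   # <p></p>
para = serialize("p", {"lang": "en", "xml:lang": "en"}, "a < b")
print(para)                      # <p lang="en" xml:lang="en">a &lt; b</p>
print(ET.fromstring(para).text)  # a < b
```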

I think there are other things we might do to deal with other issues  
like easing the migration between XML and text serializations. For  
example, we might:

  * require tbody and colgroup elements in the table content model  
(with implied start and end tags in the text serialization).
  * define DOM APIs to work in more consistent ways across  
serializations. We might even make the DOM serialization agnostic (or  
in some sense more serialization aware) so that methods could include  
innerXML as well as innerHTML to get (or set) a serialized string  
from the DOM for either serialization.
  * deprecate NOSCRIPT elements for the text serialization.
  * deprecate the use of characters outside the set XML allows
in the text serialization.
  * add document.write() to the DOM even for XML de-serialized documents.
  * and so on, to minimize differences between the DOMs (the in-memory
HTML) created by these serializations.
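For the first item, a rough Python sketch of what "implied start and end tags" means in practice: when a text/html parser meets a `<tr>` directly inside `<table>`, it inserts a `tbody`, whereas an XML parser builds exactly the tree that was written. `insert_implied_tbody` is a hypothetical helper that applies a simplified version of that rule to an ElementTree so both serializations end up with the same DOM shape; it ignores whitespace text and the other table-parsing rules.

```python
import xml.etree.ElementTree as ET


def insert_implied_tbody(table: ET.Element) -> ET.Element:
    """Wrap each run of direct <tr> children of <table> in a new <tbody>."""
    rebuilt = []
    current = None  # the tbody collecting the current run of rows
    for child in list(table):
        if child.tag == "tr":
            if current is None:
                current = ET.Element("tbody")
                rebuilt.append(current)
            current.append(child)
        else:
            current = None  # caption, thead, tbody, etc. end the run
            rebuilt.append(child)
    # Replace the table's children with the rebuilt list.
    for child in list(table):
        table.remove(child)
    table.extend(rebuilt)
    return table


xml_table = ET.fromstring(
    "<table><caption>Totals</caption><tr><td>1</td></tr><tr><td>2</td></tr></table>"
)
insert_implied_tbody(xml_table)
print([child.tag for child in xml_table])  # ['caption', 'tbody']
print(len(xml_table.find("tbody")))        # 2
```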

There will be some unavoidable differences (such as some more
expressive content models for XML), but we should minimize
differences wherever we can. I've started a new wiki page to track
these issues, since they keep coming up and we need a single place to
store them. Again, this is for differences in processing the two
serializations that come after parsing.

<http://esw.w3.org/topic/HTML/SerializationDependentProcessingDifferences>

Take care,
Rob
Received on Saturday, 1 September 2007 19:48:01 UTC