
Re: XML and HTML differences (Re: XML namespaces on the Web)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 17 Nov 2009 22:05:10 -0800
Cc: public-html@w3.org
Message-id: <F27E1AB4-77BD-4B45-B637-4041860BC60C@apple.com>
To: Michael(tm) Smith <mike@w3.org>

On Nov 17, 2009, at 8:15 PM, Michael(tm) Smith wrote:

> Maciej Stachowiak <mjs@apple.com>, 2009-11-17 15:40 -0800:
> [...]
>> (1) XML has draconian error handling, while text/html has tolerant (and with
>> HTML5 fully specified) error handling.
>> (2) XML supports arbitrary XML-style namespaces in the syntax, text/html
>> supports only a short list of predefined namespaces.
>> (3) XML has a fairly strict conforming syntax, while even the conforming
>> text/html syntax allows many shortcuts (even setting aside the
>> error-tolerance).
>> (4) XML parsing is completely independent of the vocabulary, text/html
>> parsing has many behaviors that are specific to the HTML vocabulary.
>> (5) XML has only a very small list of predefined entities with optional
>> addition of more via DTD processing, text/html has a fairly extensive list
>> of named entities.
>> Are there more important high-level differences that I'm forgetting?
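Points (1) and (5) can be seen side by side with Python's standard library. This is a rough illustration, not a model of either spec: `xml.etree.ElementTree` stands in for a conforming XML parser, and `html.parser.HTMLParser` stands in for tolerant text/html handling, although it is only an event-based tokenizer and does not implement the HTML5 tree-construction algorithm. The helper names `TextCollector`, `parses_as_xml` and `html_text` are hypothetical, chosen for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers character data; never raises on bad markup."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &eacute; and friends
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parses_as_xml(markup):
    """Return True if markup is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Draconian handling: any well-formedness error aborts the parse.
        return False


def html_text(markup):
    """Return the text that html.parser recovers from (possibly sloppy) markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


# Unclosed <b>: a fatal error for XML, merely sloppy markup for HTML.
unclosed = "<p>Hello <b>world</p>"
# &eacute; is a predefined named entity in HTML, but XML without a DTD
# only predefines &amp; &lt; &gt; &quot; &apos;.
entity = "<p>caf&eacute;</p>"

print(parses_as_xml(unclosed))  # False
print(html_text(unclosed))      # Hello world
print(parses_as_xml(entity))    # False
print(html_text(entity))        # café
```

A spec-conforming HTML5 parser would go further than this sketch and also build the same DOM tree from the sloppy input that every browser builds.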
> Not sure if the following is high-level on the same order as the
> above, but I think it's an important difference that many people
> are not aware of. That difference is: HTML has a few elements
> within whose contents particular characters and sequences are
> handled differently than they are in most other elements. What I
> mean are the <title>, <textarea>, <script>, and <style> elements.

I'd thought of this as included in point (4), vocabulary-specific  
parsing behavior, but you are right that it's noteworthy enough to  
call out by itself. Being able to skip the XML explicit CDATA section  
syntax for the contents of these special elements makes a significant  
difference to hand-authorability.
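The hand-authorability point can be sketched the same way, again assuming Python's `html.parser` as a loose stand-in for text/html parsing. That module treats `<script>` (and `<style>`) content as raw text, so a bare `<` or `&&` is fine, whereas XML requires the same script to be escaped or wrapped in a CDATA section. The names `ScriptText`, `html_script_text` and `xml_script_text` are hypothetical. The sketch covers only `<script>`; `<title>` and `<textarea>` (RCDATA in HTML5, where character references are still decoded but tags are not recognized) are not modeled here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Hypothetical helper: records only the character data inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup):
    """Script body as seen by html.parser (raw text, no escaping needed)."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_script_text(markup):
    """Element text as seen by an XML parser, or None if not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


code = "if (a < b && b > c) { run(); }"
bare = "<script>" + code + "</script>"
wrapped = "<script><![CDATA[" + code + "]]></script>"

print(html_script_text(bare) == code)    # True: HTML accepts the bare script
print(xml_script_text(bare))             # None: "<" and "&" break XML
print(xml_script_text(wrapped) == code)  # True: XML needs the CDATA wrapper
```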

Received on Wednesday, 18 November 2009 06:05:45 UTC
