Why HTML should be taught as HTML without pretending it is XML

First, why does HTML 5 allow certain XMLisms? The reason is to ease
the migration from text/html bogo-XHTML to HTML5. *Not* the other way
around.

As a side effect of easing that migration, it is theoretically
possible to write documents that mean the same thing both as text/html
and as application/xhtml+xml. However, almost anyone who tries to pull
off such stunts on his/her own is likely to shoot him/herself in the
foot. Doing this is such a bag of gotchas that it definitely should
not be recommended for novices. It shouldn't even be recommended to
experts.

Sure, it would be nice to teach novices that HTML is like XML if it  
were true. But it isn't. If someone is taught the convenient untruth  
that you can write HTML just like XML, it is likely that (s)he will  
write stuff such as:


<script src='...'/>

<p>foo <ol><li>bar</li></ol> baz</p> (permitted in XHTML5, btw, as  
per the current draft)

or perhaps even


None of these mean in text/html what they would mean in XML. To write  
proper text/html, a person needs to know which elements don't have an
end tag and which elements can go inside a p element. This should be  
taught to novices.
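To make the two traps above concrete, here is a minimal sketch of a hypothetical linter (the class XmlismChecker and the function check are made up for illustration). It flags "<x/>" on elements that are not void, and block-level start tags that implicitly close an open <p>. Several assumptions apply. The void-element list follows the current WHATWG HTML Standard, not the 2007 draft. The CLOSES_P set is only a partial list. Python's html.parser is used purely as a tokenizer. It does not build an HTML5 tree, and it treats "/>" as self-closing itself, which is exactly the misconception being flagged. The sketch is not a conforming HTML parser.

```python
from html.parser import HTMLParser

# Assumption: void elements per the current WHATWG HTML Standard. In text/html
# only these may be written as "<x/>"; on any other element the slash is
# ignored and the element stays open.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}

# Assumption: a PARTIAL list of start tags that implicitly close an open <p>
# in the HTML parsing algorithm (the real list is longer).
CLOSES_P = {
    "address", "article", "aside", "blockquote", "div", "dl", "fieldset",
    "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header",
    "hr", "nav", "ol", "p", "pre", "section", "table", "ul",
}


class XmlismChecker(HTMLParser):
    """Hypothetical linter for XML habits that mean something else in text/html.

    html.parser serves only as a tokenizer here. The <p> tracking is a single
    flag and ignores the real scope rules (tables, buttons, etc.).
    """

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.in_p = False

    def handle_startendtag(self, tag, attrs):
        # html.parser reports "<x/>" here. An HTML5 parser ignores the slash on
        # non-void elements, so e.g. <script .../> swallows what follows it.
        if tag not in VOID_ELEMENTS:
            self.warnings.append(f"<{tag}/> is not self-closing in text/html")
        self.handle_starttag(tag, attrs)

    def handle_starttag(self, tag, attrs):
        # A block-level start tag silently ends the paragraph.
        if self.in_p and tag in CLOSES_P:
            self.warnings.append(f"<{tag}> implicitly closes the open <p>")
            self.in_p = False
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.in_p:
                # The HTML parser turns a stray </p> into an empty <p></p>.
                self.warnings.append("stray </p> becomes an empty <p></p>")
            self.in_p = False


def check(markup):
    """Return the list of warnings for a markup string."""
    checker = XmlismChecker()
    checker.feed(markup)
    checker.close()
    return checker.warnings


for snippet in ["<script src='...'/>", "<p>foo <ol><li>bar</li></ol> baz</p>"]:
    print(snippet, "->", check(snippet))
```

Note that <br/> produces no warning, because br is void. That is why the "XML-style" habit seems to work often enough to go unnoticed.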

text/html just is not XML, for historical reasons. It is inconvenient,
but hiding the truth from the student does him/her a disservice,
because the differences are so easy to hit accidentally if you don't
know what they are and are thinking in terms of XML. Teaching that
HTML is different from XML gives the student proper notice that there
are indeed differences.

(However, a person who is learning to *write* text/html doesn't need  
to be taught up front which tags can be omitted, since it is quite ok  
not to omit tags.)

The ease of migration to application/xhtml+xml is a flawed argument,
because there are small, very easy-to-use things that immediately
break the source-level migration path. Using an entity that isn't
predefined in XML (such as &nbsp;) or calling document.write() are
examples of such things. Experience with the infamous Appendix C of
XHTML 1.0 suggests that people *will* use these features when they
don't see immediate catastrophic failure.
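The entity point can be illustrated with a small sketch using only the Python standard library. It assumes a generic XML parser that does not load the XHTML DTD. Python's ElementTree (expat) is used here as a stand-in; browsers may behave differently. The helper is_well_formed_xml is made up for illustration. The fragment decodes fine under HTML rules but is rejected as XML. A numeric character reference works in both.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Hypothetical helper: True if markup parses as standalone XML (no DTD loaded)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# &amp; is one of the five entities XML predefines (amp, lt, gt, quot, apos);
# &nbsp; is not, so an XML parser that does not read the XHTML DTD rejects it.
fragment = "<p>fish&nbsp;&amp;&nbsp;chips</p>"

# Under HTML rules, &nbsp; resolves to U+00A0 NO-BREAK SPACE ...
print(repr(html.unescape(fragment)))
# ... while the same markup is not well-formed XML.
print(is_well_formed_xml(fragment))
# A numeric character reference survives both interpretations.
print(is_well_formed_xml("<p>fish&#160;&amp;&#160;chips</p>"))
```

Nothing in text/html warns the author about &nbsp;. The failure shows up only once the document is served as XML, which is the "no immediate catastrophic failure" problem described above.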

Henri Sivonen

Received on Tuesday, 17 July 2007 11:00:50 UTC