Re: 1 element becomes 2 from Michael A. Peters on 2009-05-20 (www-validator@w3.org from May 2009)

From: Michael A. Peters <mpeters@mac.com>
Date: Wed, 20 May 2009 16:49:58 -0700
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: Melody Chamlee <developer@pobox.com>, Hannes Kirsman <hkirsman@gmail.com>, www-validator@w3.org
Message-id: <4A149726.9040205@mac.com>

Jukka K. Korpela wrote:
> Melody Chamlee wrote:
> 
>> I encourage you to continue validating your XHTML docs to make your
>> content easier to update and manage in the long run.
> 
> Do you have any factual grounds for this (which?), or are you just 
> repeating common XHTML propaganda? As we have seen even in this simple 
> case, XHTML syntax makes things harder, not simpler.

I disagree that it makes things harder; it's just different.

When I write a filter to modify or extract data from a document, it's a 
lot easier if the document is valid xml.

One of the issues is multi-byte utf8 characters. It seems that importing 
an html document into a DOM object via anything built against libxml2 
corrupts some unicode characters. However, if the document is xml, it 
imports just fine.

Thus to deal with html documents - before importing I have to first 
alter the document to produce xml w/o modifying the content.

Since xhtml is xml, I don't have to worry about that; I can just read it 
into a DOM object and manipulate it with tools directly.

Maybe libxml2 could be patched to work better with utf8 html input, but 
the fact still remains that xhtml is xml so any tools designed to work 
with xml input will work with xhtml.
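
As a rough sketch of what I mean (not my actual filter), here is Python's 
standard xml.etree.ElementTree, a generic xml tool that knows nothing about 
html, reading an xhtml document with multi-byte utf8 characters. The sample 
document and the extract_title() helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample document containing multi-byte UTF-8 characters.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Caf\u00e9 \u65e5\u672c\u8a9e</title></head>\n'
    '<body><p>na\u00efve</p></body>\n'
    '</html>\n'
).encode("utf-8")


def extract_title(xhtml_bytes):
    """Return the text of <title>, or None if the document has no title."""
    # A plain XML parser is enough because XHTML is XML.
    root = ET.fromstring(xhtml_bytes)
    # Elements live in the XHTML namespace, so the lookup is namespace-qualified.
    title = root.find("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))
    return title.text if title is not None else None


if __name__ == "__main__":
    print(extract_title(SAMPLE))
```

The parser decodes the bytes according to the xml declaration, so no 
pre-processing step is needed before the document can be queried.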

Another distinct advantage of xhtml is you can add custom attributes.
E.g. the search engine sphider currently uses html comments to turn off 
indexing of part of a page. While it works, it probably isn't the 
best use of comments; that's not what comments are for.

With xhtml, you can easily extend xhtml to either add an element that 
means nothing display-wise but means something to sphider, or (probably 
the better approach) add an attribute that tells sphider not to index that 
node or its children. With the proper declaration, it will still 
validate with the custom attribute. Try adding a custom attribute to 
html. It won't validate unless you define a custom (non W3C) DTD.
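
To illustrate the attribute approach, here is a minimal sketch of how an 
indexer could honor such an attribute. This is not sphider's code: the 
namespace URI, the "noindex" attribute name and the indexable_text() helper 
are all hypothetical, and the sketch only covers the processing side, not 
the declaration needed for validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and attribute, invented for this sketch.
CRAWLER_NS = "http://example.org/ns/crawler"
NOINDEX = "{%s}noindex" % CRAWLER_NS

# Made-up sample page with one subtree marked as not indexable.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:c="http://example.org/ns/crawler">'
    '<body>'
    '<p>Index me.</p>'
    '<div c:noindex="true"><p>Skip me.</p></div>'
    '<p>Index me too.</p>'
    '</body></html>'
)


def indexable_text(elem):
    """Collect text chunks, skipping any node marked noindex and its children."""
    if elem.get(NOINDEX) == "true":
        return []
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.extend(indexable_text(child))
        # Tail text follows the child but belongs to this element,
        # so it is kept even when the child itself was skipped.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


if __name__ == "__main__":
    print(indexable_text(ET.fromstring(SAMPLE)))
```

Because the marker is an ordinary namespaced attribute, any xml tool can see 
it, which is not the case for a convention buried in comments.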

An even better example is complex equations. The only way I know how to 
do it in html is with an image (often requiring a TeX installation on the 
server and often requiring the server code to have shell execution 
permission to run the TeX compiler). With xhtml, you can do it with 
MathML, and browsers that support the MathML extension will properly 
display the equations.
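
A small sketch of what that buys you on the tooling side: since the 
equation is markup rather than an image, the same generic xml tools can 
read it. The sample document and the math_identifiers() helper below are 
made up for illustration; only the MathML namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Made-up XHTML fragment with an inline MathML equation (E = m c^2).
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>Energy: '
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>'
    '</math></p></body></html>'
)


def math_identifiers(xhtml):
    """Return the text of every MathML <mi> element, in document order."""
    root = ET.fromstring(xhtml)
    return [mi.text for mi in root.iter("{%s}mi" % MATHML_NS)]


if __name__ == "__main__":
    print(math_identifiers(SAMPLE))
```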

Received on Wednesday, 20 May 2009 23:50:58 UTC