Re: XML + HTML = XHTML ?

[All of this is AFAIK. Others will surely correct any mistakes.]

Mihai P. B. Stiucan wrote:
> One more thing, as I wrote in the subject line, how did XML come into the 
> world of HTML? Maybe I seem a little stupid, maybe that's the fact. I 
> started with HTML, I saw it is a markup language, with syntax and rules. 
> Ok with that, then came CSS. I got that, it is an external file 
> which contains all formatters used in early HTML, that is to clean up 
> the html file from attributes. But now here comes XML (and XSL). I 

In fact, the earliest versions of XML I can find are 
<URL:http://www.w3.org/TR/WD-xml-961114> (a working draft dated 14 
November 1996) and <URL:http://www.w3.org/TR/PR-xml-971208> (a proposed 
recommendation dated 8 December 1997). Considering that CSS was 
first(?) published here <URL:http://www.w3.org/TR/REC-CSS1-961217>, at 
about the same time, I wouldn't say that either one was in use before 
the other.

XML was invented to store and transmit any structured data in one common 
syntax. The idea was to make the syntax so strict that a parser could be 
really simple. I think this goal was chosen after people had seen how 
many parser bugs user agents (a.k.a. browsers) had.
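To illustrate the "strict syntax, simple parser" point, here is a small sketch using Python's standard xml.etree.ElementTree (my choice of tool, nothing the XML spec requires). A conforming XML parser has no error recovery: anything that is not well-formed is simply rejected, instead of being guessed at like HTML tag soup. The helper name is_well_formed is just made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any syntax error (unclosed tag, bad nesting, ...) is fatal in XML.
        return False
    return True


# Properly nested and closed: accepted.
print(is_well_formed("<p>Hello <b>world</b></p>"))   # True
# Unclosed <b>: an HTML parser would guess, an XML parser must refuse.
print(is_well_formed("<p>Hello <b>world</p>"))       # False
# Improperly nested tags are rejected as well.
print(is_well_formed("<p><b>x</p></b>"))             # False
```

Because the parser never has to guess what the author meant, it can be much smaller than an HTML parser with all its error handling.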

XHTML 1.0 is simply the old HTML 4.01 spec in that stricter syntax. User 
agents are supposed to parse XHTML with the same rules as any other XML, 
but in practice they don't. An XHTML document should be served with the 
content type "application/xhtml+xml", but because one widely used 
browser doesn't support it, pretty much nobody does that. XML is already 
used in many other places, but I foresee hard times for it replacing 
HTML, because people are used to the less strict rules of HTML. And 
pretty much every user agent must support older documents (more or 
less), so it now needs both the old HTML parser and an XML parser (and a 
CSS parser, and...).
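A rough sketch of why a user agent ends up carrying two parsers, again in Python with only the standard library (html.parser and xml.etree.ElementTree, my choice for the example). The same list is written once as legacy HTML, where the </li> end tag is optional in HTML 4.01, and once in XHTML syntax. Note that html.parser is only a tolerant tokenizer, not a full browser-grade HTML parser; the TagCollector class is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; html.parser does not insist on end tags."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


legacy = "<ul><li>one<li>two</ul>"           # valid HTML 4.01, </li> omitted
xhtml = "<ul><li>one</li><li>two</li></ul>"  # the same list in XHTML syntax

# The lenient HTML tokenizer accepts the legacy markup.
collector = TagCollector()
collector.feed(legacy)
collector.close()
print(collector.tags)   # ['ul', 'li', 'li']

# The XML parser refuses the legacy markup...
try:
    ET.fromstring(legacy)
except ET.ParseError as err:
    print("XML parser refuses:", err)

# ...but handles the XHTML version without complaint.
print([el.tag for el in ET.fromstring(xhtml).iter()])   # ['ul', 'li', 'li']
```

Old pages look like the first string, so a browser cannot drop its HTML parser just because it also has an XML one.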

Later standards are based on XML because that way the software has the 
required parser already done and can concentrate on getting the real 
logic right. In addition, there are many tools for authoring and 
refining any XML document. In my opinion, XML is overly verbose for some 
applications, but I guess that's the tradeoff for using the same parser 
for everything.
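As a sketch of the "same parser for everything" idea: one generic function, written once against Python's xml.etree.ElementTree, works on any XML vocabulary without knowing anything about it. The two formats below (invoice and recipe) are hypothetical, made up only for the example.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text: str) -> list[str]:
    """List the slash-separated path of every element, for any XML vocabulary."""
    paths: list[str] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        paths.append(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


# Two unrelated, hypothetical formats handled by the same code.
invoice = "<invoice><item sku='42'/><total>9.95</total></invoice>"
recipe = "<recipe><step>Mix</step><step>Bake</step></recipe>"

print(element_paths(invoice))
# ['/invoice', '/invoice/item', '/invoice/total']
print(element_paths(recipe))
# ['/recipe', '/recipe/step', '/recipe/step']
```

The application only has to care about what the elements mean; the syntax part is shared.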

Some reading:
http://www.w3.org/TR/xmlbase/
http://www.w3.org/TR/REC-xml

> search for explanations, and all I got is "XML is for organizing data, 
> which will need to be rendered in browser."

Most of the time, XML is a "wire format" for transferring data between 
different computers or applications. Of all the uses of XML, only a very 
small part ever reaches commonly used browsers.
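A minimal sketch of XML as a wire format, assuming a made-up "order" record (the element names and the encode_order/decode_order helpers are hypothetical). One side serializes the data to XML bytes; the other side only needs some XML parser, in whatever language, to get the data back. No browser is involved at any point.

```python
import xml.etree.ElementTree as ET


def encode_order(order_id: int, items: list[str]) -> bytes:
    """Sender side: turn a (hypothetical) order record into XML bytes."""
    root = ET.Element("order", id=str(order_id))
    for name in items:
        ET.SubElement(root, "item").text = name
    return ET.tostring(root, encoding="utf-8")


def decode_order(payload: bytes) -> tuple[int, list[str]]:
    """Receiver side: any XML parser, in any language, could do this step."""
    root = ET.fromstring(payload)
    return int(root.get("id")), [item.text for item in root.findall("item")]


wire = encode_order(7, ["tea", "milk"])
print(wire.decode("utf-8"))
print(decode_order(wire))   # (7, ['tea', 'milk'])
```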

XML isn't the greatest thing since sliced bread, though. For example, 
Microsoft could claim that their newest file format for Word files is 
XML. The XML could look pretty much like |<word 
encoding="base64">JGDSKJGKLSAJDGKJSADGLJDS....</word>| and it would 
still be XML. The real data could still be in the same binary format it 
was before. If all applications used XML as their default save format 
and the structure of the saved document were correctly expressed in XML, 
then making applications interoperable would be much easier.
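To make the difference concrete, here is a small Python sketch. The binary payload and the element names (document, paragraph) are invented for the example and have nothing to do with any real Word format. Both documents are well-formed XML, but only the structured one lets a generic XML tool see the text.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical binary payload standing in for a proprietary file format.
binary = b"\x00\x01HDR\x00Hello, world\x00\xff"

# "XML" in name only: a single element wrapping opaque base64 data.
opaque = f'<word encoding="base64">{base64.b64encode(binary).decode("ascii")}</word>'

# The same text expressed as real structure (element names are made up).
structured = "<document><paragraph>Hello, world</paragraph></document>"

for doc in (opaque, structured):
    root = ET.fromstring(doc)   # both parse: both are well-formed XML
    paragraphs = [p.text for p in root.iter("paragraph")]
    print(root.tag, paragraphs)
# word []
# document ['Hello, world']
```

With the first form you still need the vendor's own code to understand the contents; the XML wrapper buys nothing.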

> Now, on W3 page I see that XML is more powerful and aimed to "replace" 
> the old HTML. From my point of view, I saw XML like an add-on, not like 
> a big brother of HTML. I think my opinion about XML is wrong.

As I said, XHTML 1.0 is pretty much HTML in a stricter syntax. About 
the only more powerful feature is that you can use namespaces and embed 
other XML documents inside XHTML. At least in theory...
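A sketch of what the namespace feature means at the parser level, using Python's xml.etree.ElementTree: an XHTML page with an SVG drawing embedded in it. The namespace URIs are the real XHTML and SVG ones; the page itself is made up. This only shows that an XML parser keeps the two vocabularies apart; whether a given browser actually renders the SVG is another matter ("in theory" indeed).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

page = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

root = ET.fromstring(page)
ns = {"h": XHTML_NS, "s": SVG_NS}

# ElementTree reports each tag as {namespace-URI}localname.
print(root.tag)                                        # {http://www.w3.org/1999/xhtml}html
print([p.text for p in root.iterfind(".//h:p", ns)])   # ['A circle:']
circle = root.find(".//s:circle", ns)
print(circle.get("r"))                                 # 8
```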

-- 
Mikko

Received on Thursday, 13 February 2003 05:54:42 UTC