
Re: SV: Latest version of XHTML

From: Herr Christian Wolfgang Hujer <Christian.Hujer@itcqis.com>
Date: Sat, 8 Mar 2003 20:18:39 +0100
To: Jim Dabell <jim-www-html@jimdabell.com>, www-html@w3.org, "Jesper Tverskov" <jesper.tverskov@mail.tele.dk>, "basil crow" <basilcrow@cox.net>
Message-Id: <200303082018.42740.Christian.Hujer@itcqis.com>



On Saturday, 8 March, Jesper Tverskov, basil crow and Jim Dabell wrote:
> [discussion about current HTML versions and MIME types]

I want to confirm that it is incorrect to send XHTML 1.1 as MIME type 
text/html. The Internet and the WWW are based on RFCs and Recommendations. If 
you don't follow all of the currently valid ones, why follow them at all?
The MIME type text/html is for SGML-based HTML. XHTML 1.0 is an exception to 
that rule. This exception is made only to enable a smoother transition from 
old SGML-based HTML / tag soup HTML to well-formed, valid, XML-based XHTML.

The correct MIME type for sending XHTML 1.1 is application/xhtml+xml, not 
text/html, as Jim already said.

But Internet Explorer doesn't accept application/xhtml+xml. Internet Explorer 
knows nothing about XHTML and even less about the MIME type 
application/xhtml+xml. If you correctly serve XHTML 1.1 only, without an HTML 
4.01 alternative, Internet Explorer users won't be able to see your content 
and will be presented with a download dialog instead.

There is a solution to this problem. You could use the following setup to 
deliver your XHTML 1.1 content correctly.
Use a nearly-identity XSLT transformation that converts XHTML 1.1 to HTML 
4.01. Of course, this is not possible if you're using the Ruby Module, but I 
assume you aren't.
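
The conversion above is meant to be done in XSLT. As a rough illustration of 
what such a nearly-identity transformation does, here is a minimal sketch in 
Python instead, using only the standard library. The function name 
xhtml_to_html and the choice of the HTML 4.01 Strict doctype are assumptions 
made for this example, and the sketch assumes the input is well-formed, uses 
no named entities such as &uuml;, and contains no Ruby markup.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption for this sketch: the HTML output targets HTML 4.01 Strict.
HTML401_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)


def xhtml_to_html(xhtml_source: str) -> str:
    """Copy the document nearly unchanged, but serialize it as HTML."""
    root = ET.fromstring(xhtml_source)
    for element in root.iter():
        # Drop the XHTML namespace so that no xmlns attribute is written.
        if isinstance(element.tag, str) and element.tag.startswith(XHTML_NS):
            element.tag = element.tag[len(XHTML_NS):]
        # HTML 4.01 has no xml:lang attribute; carry the value over to lang.
        language = element.attrib.pop(XML_LANG, None)
        if language is not None:
            element.set("lang", language)
    # method="html" writes empty elements such as <br> without "/>".
    body = ET.tostring(root, encoding="unicode", method="html")
    return HTML401_DOCTYPE + "\n" + body


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    "<head><title>Test</title></head>"
    "<body><p>one<br />two</p></body></html>"
)
converted = xhtml_to_html(sample)
print(converted)
```

In the setup described here, the output of such a step would be stored as the 
.html variant next to the .xhtml original.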

Use .xhtml as the file extension for XHTML 1.1.
Use .html as the file extension for HTML 4.01.

Now configure your web server so that it delivers the .xhtml files as MIME 
type application/xhtml+xml to those browsers that accept XHTML (these 
browsers announce this to the server by listing application/xhtml+xml in the 
value of the HTTP Accept: header field).
Other browsers will receive the HTML 4.01 version.
To do so safely, the HTML version currently needs a higher priority, because 
the reload function of MS Internet Explorer is broken and sends Accept: */* 
instead of a qualified list.
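
To illustrate why the HTML version needs the higher priority, here is a 
simplified sketch in Python of this kind of negotiation. It is not Apache's 
actual algorithm, which considers more than this; it only multiplies the q 
value the browser sends for a type by a server-side source quality. The 
names VARIANTS, parse_accept, accept_q and choose_variant are hypothetical, 
and the sample Accept values in the usage lines are illustrative.

```python
# Server-side source quality per variant (hypothetical table for this sketch).
VARIANTS = {"text/html": 1.0, "application/xhtml+xml": 0.999}


def parse_accept(header):
    """Split an Accept header value into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the browser gives none
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def accept_q(media_type, ranges):
    """Return the q of the most specific range that matches media_type."""
    major = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == major + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_variant(accept_header):
    """Pick the variant with the highest q * source quality, or None."""
    # A missing or empty Accept header is treated like "*/*".
    ranges = parse_accept(accept_header) or [("*/*", 1.0)]
    scores = {t: accept_q(t, ranges) * qs for t, qs in VARIANTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(choose_variant("*/*"))
print(choose_variant("application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
```

With Accept: */* both variants are equally acceptable to the browser, so the 
slightly lower source quality of the XHTML variant is what lets the HTML 
version win.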

For Apache, use the following lines in your .htaccess file to do so:

    Options +MultiViews
    AddType text/html;charset=US-ASCII .html
    AddType application/xhtml+xml;charset=UTF-8;qs=0.999 .xhtml

This example assumes that text/html is delivered in US-ASCII and 
application/xhtml+xml is delivered in UTF-8, which is recommended because:
Older HTML versions specified ISO-8859-1 as their default charset.
Newer HTML versions specify ISO 10646 / Unicode as their default charset, 
similar to XML.
XHTML is XML, so the same default charset rules apply to both XML and XHTML.
In XML, UTF-8 is the default.
Of course, US-ASCII will work in any case, but it will increase the file 
size if you use characters outside the US-ASCII range. UTF-8 keeps the file 
size a bit smaller in that case. For instance, the German umlaut ü (u with 
diaeresis) doesn't exist in US-ASCII, so an entity or character reference 
must be used: &uuml;, &#xFC; or &#252;. In UTF-8 it is the byte sequence with 
the binary values 1100 0011 1011 1100 (if I calculated it correctly), so it 
takes only two bytes, while the references in this example all need 6 bytes.
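
The byte counts above can be checked with a few lines of Python. This is 
only a quick verification sketch; the variable names are made up for the 
example.

```python
# The character under discussion: u with diaeresis, U+00FC.
char = "\u00fc"

# Its UTF-8 encoding, shown as bytes and as groups of eight bits.
utf8_bytes = char.encode("utf-8")
utf8_bits = " ".join(f"{byte:08b}" for byte in utf8_bytes)

# The three US-ASCII spellings mentioned above and their sizes in bytes.
references = ["&uuml;", "&#xFC;", "&#252;"]
reference_sizes = {ref: len(ref.encode("ascii")) for ref in references}

print(len(utf8_bytes), utf8_bits)
print(reference_sizes)
```

This confirms the calculation: the two UTF-8 bytes are 0xC3 0xBC, and each of 
the three references is six bytes long.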

If you can follow the above charset example, you will also be aware of the 
problems that can arise when a server sends no charset or the wrong one. My 
configuration example for Apache requires certain charsets to be used and 
will then cause no character problems in any current browser, even if the 
default settings of the browser are not the same as those of the documents.

I myself run a Linux system configured to use UTF-8 wherever possible, so 
UTF-8 is also the default charset in my web browsers. Pages that declare a 
charset neither in the HTTP Content-Type header, nor in the XML declaration, 
nor in a meta element, and that use literal characters instead of entities, 
usually show garbled characters in place of those outside the US-ASCII range 
until I select the charset manually (usually ISO-8859-15).
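
The effect described above can be reproduced in a small Python sketch, 
assuming a page that was saved in ISO-8859-15 without any charset 
declaration and a browser that falls back to UTF-8. The sample word and the 
variable names are made up for the example.

```python
# A page author saves the German word "für" in ISO-8859-15, with no
# charset declared anywhere.
page_bytes = "f\u00fcr".encode("iso8859-15")

# A browser defaulting to UTF-8 cannot decode the byte 0xFC on its own and
# substitutes U+FFFD, the replacement character.
shown_by_default = page_bytes.decode("utf-8", errors="replace")

# After the charset is selected manually, the text decodes correctly.
shown_after_fix = page_bytes.decode("iso8859-15")

print(repr(shown_by_default))
print(repr(shown_after_fix))
```

Had the page used &uuml; instead of the literal character, its bytes would 
all be US-ASCII and would decode the same way under either charset.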

-- 
Christian Wolfgang Hujer
Managing Partner
Phone: +49  (0)89  27 37 04 37
Fax: +49  (0)89  27 37 04 39
E-Mail: Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/

Received on Saturday, 8 March 2003 14:19:18 UTC
