
Re: review of content type rules by IETF/HTTP community

From: Leif Halvard Silli <lhs@malform.no>
Date: Tue, 21 Aug 2007 14:03:43 +0200
Message-ID: <d1d35b9bc6dea81ea106aa802d975baa@10013.local>
To: Karl Dubost <karl@w3.org>
Cc: Julian Reschke <julian.reschke@gmx.de>, Dan Connolly <connolly@w3.org>, "public-html@w3.org WG" <public-html@w3.org>, Sam Ruby <rubys@us.ibm.com>

On 2007-08-21 09:22:23 +0200 Karl Dubost <karl@w3.org> replied:
> Sam Ruby (21 August 2007 - 00:31):
>> ... back and forth of "I'm authoritative", "No *I'M* authoritative"  ...
> ... the issue ... major disconnection between production and consumption ...
> * People  _view_  content _on_      the Web
> * People _author_ content _outside_ the Web

> ... specifications or IETF would have been  wise for each format to 
> recommend an extension for files,  acknowledging that the content is often 
> being created on the desktop.  It would have helped to maintain a kind of 
> continuity between "on the  Web" and "outside the Web"
> ..xhtml <-> application/xhtml+xml

Quite right. Even though most browsers (Firefox at least) will treat an 'outside the Web' file as having content-type 'application/xhtml+xml' if it has the .XHTML extension, there is still room for much improvement: authoring tools could default to .XHTML when authoring XHTML docs, the extension could change when the doctype changes, etc.

But to really bridge the gap between off- and online (unserved/served) files, charset extensions would have to be included as well! And also (human) language extensions.

The recommended practice is to _not_ specify the encoding using a META element, but instead have the web server tell the UA what encoding is being used. But while Web servers supporting charset extensions could e.g. serve 'file.html.utf8' and 'file.utf8.html' as UTF-8-encoded files, the off-line hurdles make this practice difficult to follow.

Using language extensions (e.g. 'file.html.utf8.en' or 'file.html.utf8.ru'), one would have a very simple way of defining the main language of the document. UAs and authoring tools should at the very least not treat such extensions as invalid! (E.g. a tip-top text editor might, today, stop auto-recognizing files with language or charset extensions as HTML files unless the .HTML extension comes last - 'file.ru.utf8.html' - which is, probably, less logical than 'file.html.utf8.ru'.)
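
To make the idea concrete, here is a minimal sketch (in Python) of how a UA or authoring tool could read such extensions regardless of their order. The lookup tables and the function name describe() are made up for illustration; they are assumptions, not an official list and not how any particular server or browser does it.

```python
# Hypothetical lookup tables: no official list of such extensions exists.
MEDIA_TYPES = {"html": "text/html", "xhtml": "application/xhtml+xml"}
CHARSETS = {"utf8": "UTF-8", "latin1": "ISO-8859-1"}
LANGUAGES = {"en", "ru"}


def describe(filename):
    """Read media type, charset and language from a file name's extensions.

    The extensions may come in any order, so 'file.html.utf8.ru' and
    'file.ru.utf8.html' are described identically.
    """
    info = {"media_type": None, "charset": None, "language": None}
    # Everything after the first dot is treated as an extension.
    for ext in filename.lower().split(".")[1:]:
        if ext in MEDIA_TYPES:
            info["media_type"] = MEDIA_TYPES[ext]
        elif ext in CHARSETS:
            info["charset"] = CHARSETS[ext]
        elif ext in LANGUAGES:
            info["language"] = ext
        # Unknown extensions are ignored rather than treated as invalid.
    return info


print(describe("file.html.utf8.ru"))
print(describe("file.ru.utf8.html"))
```

Because each extension is looked up on its own, the position of .HTML in the file name stops mattering, which is the behaviour asked for above.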

But how does one work with such files off-line? Browsers might respond «unknown extension», or you will have to manually select the correct encoding when browsing such files off-line, because the browsers do not read these extensions. The point must be that the file extension takes precedence _over_ what the META elements specify. Otherwise, authors will not experience how servers read these extensions and then send them to the browser, which in turn treats these meta-headers as having more authority than the META element content.
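
The precedence I have in mind could be sketched like this (Python again). The extension table, the regular expression and the name effective_charset() are assumptions of mine for illustration only; a real browser would need a proper HTML parser rather than a regex.

```python
import re

# Hypothetical charset extensions: no official list exists.
CHARSET_EXTENSIONS = {"utf8": "UTF-8", "latin1": "ISO-8859-1"}

# Rough pattern for a charset declared inside a META element.
META_CHARSET = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE
)


def effective_charset(filename, markup):
    """Pick the charset for an off-line file.

    A charset extension in the file name wins; the META element is
    consulted only when no such extension is present.
    """
    for ext in filename.lower().split(".")[1:]:
        if ext in CHARSET_EXTENSIONS:
            return CHARSET_EXTENSIONS[ext]
    match = META_CHARSET.search(markup)
    return match.group(1) if match else None


page = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("file.html.utf8", page))  # extension overrides META
print(effective_charset("file.html", page))       # falls back to META
```

This mirrors what happens on-line, where the charset sent by the server outranks the META element, so the author would see the same result in both situations.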

There is no official list of charset extensions. (The Apache server's list of language extensions is of course a place to start.) And there isn't an official list of language extensions (used for this purpose) either. (And Apache, for example, identifies a problem with PL, which is also used as the extension for Perl files.)

Currently, I know of only one web browser which, in a beta version, supports the (Apache) charset extensions.

Later Julian Reschke replied:
> I think they do.
> XHTML: <http://tools.ietf.org/html/rfc3236#section-2>
> Template: <http://tools.ietf.org/html/rfc4288#section-4.11>

One of Karl's points was probably that several extensions are actually recommended for (in this case) XHTML. If only .XHTML were recommended, XHTML files would in most cases automatically be served as 'application/xhtml+xml', and thus authors/users would experience the effects of XHTML.
leif halvard silli
Received on Tuesday, 21 August 2007 12:04:22 UTC
