- From: Robert Burns <rob@robburns.com>
- Date: Tue, 21 Aug 2007 09:26:19 -0500
- To: Leif Halvard Silli <lhs@malform.no>
- Cc: Julian Reschke <julian.reschke@gmx.de>, Karl Dubost <karl@w3.org>, Dan Connolly <connolly@w3.org>, "public-html@w3.org WG" <public-html@w3.org>, Sam Ruby <rubys@us.ibm.com>
Hi Leif,

On Aug 21, 2007, at 8:56 AM, Leif Halvard Silli wrote:

> 2007-08-21 14:21:01 +0200 Julian Reschke <julian.reschke@gmx.de>:
>
>> Leif Halvard Silli wrote:
>>> ...
>>> Later Julian Reschke replied:
>>>> I think they do.
>>>> XHTML: <http://tools.ietf.org/html/rfc3236#section-2>
>>>> Template: <http://tools.ietf.org/html/rfc4288#section-4.11>
>>> One of Karl points was probably that one actually recommend
>>> several extensions for (in this case) XHTML. By recommending
>>> only .XHTML, XHTML-files would in most cases automatically be
>>> served as 'application/xhtml+xml', and thus authors/users would
>>> experience the effects of XHTML.
>> RFC3236 mentions XHTML, XHT and HTML.
>
> Like I said.
>
>> Apache 2.2.x comes with a preconfigured mapping file (mime.types)
>> which has
>> application/xhtml+xml xhtml xht
>> so as far as I can tell, it already does what you're looking for
>> (and probably has for a long time).
>
> I am aware of this. And allthough there are more web servers than
> Apache, and more browsers than Firefox, this might serve (sic) as
> an example. (By asking Ian for examples of files.XHTML being served
> as text/html, I suspect he expects to hear that there are very
> _few_ such examples. In contrast, Ian has often been keen to
> demonstrate that things doesn't work, e.g. showing how images being
> served as text, will still being treated as an image by
> browsers ... and other such things.)
>
> The main thing that I agree very strongly with Karl in is that the
> offline and online "gap" should be bridged, and that this can
> happen through setting up clear/strict recommendations for which
> extensions to use - which all sides (authors, authoring software,
> browsers, servers) should pay attention to. This bridging should
> include official language and charset extensions, taking example
> from Apache, which also allready offer its own such extensions, and
> have done so for a very long time allready.

I'm not so sure I would characterize this as a problem between the online and offline worlds. Mappings of filename extensions to MIME types are already quite common in both worlds. The problem arises with misconfigured servers, or servers not configured for new MIME types and new file extensions. As I understand it, it also comes from servers sending a default MIME type for files they're not sure about (instead of just admitting they don't know).

For character encodings I think things are somewhat of a mess. Most authors are not that aware of character encodings. To me it's really the type of thing authors should not have to worry about (if it had been handled in a sane way from the start). Adding filename extensions for encoding could be one approach (as a longtime Mac user, it doesn't really appeal to me too much, but we did make the adjustment to filename extensions for file types). However, I think Unicode has really introduced a better approach with, well, Unicode itself, but also with the introduction of the byte-order mark, which does a fairly good job of identifying UTF-8 and UTF-16 encodings as those encodings. A logical extension of this (outside our scope) would be some sort of byte registry for character encodings. Each character encoding could have its own one- or two-byte sequence that each file started with. Once text editors had been updated to handle these registered bytes, authors would never have to think about it again. Every text file would always have its encoding tattooed on its forehead.

Finally, for languages, it's useful for servers to have metadata about language at their disposal to quickly deliver to clients. However, I like the way HTML handles that already through the i18n language features. Apache can even be configured to sniff inside the files as they're added to the server to gather this data for quick indexing for later. So all of these pieces of metadata each have their own place, I think.
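
To make the byte-order-mark point above concrete, here is a rough Python sketch of how a tool might read the encoding off the front of a file. The function name `sniff_bom` and the table `_BOMS` are made up for illustration, and it only knows the Unicode signatures that Python's standard `codecs` module already defines; the wider "byte registry" idea is hypothetical and not modelled here.

```python
import codecs

# Check the longer signatures first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file with no BOM simply yields `None`, which is the "admit you don't know" answer rather than a guessed default.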

The safest thing is to keep the authoritative data inside the file itself, and then extract it and index it in filesystem metadata or elsewhere for quick retrieval. Many filesystems (and WebDAV too) support extended filesystem attributes. Some tools have started to store this information there. Systems like Apple's Spotlight extract authoritative metadata from files and store it in a SQLite database for indexing (but also make use of filesystem attributes and extended attributes alongside the SQL). To me those approaches represent best practice. Filenames (and their extensions) can be changed too easily and inadvertently, losing that metadata. The best thing to do is keep it inside the file (with the exception of file type, which has by now a long tradition of filename extension mapping).

Take care,
Rob

Received on Tuesday, 21 August 2007 14:26:39 UTC