Re: review of content type rules by IETF/HTTP community from Karl Dubost on 2007-08-21 (public-html@w3.org from August 2007)

From: Karl Dubost <karl@w3.org>
Date: Tue, 21 Aug 2007 16:22:23 +0900
To: Sam Ruby <rubys@us.ibm.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, Dan Connolly <connolly@w3.org>, "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <F05E0A3B-1796-4E2C-A011-B03A37B19464@w3.org>

Sam Ruby (21 August 2007 - 00:31):
> A friend of mine once remarked that "It's almost like all the W3C  
> specs were authored by people who had root authority on the web  
> servers that they administered".

[…]

> Meanwhile, the back and forth of "I'm authoritative", "No *I'M*  
> authoritative" will never end.


Most of the problem, and the reason it is so messy now, is a major
disconnect between the production and the consumption of content on
the Web.

* People  _view_  content _on_      the Web
* People _author_ content _outside_ the Web

PUT has not been widely deployed on servers. We are not in a scenario  
where most of the content is being created in the information space  
with a PUT. Most of the time, people

* create files on their desktop filesystem and send them to another
filesystem (ftp, rsync, etc.)
* generate content from a database (or script) to a filesystem

The rare case where it could have been handled right is dynamic
generation of content, where the content for a specific URI is
created on the fly. Unfortunately, not many frameworks handle the
notion of content type well, or manage the HTTP headers for a URI
(in the database or in the script itself) the way a resource
manager would.

On the desktop, one of the major identifiers of content type is the
file extension (or the [resource fork][1] on the Mac). For example,
if I want an XHTML file to be parsed as XML, I usually give the
filename the extension .xhtml

I often think that specifications, or the IETF, would have been wise
to recommend a file extension for each format, acknowledging that
content is often created on the desktop. It would have helped to
maintain a kind of continuity between "on the Web" and "outside the
Web":

.xhtml <-> application/xhtml+xml
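
To make the idea concrete, here is a minimal Python sketch of such a
two-way mapping. The table and the function names are hypothetical;
no specification defines this exact list, and a real registry would
be much larger.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical table: one recommended extension per media type.
EXTENSION_TO_TYPE = {
    ".xhtml": "application/xhtml+xml",
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}
# Reverse direction: from the media type back to the extension.
TYPE_TO_EXTENSION = {v: k for k, v in EXTENSION_TO_TYPE.items()}


def type_for_uri(uri):
    """Return the media type implied by the last path extension, or None."""
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)


def extension_for_type(media_type):
    """Return the recommended extension for a media type, or None."""
    return TYPE_TO_EXTENSION.get(media_type)
```

With a one-to-one table like this, a file saved on the desktop as
foo.xhtml and the same resource served on the Web would agree on
their type in both directions.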


I wonder if Ian Hickson has stats about this:

	On URIs where a file extension is visible,
	how many are served with a content type that contradicts it?

	http://example.org/foo.xhtml  sent as text/html for example
	http://example.org/foo.jpg    sent as image/gif for example
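
A rough Python sketch of how such a count could be made from a list
of (URI, Content-Type) observations. The EXPECTED table, the function
names, and the SAMPLE list are made up for illustration; they are not
real crawl data.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension table, for illustration only.
EXPECTED = {
    ".xhtml": "application/xhtml+xml",
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}


def classify(uri, content_type):
    """Compare the served type with the type the extension suggests."""
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
    expected = EXPECTED.get(suffix)
    if expected is None:
        return "unknown-extension"
    # Drop parameters such as "; charset=utf-8" before comparing.
    served = content_type.split(";", 1)[0].strip().lower()
    return "match" if served == expected else "mismatch"


def tally(observations):
    """Count matches and mismatches over (uri, content_type) pairs."""
    return Counter(classify(uri, ctype) for uri, ctype in observations)


# Illustrative observations, not measurements.
SAMPLE = [
    ("http://example.org/foo.xhtml", "text/html"),
    ("http://example.org/foo.jpg", "image/gif"),
    ("http://example.org/bar.jpg", "image/jpeg"),
    ("http://example.org/foo", "image/gif"),
]
```

Calling tally(SAMPLE) puts the two examples above in the "mismatch"
bucket; the URI without an extension cannot be judged this way at
all.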

Another interesting statistic would be:

	http://example.org/foo        sent as image/gif for example
                                       when the received content
                                       is really JPG


This could help to define an algorithm for "sniffing" with priorities
http://www.w3.org/html/wg/html5/#content-type-sniffing
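
As a sketch only, and not the algorithm from the HTML5 draft, here is
what "sniffing with priorities" could look like in Python. The
priority order (byte signature, then header, then extension) is an
arbitrary assumption; statistics like the ones above would be needed
to pick a sensible order. The function names are hypothetical.

```python
# Well-known leading byte signatures for a few image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_bytes(body):
    """Return a media type if the body starts with a known signature."""
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type
    return None


def effective_type(declared, body, extension_hint=None):
    """Pick a type from several hints, in an assumed priority order.

    Returns (media_type, source), or (None, None) if no hint is usable.
    """
    candidates = [
        ("signature", sniff_bytes(body)),  # assumed highest priority
        ("header", declared),
        ("extension", extension_hint),
    ]
    for source, media_type in candidates:
        if media_type:
            return media_type, source
    return None, None
```

With this order, a body starting with the JPEG signature but sent as
image/gif would be treated as image/jpeg, which is exactly the kind
of override that the "who is authoritative" argument is about.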

URIs with an extension or without, etc. This raises a problem with
URI endings that look like a file extension but are not one. I may
have named a URI

	http://example.org/creating.jpg

which is in fact an HTML page talking about "creating jpg files"

	http://example.org/creating.jpg.html

This option seems doomed too.
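
A small Python sketch of why guessing from the URI is fragile. The
function name is hypothetical; it only lists what looks like an
extension in the path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def apparent_extensions(uri):
    """Return every dot-suffix found in the last path segment."""
    return PurePosixPath(urlparse(uri).path).suffixes
```

For http://example.org/creating.jpg this returns ['.jpg'], so a
"last extension" rule would guess an image even though the resource
is an HTML page. For http://example.org/creating.jpg.html it returns
['.jpg', '.html'], and nothing in the URI itself says which of the
two suffixes is the real type.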


[1]: http://en.wikipedia.org/wiki/Resource_fork



-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***

Received on Tuesday, 21 August 2007 07:22:33 UTC