
Re: review of content type rules by IETF/HTTP community

From: Leif Halvard Silli <lhs@malform.no>
Date: Tue, 21 Aug 2007 17:57:47 +0200
Message-ID: <b27041e3b53bdcd07058877e49a18c83@10013.local>
To: Robert Burns <rob@robburns.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, Karl Dubost <karl@w3.org>, Dan Connolly <connolly@w3.org>, "public-html@w3.org WG" <public-html@w3.org>, Sam Ruby <rubys@us.ibm.com>

2007-08-21 16:26:19 +0200 Robert Burns:
> On Aug 21, 2007, at 8:56 AM, Leif Halvard Silli wrote:

>> The main thing on which I agree very strongly with Karl is that the
>> offline and online "gap" should be bridged, and that this can happen
>> through setting up clear/strict recommendations for which extensions
>> to use - which all sides (authors, authoring software, browsers,
>> servers) should pay attention to. This bridging should include
>> official language and charset extensions, taking example from Apache,
>> which already offers its own such extensions and has done so for a
>> very long time.

> For character encodings I think things are somewhat a mess. Most
> authors are not that aware of character encodings. To me it's really
> the type of thing authors should not have to worry about (if it had
> been handled in a sane way from the start).

Who said that I thought authors should worry about them, any more
than they care whether their application uses .htm, .html or anything
else? Or any more than they care about how the META tag for the
encoding specification is written (which, by the way, is very hard to
remember how to write)?

The author should not need to care whether his or her authoring
application adds the charset extension, adds a META element with
charset information, or does it some other way.

However, the sad thing is that **if** the author and his application
use a charset extension, then, in an offline milieu, the browsers are
likely not to make any sense of the charset extension.
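
To make the idea concrete, here is a minimal sketch in Python of how a
tool working offline could read such extensions. It loosely imitates
what Apache's mod_mime does with its AddType, AddLanguage and AddCharset
directives, but the mapping tables and the function name `describe` are
made up for illustration, and the file name is only an example.

```python
from pathlib import PurePosixPath

# Hypothetical mapping tables; a real server reads these from its configuration.
MEDIA_TYPES = {"htm": "text/html", "html": "text/html",
               "xhtml": "application/xhtml+xml"}
LANGUAGES = {"en": "en", "no": "no", "de": "de"}
CHARSETS = {"utf8": "UTF-8", "latin1": "ISO-8859-1"}


def describe(filename):
    """Collect media type, language and charset from every extension."""
    info = {"type": None, "language": None, "charset": None}
    # Everything after the first dot is treated as a list of extensions.
    for ext in PurePosixPath(filename).name.split(".")[1:]:
        key = ext.lower()
        if key in MEDIA_TYPES:
            info["type"] = MEDIA_TYPES[key]
        elif key in LANGUAGES:
            info["language"] = LANGUAGES[key]
        elif key in CHARSETS:
            info["charset"] = CHARSETS[key]
    return info


print(describe("index.html.no.utf8"))
```

A browser opening the file from disk could apply the same lookup,
instead of ignoring everything but the last extension.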

> Finally, for languages, it's useful for servers to have metadata about
> language at their disposal to quickly deliver to clients.

These extensions are useful for authors too. It is very practical to
distinguish different variants of the same file/content based on the
file extension. For authors, having to look inside the file is
cumbersome.

> However, I like the way HTML handles that already through the i18n
> language features. Apache can even be configured to sniff inside the
> files as they're added to the server to gather this data for quick
> indexing later.

The problem that .html and .xhtml reveal is that servers put more
weight on the file extension than on what is written inside the
file.

Besides, one of the purposes of language extensions is content
negotiation. Well, if Apache can do that without language extensions,
then fine, that's an extra feature (which even fewer people know
about).

> So all of these pieces of metadata each have their own place, I think.
> The

The .html extension is also metadata.

> safest thing is to keep the authoritative data inside the file
> itself, and then extract it and index it in filesystem metadata or
> elsewhere for quick retrieval. Many filesystems (and WebDAV too)
> support extended filesystem

That extraction process is not the simple solution that Karl asked
for. I want to save the file and test immediately, not wait for
Spotlight or a big fast computer.

Besides, even Mac OS X comes with Apache. And the reason why I, on
<MyOwnMac.local>, get Apache's default index.html page in Norwegian
instead of English is precisely that the installed version of
Apache has implemented filename-extension-based language negotiation.
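
As a rough illustration of that kind of negotiation, the Python sketch
below picks a file variant from an Accept-Language header. It is a
simplification and not Apache's actual algorithm. The function names
and the variant file names (index.html.en, index.html.no) are
assumptions made for the example.

```python
def parse_accept_language(header):
    """Return (language tag, q-value) pairs, highest preference first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        prefs.append((tag.strip().lower(), quality))
    # The sort is stable, so equal q-values keep their header order.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def choose_variant(header, variants, default=None):
    """Pick a file from `variants`, a dict of language tag -> file name."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in variants:
            return variants[tag]
        primary = tag.split("-")[0]  # fall back from "en-us" to "en"
        if primary in variants:
            return variants[primary]
    return default


variants = {"en": "index.html.en", "no": "index.html.no"}
print(choose_variant("no, en;q=0.5", variants))
```

With a browser set to prefer Norwegian, the lookup lands on the `.no`
variant, which matches what I see on my own machine.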

> attributes. Some tools have started to store this information there.
> Systems like Apple's Spotlight extract authoritative metadata from
> files and store it in a sqlite database for indexing (but also make
> use of filesystem attributes and extended attributes alongside the
> sql). To me those approaches represent best practice. Filenames (and
> their extensions) can be too easily and inadvertently changed, losing
> that metadata. The best thing to do is keep it inside the file (with
> the exception of file type, which has now had a long tradition of
> filename extension mapping).

I am interested in capitalizing on what we already have. And I do not
see these file extension problems that you see. Besides, you can put
things both inside the file and in the file name. That is very safe
in case the content is lost - which can also happen.
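
Keeping the label in both places also allows a simple cross-check. The
Python sketch below compares a charset taken from the file name with
the one declared in the markup. It is only an assumption of how a tool
might do this: the helper names are invented, and the regular
expression is a crude stand-in for real encoding sniffing.

```python
import codecs
import re

# Crude pattern for charset=... in a META element; not a full HTML parser.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
                          re.IGNORECASE)


def canonical(name):
    """Normalize an encoding label, or return None if Python does not know it."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def charset_in_markup(data):
    """Return the charset declared near the start of the file, if any."""
    match = META_CHARSET.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def labels_agree(filename_charset, data):
    """True unless the file name and the markup name different encodings."""
    inside = charset_in_markup(data)
    if inside is None or filename_charset is None:
        return True  # only one label present, so nothing can contradict it
    wanted = canonical(filename_charset)
    return wanted is not None and wanted == canonical(inside)


sample = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(labels_agree("utf8", sample))
```

If one of the two labels is lost, the other is still there, and if both
are present a tool can warn when they disagree.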
-- 
leif halvard silli
Received on Tuesday, 21 August 2007 15:58:20 GMT
