
Re: review of content type rules by IETF/HTTP community

From: Leif Halvard Silli <lhs@malform.no>
Date: Wed, 22 Aug 2007 04:26:34 +0200
Message-ID: <e1b61c39a7951c2f8e7b1a7254d77484@10013.local>
To: Robert Burns <rob@robburns.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, Karl Dubost <karl@w3.org>, Dan Connolly <connolly@w3.org>, "public-html@w3.org WG" <public-html@w3.org>, Sam Ruby <rubys@us.ibm.com>

2007-08-21 14:45:30 -0500 Robert Burns:
> a registered byte sequence that mapped to a specific character
> encoding, then every text [snip]

Would we not have to have a new version of HTTP for that to work?

> I would love to see us simply recommend (as in SHOULD) UTF-8 or UTF-16
> (with authoritative BOMs) for all HTML5 documents.

If .HTML is taken to mean .HTML.UTF8, then old non-UTF-8 files must be re-labeled.

>Again, encodings are very different in that the author changes
>something that the application should probably handle more opaquely
>(like through an invisible byte sequence).

Encoding is one thing, labeling another. As you know, we *do* need the encoding menu of our browsers. If the encoding is labeled through a charset extension and the browser reads those extensions, then you can use your browser to test if you have labeled it correctly.

> This is where I think you're onto something. It would be good
> for browsers to respect those extensions when opening local
> files. [...]

This will not be as important for users as it can be for authors. (Users will seldom open such files at all, since users will only open served files, where that extension doesn't matter to the UA.)

>>> Finally, for languages, [...]

> Any author is free to use extensions in any way they want [...]

HTTP has already specified them
(<http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10>).

>> [...] servers put more weight on the file extension than what is
>> written inside the file.

> See this is where I think there's an important difference. [...]

a) E.g. UTF-8 can be read as ISO-8859, b) labels can be used for testing purposes, c) switching the language extension could affect how UAs do language-specific things (quotes, for instance). Thus, one could very simply «fix» a missing LANG attribute and so on without altering the file content.
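
Point a) can be illustrated with a small Python sketch. It is only an illustration, assuming ISO-8859-1 as the ISO-8859 variant; the variable names are made up for the example. Every byte sequence is valid ISO-8859-1, so UTF-8 bytes decode without an error under the wrong label; the non-ASCII characters simply come out wrong, which is what makes a label useful for testing.

```python
# Illustrative only: variable names are hypothetical, not from any tool.
text = "«fix»"

# The bytes as a UTF-8 file would store them.
raw = text.encode("utf-8")

# Reading the same bytes under the wrong label never fails,
# because every byte value maps to some ISO-8859-1 character.
misread = raw.decode("iso-8859-1")

# Reading them under the correct label restores the original text.
correct = raw.decode("utf-8")

print(misread)  # prints "Â«fixÂ»"
print(correct)  # prints "«fix»"
```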

>In other words, they would output filename extensions for encoding and
>language to be content negotiation ready?

Do you mean: does it have purposes other than language negotiation? See above/below. An author only uses it if it benefits him! But I interpret it as being in HTML5's spirit to recognise the role that MIME plays. MIME overrides DOCTYPE.

> However, there are two very different situation: 1) a UA opening a
> local file, and 2) a server receiving a request for a file from a
> remote UA. It's easy for the local UA to simply check the internal
> metadata on the file. [...]

The point was to bridge the gap between served/unserved documents. What Spotlight might know about your files might be irrelevant, or wrong, to the server and to those you publish your documents for.

> [...] Or are you just saying you want editing UAs to assist authors
> in outputting this extracted metadata as filename extensions?

Yes, for bridging served/unserved, that is one important point.

> So are you suggesting that the filesystems and file browsers change
> so that all files of the same content with different languages get
> presented as a single file. Then 'file.html' would really point to two
> files: 'file.html.utf8.ru' and 'file.html.utf8.en'.

E.g. for the file 'file.html.utf8.ru' my proposal means that
 * even before the browser has opened it, it recognises it as:
    of doctype HTML, in encoding UTF-8 and of Russian language
 * ditto for text editors when they see such files
 * and vice-versa: when I tell my authoring program that I am gonna make a file with those suffixes, then it automatically creates a file with corresponding encoding, META tags and lang attributes etc.
 * when I tell my WYSIWYG editor that I want a file with those features, then it defaults to those file extensions
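
How a tool might read such suffixes can be sketched in Python. This is only a minimal sketch under stated assumptions: the lookup tables and the function name describe() are hypothetical, and the suffix spellings ('utf8', 'ru') are taken from the example filename above, not from any registry.

```python
from pathlib import PurePath

# Hypothetical lookup tables, for illustration only.
TYPES = {"html": "text/html"}
CHARSETS = {"utf8": "UTF-8", "latin1": "ISO-8859-1"}
LANGUAGES = {"ru", "en"}

def describe(filename):
    """Split a name such as 'file.html.utf8.ru' into type, charset and language."""
    info = {"type": None, "charset": None, "language": None}
    # PurePath.suffixes yields every dot-separated extension, in order.
    for suffix in PurePath(filename).suffixes:
        key = suffix[1:].lower()
        if key in TYPES:
            info["type"] = TYPES[key]
        elif key in CHARSETS:
            info["charset"] = CHARSETS[key]
        elif key in LANGUAGES:
            info["language"] = key
    return info

print(describe("file.html.utf8.ru"))
```

Because each suffix is looked up on its own, the order of the charset and language suffixes does not matter here, and a plain 'file.html' is still recognised as HTML.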

> So when double clicking on (or typing in the terminal open ) [...] I'm
> just trying to understand how we would leverage what Apache does for
> local files.

I *hope* the file system sees .html as an HTML file even if it has a charset/lang extension.

>> I am interested in capitalizing on what we already have. [...]
>
> I'm still not clear what purpose the filename extension metadata would
> serve. When it's already there (because of an Apache installation), it
> could be used, but how?

Capitalizing in 3 steps:
* authoring tools start to recognize the existence of these extensions, so that they neither stop recognizing files as HTML files just because they have these extra extensions, nor simply disallow using these extensions.
* offline reading and interpreting of these extensions (online they should have no meaning, except for the server)
* to use these extensions during authoring

> Also one of the points I tried to raise in my earlier response
> relates to modern filesystems. [...]

Perhaps this is something for HTTP/2.0? 

> [...] As Sander hinted, this also means that file type settings and
> other metadata attributes can be localized  [...]

OS X uses localised names of certain specialised folders. One cannot use these names in filepaths. And if at some point one can, they will not override the "real" names.
-- 
leif halvard silli
Received on Wednesday, 22 August 2007 02:27:17 GMT
