- From: Leif Halvard Silli <lhs@malform.no>
- Date: Fri, 12 Jun 2009 20:57:04 +0200
- To: Jirka Kosek <jirka@kosek.cz>
- CC: Ian Hickson <ian@hixie.ch>, public-html@w3.org
Jirka Kosek, on 09-06-12 16.19:
> Leif Halvard Silli wrote:
>
>> Thus, if a file has the name "file.html.utf8", then UAs should, when
>> reading that file via the file URL protocol give precedence to the
>> encoding expressed by the file suffix.
>>
>> Thus, I would suggest that HTML 5 a) specifies the file suffixes for all
>> the encodings that it endorses
> I don't think that this is good idea:
>
> First, on majority of systems, files ending with suffixes like .utf8
> will not be simply opened in a web browser at all.

Both .utf8.html and .html.utf8 must of course work, as they do in Apache.

> Second, filename is too fragile to convey any reasonable metadata about
> content. User can easily change extension and this will affect
> interpretation of encoded text.

Moot: the meta element has the same "problems". And if invisible metadata is a problem, then suffixes are safer and better. No matter what, the reality is that the MIME type wins. So it would be nice to be able to mimic the MIME type via the file:// protocol.

I recommend that you test the effect of '.xhtml' and '.html' in Firefox. To this end, here is a minimal, valid XHTML document with Cyrillic text; please save it in an 8-bit Cyrillic encoding:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru">
<head><title></title></head><body><p>алфабета</p></body></html>

You will see that with the '.html' suffix the page (probably) defaults to the 8-bit encoding that dominates in your locale, whereas with '.xhtml' the page defaults to UTF-8. You will also see that if you, e.g., add a "<br>", then with '.xhtml' you get the 'yellow screen of death', whereas there is no problem if the file suffix is '.html'. This is a clear demo of the difference between HTML and XHTML, and of how little the in-document metadata may matter. Is this "too fragile", as you say? It seems that file suffixes work well for HTML and XHTML documents.
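A minimal sketch of how a UA might resolve a local file name under this proposal, assuming suffixes are treated order-independently the way Apache's mod_mime treats multiple extensions. The suffix tables, the function name `resolve_file`, and the `windows-1251` locale default are hypothetical; the precedence (charset suffix first, then the in-document declaration, then a per-type default) is the one argued for in this message, not specified UA behaviour.

```python
from __future__ import annotations

from pathlib import PurePath
from typing import Optional

# Hypothetical suffix tables, loosely modelled on Apache AddType/AddCharset.
TYPE_SUFFIXES = {
    "html": "text/html",
    "htm": "text/html",
    "xhtml": "application/xhtml+xml",
    "css": "text/css",
}
CHARSET_SUFFIXES = {
    "utf8": "utf-8",
    "koi8-r": "koi8-r",
    "cp1251": "windows-1251",
}


def resolve_file(name: str, doc_charset: Optional[str] = None,
                 locale_default: str = "windows-1251") -> tuple[Optional[str], str]:
    """Return (media_type, encoding) for a file:// name.

    Suffix order does not matter, so "file.html.utf8" and
    "file.utf8.html" resolve identically.
    """
    media_type: Optional[str] = None
    suffix_charset: Optional[str] = None
    # Every dot-separated part after the base name is a candidate suffix.
    for part in PurePath(name).name.lower().split(".")[1:]:
        if part in TYPE_SUFFIXES:
            media_type = TYPE_SUFFIXES[part]
        elif part in CHARSET_SUFFIXES:
            suffix_charset = CHARSET_SUFFIXES[part]
    # 1. A charset suffix wins, as a charset parameter in HTTP would.
    if suffix_charset:
        return media_type, suffix_charset
    # 2. Otherwise use the in-document declaration (meta or XML declaration).
    if doc_charset:
        return media_type, doc_charset.lower()
    # 3. Fall back per type: XML defaults to UTF-8, HTML to the locale default.
    if media_type == "application/xhtml+xml":
        return media_type, "utf-8"
    return media_type, locale_default
```

For example, `resolve_file("page.koi8-r.html", doc_charset="utf-8")` yields `("text/html", "koi8-r")`: the suffix overrides the in-document declaration, while a bare `page.xhtml` falls back to UTF-8 and a bare `page.html` to the locale default, mirroring the Firefox behaviour described above.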
What I learn from this is that the encoding must be given in a format that works across document types. And file suffixes are such a format: they work with CSS as well, for example.

> Third, there is already widely used mechanism for conveying encoding
> information inside HTML body using <meta charset=..."> (in HTML5) and
> <meta http-equiv="content-type" content="text/html;charset=..."> (in
> "legacy" HTML). Sure this can collide with HTTP headers, but this
> problem is well known and web-masters are somehow trained to cope with it.

Charset suffixes are already widely used, and have been for a long time. But to bridge the gap between offline and online page serving, UAs would need to support them via the file:// protocol as well.

-- 
leif halvard silli
Received on Friday, 12 June 2009 18:57:45 UTC