- From: Harley Rosnow <Harley.Rosnow@microsoft.com>
- Date: Thu, 10 Apr 2008 15:52:42 -0700
- To: TAN Kuan Hui <tankuanhui@gmail.com>, hrhrhr hahaha <kevking@hotmail.com>, "public-html-comments@w3.org" <public-html-comments@w3.org>
I completely agree that it's best if the encoding is encapsulated in the document rather than placed in all the consuming documents. That goal can also be achieved through the use of UTF-8 and the placement of a BOM at the start of the file.

Thanks,
Harley Rosnow
Internet Explorer Development
Microsoft Corporation

-----Original Message-----
From: TAN Kuan Hui [mailto:tankuanhui@gmail.com]
Sent: Thursday, April 10, 2008 1:03 AM
To: Harley Rosnow; hrhrhr hahaha; public-html-comments@w3.org
Subject: Re: charset

Also on the subject of charset: <script> can be further tagged with a charset declaration when it has a linked resource. Otherwise it is assumed to be in the same encoding as the document. For example,

<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
<script type='text/javascript' src='http://someabc.com/scriptA.js' charset='gb2312'/>
<script type='text/javascript' src='http://somexyz.com/scriptB.js' charset='big5'/>

If scriptA.js inserts some HTML code into the document, for example,

<script type='text/javascript' src='http://someabc.com/scriptAA.js'/>

and fails to declare its encoding, the mashup potentially fails. The problem is that the encoding should be declared in the script itself and not on the element (i.e. the standard should advocate a consistent approach such as those for style sheets and XML, where the declaration resides with the source and not with the consuming document). This then shifts the problem entirely to the developer who wrote the script, and the encoding problem becomes "TRANSPARENT" to the consumer. I am sure this must have been deliberated in detail in the past, but with the meteoric rise of Ajax, it's perhaps worth a revisit. However, I agree with Harley that a multi-charset feature might be unnecessarily complicated.

Thanks.
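To illustrate the BOM suggestion from the reply above, here is a minimal Python sketch of how a consumer might let a script file describe its own encoding. The function names (sniff_bom, decode_script) and the fallback parameter are hypothetical, and this is not a description of how any particular browser does it: it only shows that a leading BOM lets the resource carry its encoding with it, so the consuming document's charset attribute matters only when no BOM is present.

```python
import codecs

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_script(data: bytes, fallback: str) -> str:
    """Decode a script resource, preferring its own BOM over the caller's guess.

    `fallback` stands in for whatever the consuming page declared
    (for example, the charset attribute on the script element).
    """
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)

source = "var s = '中文';"
with_bom = codecs.BOM_UTF8 + source.encode("utf-8")
without_bom = source.encode("gb2312")

# The BOM wins even though the consumer guessed gb2312.
print(decode_script(with_bom, fallback="gb2312") == source)
# Without a BOM, the result depends entirely on the consumer's declaration.
print(decode_script(without_bom, fallback="gb2312") == source)
```

With the BOM present, a wrong or missing declaration in the consuming page no longer changes the result, which is the "transparent to the consumer" property asked for above.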
----- Original Message -----
From: "Harley Rosnow" <Harley.Rosnow@microsoft.com>
To: "hrhrhr hahaha" <kevking@hotmail.com>; <public-html-comments@w3.org>
Sent: Thursday, April 10, 2008 1:26 AM
Subject: RE: charset

Hello,

Such a multi-charset feature would be very difficult for user agents to support. I work on Internet Explorer's "charset" implementation. Simply put, in the implementation of the "charset" META tag, data is (1) placed in a buffer, (2) scanned for a charset META tag in the head, (3) decoded based on that charset, (4) tokenized by a lexer and (5) the tokens are parsed into some internal representation. While user agent implementations vary and I've greatly simplified things, it's still useful to think of the processing in these terms.

The difficulty of this suggestion is that the user agent could identify and handle charset attributes only [I] during the parsing phase (5) or [II] by adding an early scan that violates the above phasing. If discovered during parsing, the appropriate portion of the file would have to be reset back to phase (3), decoded again, retokenized and reparsed. On the other hand, the user agent could attempt to find such attributes at a much earlier phase. There are many encodings that have problematic characteristics, such as characters from the ASCII range (used for elements, attributes, etc.) not being encoded as ASCII characters in the stream, or shift states that change the interpretation of bytes. As a result, these early scans are very difficult and expensive (in terms of performance) to implement.

For either approach, this kind of reinterpretation of data lends itself to security attacks (I could hide script in a different encoding from the rest of the file), makes XSS filtering much more difficult, leads to performance problems and, ultimately, to inconsistent implementations across the different user agents. I'd recommend against such a multi-charset feature.
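Phases (1) to (3) above can be sketched roughly in Python as follows. This is a deliberately simplified illustration, not Internet Explorer's implementation: the regular expression, the 1024-byte scan limit, the windows-1252 default and the function names are all assumptions made for the example. It shows why a byte-level scan works for ASCII-compatible encodings and why it finds nothing when ASCII characters are not encoded as ASCII bytes.

```python
import re

# Phase 2 helper: a byte-level pattern, so it only works when the markup
# itself is encoded as plain ASCII bytes in the stream.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_:.-]+)", re.IGNORECASE
)

def prescan_charset(buffer: bytes, limit: int = 1024):
    """Phase 2: look for a charset META tag in the first `limit` raw bytes."""
    match = META_CHARSET.search(buffer[:limit])
    return match.group(1).decode("ascii").lower() if match else None

def decode_document(buffer: bytes, default: str = "windows-1252") -> str:
    """Phases 1-3: take the buffered bytes, scan once, decode the whole buffer once."""
    return buffer.decode(prescan_charset(buffer) or default, errors="replace")

page = ("<meta http-equiv='Content-Type' "
        "content='text/html; charset=UTF-8'><p>中文</p>")

# ASCII-compatible encoding: the scan finds the declaration.
print(prescan_charset(page.encode("utf-8")))
# UTF-16: every ASCII character is followed by a zero byte, so the same
# byte-level scan finds no declaration at all.
print(prescan_charset(page.encode("utf-16-le")))
```

A per-element charset would require either rerunning the decode step for part of the buffer after parsing had already begun, or making a scan like this one cope with every encoding's quirks, which are the two options [I] and [II] described above.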
Servers that compose files together need to make their encoding consistent in the rendered composite file. The same holds true for composition which occurs on the client.

Thanks,
Harley Rosnow
Internet Explorer Development
Microsoft Corporation

-----Original Message-----
From: public-html-comments-request@w3.org [mailto:public-html-comments-request@w3.org] On Behalf Of hrhrhr hahaha
Sent: Wednesday, April 09, 2008 7:42 AM
To: public-html-comments@w3.org
Subject: charset

Hi,

If a page has its charset set in the head section via a meta tag, and another charset is used elsewhere within the same page, what would you use, aside from wrapping it in an a tag/element? Maybe, alongside lang (and xml:lang), the charset 'could' be added to span? Even using utf-8, there could be a charset used NOT in or recognised by utf that could be 'added' to the page via an inline tag/element?
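As a small illustration of the advice that composing servers should make the encoding consistent, the sketch below transcodes fragments from different source encodings into one UTF-8 document. The function name compose_as_utf8 and the sample fragments are hypothetical; the only assumption about the inputs is that each fragment's source encoding is known to the composer.

```python
def compose_as_utf8(fragments):
    """Decode each (bytes, source_encoding) pair and emit a single UTF-8 byte string."""
    text = "".join(data.decode(encoding) for data, encoding in fragments)
    return text.encode("utf-8")

fragments = [
    ("<p>简体</p>".encode("gb2312"), "gb2312"),  # Simplified Chinese source
    ("<p>繁體</p>".encode("big5"), "big5"),      # Traditional Chinese source
]

composite = compose_as_utf8(fragments)
print(composite.decode("utf-8"))
```

Because both fragments are converted before they are joined, the composite page needs only one charset declaration and no per-element encoding switch.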
Received on Thursday, 10 April 2008 22:54:59 UTC