- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 11 Dec 2008 01:19:16 +0000 (UTC)
- To: noah_mendelsohn@us.ibm.com
- Cc: Arthur Barstow <art.barstow@nokia.com>, Bill McCoy <bmccoy@adobe.com>, Carl Cargill <cargill@adobe.com>, "eduardo.gutentag@oasis-open.org" <eduardo.gutentag@oasis-open.org>, "Henry.Story@Sun.COM" <Henry.Story@sun.com>, Jon Ferraiolo <jferrai@us.ibm.com>, Marcos Caceres <marcosscaceres@gmail.com>, Larry Masinter <masinter@adobe.com>, Michael Stahl <Michael.Stahl@sun.com>, Philippe Le Hegaret <plh@w3.org>, public-webapps <public-webapps@w3.org>, Richard Cohn <rcohn@adobe.com>, Svante Schubert <Svante.Schubert@sun.com>, Stephen Zilles <szilles@adobe.com>, "www-archive@w3.org" <www-archive@w3.org>, "www-tag@w3.org" <www-tag@w3.org>, www-tag-request@w3.org
On Wed, 10 Dec 2008 noah_mendelsohn@us.ibm.com wrote:
>
> Question: do you believe that the specification for ASCII would best be
> done as in implementation functional specification? That suggests that,
> rather than publishing, say, a table of integers and their mapping to
> characters, it would be better to write a specification for a piece of
> code that consumes ASCII, to explain what to do if it finds a character
> that isn't ASCII (perhaps because it accepts 16 bit values, but
> considers them valid only if the high order byte is 0)? Maybe a
> separate specification or chapters for producers of ASCII?

The way that IE and Firefox handle bytes with values greater than 0x7F when a file is labelled as being encoded as ASCII differs -- IE ignores the 8th bit, and only looks at the first seven bits, whereas Firefox treats bytes in the range 0x80 to 0xFF as being encoded as Windows-1252. This leads to security bugs, wherein the two browsers might treat the two strings differently (in particular, what looks like <script></script> to IE might look like something quite different to Firefox).

I believe the ASCII specification should have defined how to convert any random byte stream into characters, including bytes that aren't in the range 0-127. That it didn't means that every language that allows ASCII has to define how to handle it, which is an abstraction violation, and results in different specs having different rules. In many cases, the layers above ASCII didn't define this, and we've ended up with very real security problems, such as the example above.

Now in the case of ASCII doing this would be trivial -- e.g. just say that all bytes that aren't in the range 0x00 - 0x7F must be treated as 0x3F, and say that producers must not use bytes that aren't in the table. But yes, it should be in the ASCII spec.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
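
A minimal sketch of the three decoding behaviours described in the message, to make the divergence concrete. It assumes Python's built-in `ascii` and `cp1252` codecs are close enough stand-ins for illustration; the function names and the sample bytes are hypothetical, and this is not how either browser is actually implemented.

```python
def decode_strip_high_bit(data: bytes) -> str:
    """Ignore the 8th bit of every byte (the behaviour attributed to IE)."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def decode_as_windows_1252(data: bytes) -> str:
    """Treat bytes 0x80-0xFF as Windows-1252 (the behaviour attributed to Firefox).

    Python's cp1252 codec leaves a few bytes undefined, so those are replaced
    with U+FFFD here; that detail is an artifact of this sketch.
    """
    return data.decode("cp1252", errors="replace")


def decode_replace_with_question_mark(data: bytes) -> str:
    """Map every byte outside 0x00-0x7F to 0x3F ('?'), the rule suggested in the message."""
    return bytes(b if b <= 0x7F else 0x3F for b in data).decode("ascii")


# Hypothetical input: 0xBC and 0xBE are '<' (0x3C) and '>' (0x3E) with the 8th bit set.
sample = bytes([0xBC]) + b"script" + bytes([0xBE])

print(decode_strip_high_bit(sample))              # one consumer sees markup
print(decode_as_windows_1252(sample))             # another sees harmless text
print(decode_replace_with_question_mark(sample))  # a single defined rule gives one answer
```

With one defined rule for out-of-range bytes, a filter and a renderer cannot disagree about whether a given byte sequence contains a tag.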
Received on Thursday, 11 December 2008 01:20:01 UTC