Re: ZIP-based packages and URI references into them ODF proposal from Julian Reschke on 2008-12-29 (public-webapps@w3.org from October to December 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 29 Dec 2008 13:17:04 +0100
To: Ian Hickson <ian@hixie.ch>
CC: noah_mendelsohn@us.ibm.com, Arthur Barstow <art.barstow@nokia.com>, Bill McCoy <bmccoy@adobe.com>, Carl Cargill <cargill@adobe.com>, "eduardo.gutentag@oasis-open.org" <eduardo.gutentag@oasis-open.org>, "Henry.Story@Sun.COM" <Henry.Story@sun.com>, Jon Ferraiolo <jferrai@us.ibm.com>, Marcos Caceres <marcosscaceres@gmail.com>, Larry Masinter <masinter@adobe.com>, Michael Stahl <Michael.Stahl@sun.com>, Philippe Le Hegaret <plh@w3.org>, public-webapps <public-webapps@w3.org>, Richard Cohn <rcohn@adobe.com>, Svante Schubert <Svante.Schubert@sun.com>, Stephen Zilles <szilles@adobe.com>, "www-archive@w3.org" <www-archive@w3.org>, "www-tag@w3.org" <www-tag@w3.org>, www-tag-request@w3.org
Message-ID: <4958BFC0.3050809@gmx.de>

Ian Hickson wrote:
> The way that IE and Firefox handle bytes with values greater than 0x7F 
> when a file is labelled as being encoded as ASCII differs -- IE ignores 
> the 8th bit, and only looks at the first seven bits, whereas Firefox 
> treats bytes in the range 0x80 to 0xFF as being encoded as Windows-1252. 
> This leads to security bugs, wherein the two browsers might treat the two 
> strings differently (in particular, what looks like <script></script> to 
> IE might look like something quite different to Firefox).
> 
> I believe the ASCII specification should have defined how to convert any 
> random byte stream into characters, including bytes that aren't in the 
> range 0-127. That it didn't means that every language that allows ASCII 
> has to define how to handle it, which is an abstraction violation, and 
> results in different specs having different rules. In many cases, the 
> layers above ASCII didn't define this, and we've ended up with very real 
> security problems, such as the example above.
> 
> Now in the case of ASCII doing this would be trivial -- e.g. just say that 
> all bytes that aren't in the range 0x00 - 0x7F must be treated as 0x3F, 
> and say that producers must not use bytes that aren't in the table. But 
> yes, it should be in the ASCII spec.
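Below is a minimal Python sketch of the three behaviours described in the quoted text. It assumes the characterisations above are accurate and is not either browser's actual code. It shows stripping the 8th bit (attributed to IE), decoding bytes 0x80-0xFF as Windows-1252 (attributed to Firefox), and the proposed rule of treating every byte outside 0x00-0x7F as 0x3F ('?'). The function names are hypothetical, and the payload is an illustrative two-byte variant of the "<script>" case mentioned above.

```python
def decode_strip_high_bit(data: bytes) -> str:
    """Behaviour attributed to IE: ignore the 8th bit of every byte."""
    return "".join(chr(b & 0x7F) for b in data)


def decode_as_windows_1252(data: bytes) -> str:
    """Behaviour attributed to Firefox: treat 0x80-0xFF as Windows-1252."""
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
    # so errors="replace" maps those to U+FFFD instead of raising.
    return data.decode("cp1252", errors="replace")


def decode_replace_with_question_mark(data: bytes) -> str:
    """Proposed rule: every byte outside 0x00-0x7F is treated as 0x3F ('?')."""
    return "".join(chr(b) if b <= 0x7F else "?" for b in data)


# 0xBC with its 8th bit cleared is 0x3C ('<'); 0xBE becomes 0x3E ('>').
payload = b"\xbcscript\xbe"

print(decode_strip_high_bit(payload))              # <script>
print(decode_as_windows_1252(payload))             # ¼script¾
print(decode_replace_with_question_mark(payload))  # ?script?
```

The same bytes yield a script tag under one policy and harmless text under the other two. This is the kind of divergence the quoted message describes as a security problem.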

Your assumption seems to be that there's a single "good" way to define 
this error handling. I disagree with that.

For instance, in XML, sending non-ASCII characters when the declared 
encoding is US-ASCII is a fatal error, and I definitely want to keep 
it that way.
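The following minimal sketch illustrates the fatal-error policy described above, in the spirit of how XML treats it. It does not invoke an XML parser. It simply refuses to decode a byte stream labelled US-ASCII if any byte is above 0x7F, instead of repairing it. The names `FatalEncodingError` and `decode_us_ascii_strict` are hypothetical. Python's own codecs also support the point that more than one policy is legitimate: callers choose between errors="strict", "replace", and "ignore".

```python
class FatalEncodingError(ValueError):
    """Hypothetical error type: a non-ASCII byte in a document labelled US-ASCII."""


def decode_us_ascii_strict(data: bytes) -> str:
    """Fatal-error policy: reject the input instead of guessing a repair."""
    try:
        # The "ascii" codec with errors="strict" raises on any byte > 0x7F.
        return data.decode("ascii", errors="strict")
    except UnicodeDecodeError as exc:
        raise FatalEncodingError(
            f"byte 0x{data[exc.start]:02X} at offset {exc.start} is not US-ASCII"
        ) from exc


print(decode_us_ascii_strict(b"<a>ok</a>"))  # <a>ok</a>

try:
    decode_us_ascii_strict(b"<a>\xe9</a>")
except FatalEncodingError as err:
    print(err)  # byte 0xE9 at offset 3 is not US-ASCII
```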

BR, Julian

Received on Monday, 29 December 2008 12:17:51 UTC