RE: ZIP-based packages and URI references into them ODF proposal

On Wed, 10 Dec 2008 wrote:
> Question:  do you believe that the specification for ASCII would best be 
> done as an implementation functional specification?  That suggests that, 
> rather than publishing, say, a table of integers and their mapping to 
> characters, it would be better to write a specification for a piece of 
> code that consumes ASCII, to explain what to do if it finds a character 
> that isn't ASCII (perhaps because it accepts 16 bit values, but 
> considers them valid only if the high order byte is 0)?  Maybe a 
> separate specification or chapters for producers of ASCII?

IE and Firefox differ in how they handle bytes with values greater than 
0x7F when a file is labelled as being encoded as ASCII -- IE ignores the 
8th bit and only looks at the low seven bits, whereas Firefox treats 
bytes in the range 0x80 to 0xFF as being encoded as Windows-1252. This 
leads to security bugs, wherein the two browsers might interpret the same 
byte sequence differently (in particular, what looks like 
<script></script> to IE might look like something quite different to 
Firefox).
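
To make the divergence concrete, here is a rough sketch in Python. The two 
functions are simplified models of the behaviours described above, not the 
browsers' actual code, and the function names and the sample bytes are 
made up for illustration.

```python
def decode_ignoring_high_bit(data: bytes) -> str:
    """Assumed model of "ignore the 8th bit": mask every byte to 7 bits."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def decode_high_bytes_as_windows_1252(data: bytes) -> str:
    """Assumed model of "treat 0x80-0xFF as Windows-1252".

    Bytes 0x00-0x7F map to the same characters in ASCII and Windows-1252,
    so decoding the whole input as cp1252 is enough for this sketch.
    Bytes that cp1252 leaves undefined become U+FFFD.
    """
    return data.decode("cp1252", errors="replace")


# Hypothetical input: 0xBC and 0xBE are "<" and ">" with the high bit set.
sample = b"\xbcscript\xbealert(1)\xbc/script\xbe"

print(decode_ignoring_high_bit(sample))
print(decode_high_bytes_as_windows_1252(sample))
```

Under the first model the bytes come out as markup containing a script 
element; under the second they are fraction characters around plain text. 
A filter written against one reading says nothing about the other.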

I believe the ASCII specification should have defined how to convert an 
arbitrary byte stream into characters, including bytes that aren't in the 
range 0-127. That it didn't means that every language that allows ASCII 
has to define how to handle such bytes, which is an abstraction 
violation, and results in different specs having different rules. In many 
cases, the layers above ASCII didn't define this, and we've ended up with 
very real security problems, such as the example above.

Now in the case of ASCII, doing this would be trivial -- e.g. just say 
that consumers must treat all bytes that aren't in the range 0x00 - 0x7F 
as 0x3F ("?"), and say that producers must not use bytes that aren't in 
the table. But yes, it should be in the ASCII spec.
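
As a minimal sketch of how small that rule is, in Python (the function 
names are hypothetical, and the producer side simply leans on Python's 
strict "ascii" codec as one way to refuse out-of-table characters):

```python
def consume_ascii(data: bytes) -> str:
    """Consumer rule: any byte outside 0x00-0x7F is treated as 0x3F ("?")."""
    return bytes(b if b <= 0x7F else 0x3F for b in data).decode("ascii")


def produce_ascii(text: str) -> bytes:
    """Producer rule: never emit a byte that isn't in the table.

    The strict "ascii" codec raises UnicodeEncodeError for any character
    above U+007F, so nothing outside the table can be written.
    """
    return text.encode("ascii")


print(consume_ascii(b"\xbcscript\xbe"))  # high-bit bytes become "?"
print(produce_ascii("plain text"))
```

With a rule like this every consumer ends up with the same characters for 
the same bytes, so there is no second interpretation to exploit.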

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 11 December 2008 01:19:57 UTC