Re: Quoted strings in "encoding sniffing algorithm" "get an attribute" from Philip Taylor on 2008-05-22 (public-html@w3.org from May 2008)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Thu, 22 May 2008 14:25:18 +0100
To: Ian Hickson <ian@hixie.ch>
CC: HTML WG <public-html@w3.org>
Message-ID: <4835743E.2090104@cam.ac.uk>

Ian Hickson wrote:
> I've [...] tried to fix something that I think was wrong 
> with another part of the algorithm, but I'm not sure that part was 
> correct. Please let me know if r1667 was completely right.

That was changing

   "7. If the byte at position is not 0x3D (ASCII '='), abort the "get 
an attribute" algorithm. Move 'position' back to the previous byte. The 
attribute's name is the value of attribute name, its value is the empty 
string."

to

   "7. If the byte at position is not 0x3D (ASCII '='), abort the "get 
an attribute" algorithm. The attribute's name is the value of attribute 
name, its value is the empty string."

Step 7 can only be reached via step 6 ("spaces"), and step 6 can only be 
reached from "[If the byte at 'position' is a space] Jump to the step 
below labeled spaces". (By the way, this reminds me why people don't 
write programs entirely with 'goto' any more.)

So, moving 'position' back in step 7 will always move it onto a space 
character, since step 6 has skipped at least one space by then. An 
attribute will always be returned, and so the code that calls "get an 
attribute" will always either abort or immediately call "get an 
attribute" again, and that second call will always skip over the 
leading space at 'position'.

So I think (though not with great confidence) that the change to the 
spec never has any effect at all. (I've also tested the change in my 
implementation of this algorithm and can't find any cases where it does 
have an effect, though the tests were far from extensive.)
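
To make the argument above concrete, here is a rough Python sketch of 
"get an attribute" with a flag for the old "move back" wording. It is 
not the spec text and not my actual implementation: the names 
get_attribute, all_attributes and move_back are made up for 
illustration, end of input is treated as the end of the tag, and the 
value-parsing steps are simplified.

```python
SPACES = b"\t\n\x0c\r "


def _lower(b):
    # ASCII-lowercase a single byte value.
    return b + 0x20 if 0x41 <= b <= 0x5A else b


def get_attribute(data, pos, move_back):
    """Return ((name, value), new_pos), or (None, new_pos) if no attribute."""
    n = len(data)
    # Step 1: skip leading whitespace and '/'.
    while pos < n and (data[pos] in SPACES or data[pos] == 0x2F):
        pos += 1
    # Step 2: '>' (or end of input, in this sketch) means no attribute.
    if pos >= n or data[pos] == 0x3E:
        return None, pos
    name, value = bytearray(), bytearray()
    # Steps 4-5: collect the attribute name.
    while True:
        if pos >= n:
            return (bytes(name), b""), pos
        b = data[pos]
        if b == 0x3D and name:
            pos += 1
            break  # jump to "value"
        if b in SPACES:
            # Step 6 ("spaces"): only reachable with a space at 'position'.
            while pos < n and data[pos] in SPACES:
                pos += 1
            # Step 7: no '=' follows, so return with an empty value.
            if pos >= n or data[pos] != 0x3D:
                if move_back:
                    pos -= 1  # old wording: this always lands on a space
                return (bytes(name), b""), pos
            pos += 1  # step 8: advance past '='
            break
        if b in (0x2F, 0x3E):
            return (bytes(name), b""), pos
        name.append(_lower(b))
        pos += 1
    # "Value" (simplified): skip whitespace, then read the value.
    while pos < n and data[pos] in SPACES:
        pos += 1
    if pos >= n or data[pos] == 0x3E:
        return (bytes(name), b""), pos
    if data[pos] in (0x22, 0x27):  # quoted value
        quote = data[pos]
        pos += 1
        while pos < n and data[pos] != quote:
            value.append(_lower(data[pos]))
            pos += 1
        return (bytes(name), bytes(value)), min(pos + 1, n)
    while pos < n and data[pos] not in SPACES and data[pos] != 0x3E:
        value.append(_lower(data[pos]))  # unquoted value
        pos += 1
    return (bytes(name), bytes(value)), pos


def all_attributes(data, move_back):
    """Call get_attribute repeatedly, as the caller in the spec would."""
    attrs, pos = [], 0
    while True:
        attr, pos = get_attribute(data, pos, move_back)
        if attr is None:
            return attrs
        attrs.append(attr)


if __name__ == "__main__":
    for sample in (b'a  b = "X" c>', b"charset  content=x >", b"a \t"):
        old = all_attributes(sample, move_back=True)
        new = all_attributes(sample, move_back=False)
        print(sample, old == new, new)
```

In this sketch the only difference between the two variants is whether 
the next call starts on the last skipped space or just after it, and 
step 1 of that next call removes the difference.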

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Thursday, 22 May 2008 13:26:04 UTC