Re: Parsing HREFs?

On Mon, 20 Feb 2006, Lachlan Hunt wrote:

>>    ... Since &lang; is the HTML entity for the
>>    left-pointing angle bracket, some browsers also convert &lang=en to 
>> ⟨=en ...
>
> Although it technically should do that, I couldn't find any browser that 
> actually does.  My tests [1] show that within href attributes generally only 
> entity references from the ISO-8859-1 category, &quot, &amp, &lt and &gt from 
> the Markup Significant category, and &apos (where supported) are recognised 
> without the REFC.
>
> [1] http://lachy.id.au/dev/markup/tests/html401/charref/syntax

As far as I can see, Firefox 1.5 gets all of them right. (There might be 
problems in _displaying_ some of the characters, due to font problems, but 
that's a different issue.)

On the other hand, IE (even IE 7 beta preview) gets many of them wrong:
it fails to recognize entity references for characters outside ISO Latin 1
without REFC, it fails to recognize &apos; at all, and it fails to 
recognize hexadecimal character references without REFC.

But these errors in IE are _not_ limited to processing values of 
attributes. They are general flaws in its parser.
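
For anyone who wants to experiment with the difference between references 
with and without REFC (the terminating semicolon), here is a small sketch. 
It assumes Python's standard html.unescape function, which follows the later 
HTML5 parsing rules, not the SGML rules of HTML 4.01 and not the behaviour 
of any browser mentioned above. The helper name decode_references and the 
sample strings are made up for illustration only.

```python
import html


def decode_references(text: str) -> str:
    """Decode character references using Python's HTML5-based rules.

    Assumption: html.unescape stands in for "a parser"; it is not a model
    of Firefox 1.5 or IE, and it does not implement HTML 4.01/SGML rules.
    """
    return html.unescape(text)


# Hypothetical samples: references with and without the closing semicolon.
samples = [
    "&lang;=en",   # named reference with REFC
    "&lang=en",    # same name, REFC omitted
    "&copy=1",     # ISO-8859-1 entity, REFC omitted
    "&amp",        # markup-significant entity, REFC omitted
    "&apos",       # apos, REFC omitted
    "&#x41",       # hexadecimal character reference, REFC omitted
]

for sample in samples:
    print(f"{sample!r:14} -> {decode_references(sample)!r}")
```

Under these HTML5 rules only a fixed legacy set of names (roughly the 
ISO-8859-1 entities plus amp, lt, gt and quot) is decoded without the 
semicolon, so "&lang=en" and "&apos" come back unchanged while "&copy=1" 
and "&#x41" are decoded.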

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 20 February 2006 06:45:51 UTC