[whatwg] Entity parsing

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 22 Jun 2007 04:08:49 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.0706220349350.31033@dhalsim.dreamhost.com>
On Thu, 14 Jun 2007, Michel Fortin wrote:
> On 2007-06-14 at 21:05, Ian Hickson wrote:
> 
> > I've defined the parsing and conformance requirements in a way that 
> > matches IE. As a side-effect, this has made things like "na&iumlve" 
> > actually conforming. I don't know if we want this.
> 
> I'd make it non-conforming for the sake of readability.

Done.


On Fri, 15 Jun 2007, Simon Pieters wrote:
>
> Firefox, Opera and Safari treat "na&iumlve" as equivalent to 
> "na&amp;iumlve". So for compat with them, the semicolon should be made 
> required.

Agreed.


On Fri, 15 Jun 2007, Křištof Želechovski wrote:
>
> Aside: I know that it can be changed but "iuml" is a very unfortunate 
> name for "i tréma".  How about deprecating "iuml" in favor of "itrema"?

We're not deprecating anything, and just introducing a new name for i-uml 
would be a dangerous slippery slope to start down. Anyway, i-umlaut is 
fine, and easier to spell than i-diaeresis; why would you call it "itrema"? 
Trema doesn't seem any more common than "umlaut"...


On Fri, 15 Jun 2007, Kornel Lesinski wrote:
> > 
> > I've defined the parsing and conformance requirements in a way that 
> > matches IE. As a side-effect, this has made things like "na&iumlve" 
> > actually conforming. I don't know if we want this.
> 
> Rather not. This would break unencoded URLs:
> 
> ?foo=bar&region=baz → ?foo=bar®ion=baz

On Fri, 15 Jun 2007, Anne van Kesteren wrote:
> 
> You mean that Internet Explorer breaks them already? That doesn't make 
> much sense to me.

On Fri, 15 Jun 2007, Kornel Lesinski wrote:
> 
> No, IE doesn't break them, and that's the point.
> 
> Section 8.2.3.1. states "This definition is used when parsing entities 
> in text and in attributes." - if I understand this correctly, this makes 
> semicolon optional for entities in both attributes and text and 
> "&region" in attribute would be interpreted as "??ion".
>
> If that's the case, it is not compatible with IE, because it parses 
> entities differently in attributes and text. Semicolon (or any 
> non-alphanumeric character actually) is required in attributes, but in 
> text it is not.
> 
> In IE6 <a href="&region">&region</a> is equivalent to <a 
> href="&amp;region">®ion</a>

On Sat, 16 Jun 2007, Anne van Kesteren wrote:
> 
> Awesome. Guess we have to reverse engineer that too then...

On Mon, 18 Jun 2007, Simon Pieters wrote:
> 
> Entity parsing works the same in different attributes (tested <img alt> and <a
> href>).
> 
> Any character that is not in the range [a-zA-Z0-9] ends an entity -- i.e., the
> following are equivalent:
> 
>   <img alt="&AElig.">
>   <img alt="&AElig;.">
> 
> ...and the following are equivalent:
> 
>   <img alt="&AElig1">
>   <img alt="&amp;AElig1">

Fixed. Sigh.
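
The attribute rule quoted above can be sketched roughly as follows. This is 
only an illustration of the rule as Simon states it, not the spec text: the 
function name and the three-entry table are hypothetical, and every entity 
is treated here as if its semicolon were optional.

```python
import re

# Hypothetical, deliberately tiny entity table for illustration only.
ENTITIES = {"AElig": "\u00c6", "amp": "&", "reg": "\u00ae"}

# The name is the whole run of [a-zA-Z0-9] after "&"; any other character
# ends it. A ";" directly after the name is consumed along with it.
_REF = re.compile(r"&([a-zA-Z0-9]+)(;?)")


def decode_attribute_simple(value):
    """Decode entities in an attribute value under the quoted rule."""
    def replace(match):
        name = match.group(1)
        if name in ENTITIES:
            return ENTITIES[name]
        # Unknown name (for example "AElig1"): leave the text untouched.
        return match.group(0)
    return _REF.sub(replace, value)
```

Under these assumptions "&AElig." and "&AElig;." decode to the same value, 
"&AElig1" and "&amp;AElig1" both end up as the literal text "&AElig1", and 
an unencoded URL such as "?foo=bar&region=baz" is left alone.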


> This means that the semi-colon is not part of the entity name, and we 
> need to revert to the old entity table and instead have a third column 
> that says which entities always require a semi-colon.

Actually no, some of the entities, even in an attribute, require a 
semicolon. Compare, for instance, these:

   <span title="&DaggerA">  <span title="&degA">
   <span title="&Dagger@">  <span title="&deg@">
   <span title="&Dagger;">  <span title="&deg;">
                &DaggerA                 &degA
                &Dagger@                 &deg@
                &Dagger;                 &deg;
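
A rough sketch of the distinction these test cases probe, assuming the 
outcomes implied in this thread: "Dagger" is only recognized with its 
semicolon, "deg" is also recognized without one, and in an attribute a 
semicolon-less name must not be followed by an alphanumeric character. The 
function name is hypothetical. The table is Python's html.entities.html5, 
which reflects the later HTML standard rather than the 2007 draft; it 
stores "deg" and "deg;" as separate keys but only "Dagger;".

```python
import re
from html.entities import html5  # keys such as "deg", "deg;", "Dagger;"

_REF = re.compile(r"&([a-zA-Z0-9]+)(;?)")


def decode(value, in_attribute):
    """Decode entities in text or in an attribute value (sketch only)."""
    def replace(match):
        name, semicolon = match.group(1), match.group(2)
        # A properly terminated name is decoded in both contexts.
        if semicolon and name + ";" in html5:
            return html5[name + ";"]
        if in_attribute:
            # The whole alphanumeric run must be a semicolon-optional name.
            if not semicolon and name in html5:
                return html5[name]
            return match.group(0)
        # In text, the longest semicolon-optional prefix is decoded.
        for end in range(len(name), 0, -1):
            if name[:end] in html5:
                return html5[name[:end]] + name[end:] + semicolon
        return match.group(0)
    return _REF.sub(replace, value)
```

With this sketch, decode("&deg@", True) decodes the degree sign while 
decode("&Dagger@", True) leaves the text as written, and "&region" stays 
literal in an attribute but becomes "®ion" in text, which matches the IE6 
behavior reported earlier in the thread.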

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 21 June 2007 21:08:49 UTC
