[whatwg] Entity parsing from Sam Ruby on 2007-06-23 (public-whatwg-archive@w3.org from June 2007)

From: Sam Ruby <rubys@intertwingly.net>
Date: Sat, 23 Jun 2007 14:12:45 -0400
Message-ID: <3d4032300706231112w5f87d3acw10540d82bc6e032f@mail.gmail.com>

On 6/14/07, Ian Hickson <ian at hixie.ch> wrote:
> On Sun, 5 Nov 2006, Øistein E. Andersen wrote:
> >
> > From section 9.2.3.1. Tokenising entities:
> > >  For some entities, UAs require a semicolon, for others they don't.
> >
> > This applies to IE.
> >
> > FWIW, the entities not requiring a semicolon are the ones encoding
> > Latin-1 characters, the other HTML 3.2 entities (&amp, &gt and &lt), as
> > well as &quot and the uppercase variants (&AMP, &COPY, &GT, &LT, &QUOT
> > and &REG). [...]
>
> I've defined the parsing and conformance requirements in a way that
> matches IE. As a side-effect, this has made things like "na&iumlve"
> actually conforming. I don't know if we want this. On the one hand, it's
> pragmatic (after all, why require the semicolon?), and is equivalent to
> not requiring quotes around attribute values. On the other, people don't
> want us to make the quotes optional either.

With the latest changes to html5lib, we get a failure on a test named
test_title_body_named_charref.

Before, "A &mdash B" == "A — B"; now "A &mdash B" == "A &amp;mdash B".

Is that what we really want?  Testing with Firefox, the old behavior
is preferable.
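
To make the difference concrete, here is a minimal Python sketch of the
two behaviors. It is not html5lib's code: the entity table is a tiny
illustrative subset, and expand(), ENTITIES, SEMICOLON_OPTIONAL, and the
lenient flag are names made up for this example. The prefix matching used
for semicolon-less names is likewise an assumption about how a tokenizer
might handle input such as "na&iumlve".

```python
import html
import re

# Illustrative subset only; the real entity tables are far larger.
ENTITIES = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"',
    "copy": "\u00a9", "reg": "\u00ae", "iuml": "\u00ef",
    "mdash": "\u2014",
}

# Names accepted without a trailing semicolon (the Latin-1 and HTML 3.2
# style entities described in the quoted text). "mdash" is not one of them.
SEMICOLON_OPTIONAL = {"amp", "lt", "gt", "quot", "copy", "reg", "iuml"}

_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def expand(text, lenient=False):
    """Expand named character references in text.

    lenient=False: only names in SEMICOLON_OPTIONAL may omit the semicolon.
    lenient=True:  any known name may omit it (an assumed model of the
                   older, more forgiving behavior).
    """
    def repl(match):
        name, semicolon = match.group(1), match.group(2)
        if semicolon:
            # Properly terminated reference: expand if the name is known.
            return ENTITIES.get(name, match.group(0))
        # No semicolon: try the longest prefix that is allowed to stand alone.
        for end in range(len(name), 0, -1):
            prefix = name[:end]
            if prefix in ENTITIES and (lenient or prefix in SEMICOLON_OPTIONAL):
                return ENTITIES[prefix] + name[end:]
        # Nothing matched: leave the text untouched.
        return match.group(0)

    return _REF.sub(repl, text)


strict = expand("A &mdash B")                 # reference left as literal text
loose = expand("A &mdash B", lenient=True)    # reference expanded to U+2014
naive = expand("na&iumlve")                   # "iuml" needs no semicolon

# Serializing the strict result escapes the bare ampersand, which gives
# the "&amp;mdash" form seen in the failing test.
serialized = html.escape(strict, quote=False)
```

With the strict rule, the unexpanded "&mdash" survives tokenizing as plain
text, so a serializer has to escape its ampersand; that would account for
the "A &amp;mdash B" result above.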

- Sam Ruby

Received on Saturday, 23 June 2007 11:12:45 UTC