
Re: Entities (part of detailed review)

From: Thomas Broyer <t.broyer@gmail.com>
Date: Thu, 2 Aug 2007 12:17:18 +0200
Message-ID: <a9699fd20708020317s6baf50aboa46e587fd3349874@mail.gmail.com>
To: public-html@w3.org, "WHATWG - Implementors" <implementors@whatwg.org>

[CC'ing implementors@whatwg.org]

2007/8/1, Henri Sivonen:
> On Aug 1, 2007, at 15:28, Thomas Broyer wrote:
> > Can someone remind me why this hasn't been done with a third "Is
> > semi-colon required" column?
> If anything, the current table suggests a sensible implementation
> approach that works together with the parsing algorithm prose.

Just to say that I've updated Twintsam's tokenizer [1] to use an "is
missing semi-colon recoverable" column approach. It now passes all 519
tokenizer/entities.test tests [2].
It might not be the fastest approach, but given that the HtmlEntities
class is public, I find it cleaner: entities are only exposed as names
without semi-colons, and the "third column" is internal to the library
and its HTML5 tokenizing algorithm. The internals of the HtmlEntities
class could eventually be refactored, though, to use the "two columns"
approach instead...
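
To make the "third column" idea concrete, here is a minimal sketch in
Python. It is not Twintsam's code (which is C#): the names ENTITIES,
Entity, Match and match_entity are hypothetical, the table holds only a
handful of entities for illustration, and the longest-match loop is my
assumption of how such a table could be consulted by a tokenizer.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    char: str                  # replacement character
    semicolon_optional: bool   # the "third column": is a missing ';' recoverable?


# Illustrative subset only; names are stored without the trailing semicolon.
ENTITIES = {
    "amp": Entity("&", True),
    "lt": Entity("<", True),
    "copy": Entity("\u00a9", True),
    "not": Entity("\u00ac", True),
    "notin": Entity("\u2209", False),
    "hellip": Entity("\u2026", False),
}

MAX_NAME_LENGTH = max(len(name) for name in ENTITIES)


class Match(NamedTuple):
    char: str          # replacement character
    consumed: int      # number of input characters used, including ';' if present
    parse_error: bool  # True when the semicolon was missing


def match_entity(text: str) -> Optional[Match]:
    """Match an entity at the start of `text` (the input right after '&').

    Candidate names are tried from longest to shortest. A name followed by
    ';' always matches; a name without ';' matches only if its third column
    says the missing semicolon is recoverable.
    """
    for length in range(min(MAX_NAME_LENGTH, len(text)), 0, -1):
        entity = ENTITIES.get(text[:length])
        if entity is None:
            continue
        if text[length:length + 1] == ";":
            return Match(entity.char, length + 1, False)
        if entity.semicolon_optional:
            return Match(entity.char, length, True)
    return None
```

With this table, match_entity("notin;") consumes the whole name, while
match_entity("notit;") falls back to "not" and reports a parse error, and
match_entity("hellip") returns None because its semicolon is required.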

[1] HTML5 library for .NET 2.0 written in C#. Most of the code is
about six months old; I'm working on it to align it with the latest
draft. It can be found at http://code.google.com/p/twintsam/
[2] http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/entities.test

Thomas Broyer
Received on Thursday, 2 August 2007 10:17:31 UTC
