Re: Entities (part of detailed review)

[CC'ing implementors@whatwg.org]

2007/8/1, Henri Sivonen:
> On Aug 1, 2007, at 15:28, Thomas Broyer wrote:
>
> > Can someone remind me why this hasn't been done with a third "Is
> > semi-colon required" column?
>
> If anything, the current table suggests a sensible implementation
> approach that works together with the parsing algorithm prose.

For the record, I've updated Twintsam's tokenizer [1] to use an "is
missing semi-colon recoverable" column approach. It now passes all 519
tokenizer/entities.test tests [2].
It might not be the fastest approach, but given that the HtmlEntities
class is public, I find it cleaner: entities are only exposed as names
without semi-colons, and the "third column" is internal to the library
and its HTML5 tokenizing algorithm. The HtmlEntities class's internals
could eventually be refactored to use the "two columns" approach
instead, though...
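
To illustrate the "third column" idea, here is a minimal sketch in
Python. It is not Twintsam's actual C# code: the table rows, the
ENTITIES and MAX_NAME_LEN names, and the consume_entity function are
all hypothetical, and only a handful of entities are listed. Each name
is stored without its semi-colon, next to a flag saying whether a
missing semi-colon is recoverable.

```python
# Hypothetical three-column table:
#   name (no semi-colon) -> (replacement, missing_semicolon_recoverable)
# Only a few illustrative rows; a real table is far larger.
ENTITIES = {
    "amp": ("&", True),
    "lt": ("<", True),
    "gt": (">", True),
    "copy": ("\u00a9", True),
    "not": ("\u00ac", True),
    "notin": ("\u2209", False),
    "hellip": ("\u2026", False),
}
MAX_NAME_LEN = max(len(name) for name in ENTITIES)


def consume_entity(text, pos):
    """Match an entity name starting at text[pos], just after the '&'.

    Returns (replacement, new_pos, parse_error), or None if no entity
    can be consumed at this position.
    """
    limit = min(len(text), pos + MAX_NAME_LEN)
    # Try the longest candidate name first, then shorter ones.
    for end in range(limit, pos, -1):
        entry = ENTITIES.get(text[pos:end])
        if entry is None:
            continue
        replacement, recoverable = entry
        if end < len(text) and text[end] == ";":
            # Well-formed reference: consume the semi-colon as well.
            return replacement, end + 1, False
        if recoverable:
            # Semi-colon missing, but this entity tolerates it.
            return replacement, end, True
        # Semi-colon missing and not recoverable: try a shorter name.
    return None
```

With these rows, "notin" followed by a semi-colon yields U+2209, while
"notin" without one falls back to the shorter "not" and reports a
parse error; "hellip" without a semi-colon does not match at all.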

[1] HTML5 library for .NET 2.0 written in C#. Most of the code is
about six months old; I'm working on it to align it with the latest
draft. It can be found at http://code.google.com/p/twintsam/
[2] http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/entities.test

-- 
Thomas Broyer

Received on Thursday, 2 August 2007 10:17:31 UTC