- From: Thomas Broyer <t.broyer@gmail.com>
- Date: Thu, 2 Aug 2007 12:17:18 +0200
- To: public-html@w3.org, "WHATWG - Implementors" <implementors@whatwg.org>
[CC'ing implementors@whatwg.org]

2007/8/1, Henri Sivonen:
> On Aug 1, 2007, at 15:28, Thomas Broyer wrote:
>
> > Can someone remind me why this hasn't been done with a third "Is
> > semi-colon required" column?
>
> If anything, the current table suggests a sensible implementation
> approach that works together with the parsing algorithm prose.

Just to say that I've updated Twintsam's tokenizer [1] to use an "is missing semi-colon recoverable" column approach. It now passes all 519 tokenizer/entities.test tests [2].

It might not be the fastest approach, but given that the HtmlEntities class is public, I find it cleaner: entities are only exposed as names without semi-colons, and the "third column" is internal to the library and its HTML5 tokenizing algorithm. The internals of the HtmlEntities class could eventually be refactored to use the "two columns" approach instead...

[1] HTML5 library for .NET 2.0 written in C#. Most of the code is about six months old; I'm working on it to align it with the latest draft. It can be found at http://code.google.com/p/twintsam/
[2] http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/entities.test

--
Thomas Broyer
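
For readers unfamiliar with the idea, here is a rough sketch of what an "is missing semi-colon recoverable" column could look like. It is written in Python for brevity and is not Twintsam's actual code: the table name `ENTITIES`, the function `consume_entity`, and the handful of entries are all hypothetical, and the special handling of entities inside attribute values is left out.

```python
# Hypothetical three-column table: entity name (stored WITHOUT the
# semi-colon) -> (replacement character, is a missing semi-colon recoverable?).
# Only a few illustrative entries; a real table has many more.
ENTITIES = {
    "amp": ("&", True),
    "lt": ("<", True),
    "copy": ("\u00a9", True),
    "not": ("\u00ac", True),
    "notin": ("\u2209", False),
    "hellip": ("\u2026", False),
}

MAX_NAME_LENGTH = max(len(name) for name in ENTITIES)


def consume_entity(text, pos):
    """Try to consume a named entity whose name starts at text[pos],
    i.e. just after the '&'.

    Returns (character, new_pos) on success, or None if nothing matches.
    """
    limit = min(MAX_NAME_LENGTH, len(text) - pos)
    # Try the longest candidate name first, then progressively shorter ones.
    for length in range(limit, 0, -1):
        entry = ENTITIES.get(text[pos:pos + length])
        if entry is None:
            continue
        char, recoverable = entry
        end = pos + length
        if text[end:end + 1] == ";":
            # Properly terminated: always accepted; consume the semi-colon.
            return char, end + 1
        if recoverable:
            # Semi-colon missing, but the third column says we may recover.
            return char, end
        # Semi-colon missing and not recoverable: keep trying shorter names.
    return None
```

With this layout, callers only ever see names without semi-colons, and the flag stays an internal detail of the tokenizer. For example, `consume_entity("notin x", 0)` cannot accept `notin` without its semi-colon, so it falls back to the shorter, recoverable `not`.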
Received on Thursday, 2 August 2007 10:17:31 UTC