Re: David's less simple example from Anne van Kesteren on 2012-02-28 (public-xml-er@w3.org from February 2012)

From: Anne van Kesteren <annevk@opera.com>
Date: Tue, 28 Feb 2012 22:54:51 +0100
To: "Jeni Tennison" <jeni@jenitennison.com>, "David Carlisle" <davidc@nag.co.uk>
Cc: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <op.waeshptn64w2qv@annevk-macbookpro.local>

On Tue, 28 Feb 2012 21:09:31 +0100, David Carlisle <davidc@nag.co.uk>  
wrote:
>> Does that throw everything else in Anne's algorithm out somehow?
>
> Anne?

No, you can change individual character handling in each tokenizer state  
quite easily.

The question is whether divergence from HTML for tokenizing <foo<bar> is  
desirable. Is it our gut feeling that this is likely better or is there  
some data to back that up? In the end we want deterministic error  
handling. Making as few decisions as possible about how that should go,  
and deferring to what went before us, seems like a nice way out. There's still  
plenty of room for that around colon and namespace handling.
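A minimal Python sketch, for illustration only, of why per-state changes are local. It models a heavily simplified HTML tag name state that ignores attributes, whitespace, end tags, and NULL handling. In HTML, a `<` seen in the tag name state is simply appended to the name, so `<foo<bar>` yields one start tag named `foo<bar`. The function `tag_names` and the flag `lt_starts_new_tag` are hypothetical. The flag shows how a single character rule in one state could be switched so that `<` ends the current tag and starts a new one.

```python
def tag_names(text: str, lt_starts_new_tag: bool = False) -> list[str]:
    """Return start-tag names from `text` (simplified sketch, not a real parser).

    lt_starts_new_tag=False follows HTML's tag name state: '<' is appended
    to the name. True is a hypothetical divergence where '<' emits the
    current tag and is reprocessed as the start of a new tag.
    Whitespace and attributes are NOT modeled.
    """
    names = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "<":
            i += 1  # data state: character data is skipped in this sketch
            continue
        i += 1  # consumed '<'; now in the tag name state
        name = []
        while i < n:
            c = text[i]
            if c == ">":
                i += 1
                names.append("".join(name))  # emit the start tag
                break
            if c == "<" and lt_starts_new_tag:
                names.append("".join(name))  # emit, leave '<' for the outer loop
                break
            # HTML lowercases ASCII upper alphas; anything else (including '<') is appended
            name.append(c.lower() if "A" <= c <= "Z" else c)
            i += 1
        # reaching end of input inside a tag name emits nothing, as in HTML
    return names


print(tag_names("<foo<bar>"))                          # ['foo<bar']
print(tag_names("<foo<bar>", lt_starts_new_tag=True))  # ['foo', 'bar']
```

Only the `'<'` branch differs between the two behaviours, which is the kind of localized change being discussed; whether that divergence is better is the open question.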

So overall I do not feel too strongly about what to do in each tokenizer  
state, but if we are going to change things around in a way that diverges  
from HTML, we might want to have a system for it (such as data).

-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Tuesday, 28 February 2012 21:55:35 UTC