Re: David's less simple example from Jeni Tennison on 2012-02-28 (public-xml-er@w3.org from February 2012)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Tue, 28 Feb 2012 22:21:55 +0000
To: Anne van Kesteren <annevk@opera.com>
Cc: "David Carlisle" <davidc@nag.co.uk>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-Id: <824B6289-6B42-4F9E-BB2E-1B7B338AA090@jenitennison.com>

Anne,

I suggest that we generally try to look at what other existing products do. As I said to David C., I think it would be really useful if XML-ER could cover the editing and ingesting cases as well as the browser case.

So when there is an issue we could look at what's done by:

  - HTML5
  - Oxygen
  - XMetaL
  - MarkLogic
  - XML5?
  - tagsoup?
  - htmlparse?
  - others?

with a preference for matching the behaviour of deployed products.

Jeni

On 28 Feb 2012, at 21:54, Anne van Kesteren wrote:

> On Tue, 28 Feb 2012 21:09:31 +0100, David Carlisle <davidc@nag.co.uk> wrote:
>>> Does that throw everything else in Anne's algorithm out somehow?
>> 
>> Anne?
> 
> No, you can change individual character handling in each tokenizer state quite easily.
> 
> The question is whether divergence from HTML for tokenizing <foo<bar> is desirable. Is it our gut feeling that this is likely better, or is there some data to back that up? In the end we want deterministic error handling. Making as few decisions as possible about how that should go, and deferring to what went before us, seems like a nice way out. There's still plenty of room for that around colon and namespace handling.
> 
> So overall I do not feel too strongly about what to do in each tokenizer state, but if we are going to change things around in a way that diverges from HTML we might want to have a system for it (such as data).
> 
> 
> -- 
> Anne van Kesteren
> http://annevankesteren.nl/
> 
> 
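To make the <foo<bar> question concrete, here is a minimal Python sketch of a toy tokenizer that handles only character data and attribute-less start tags. It also illustrates Anne's point that one character's handling in one state can be changed in isolation. The "append" policy follows the HTML5 tag name state, where a "<" is simply appended to the tag name, so <foo<bar> yields a single start tag named "foo<bar". The "restart" policy is a hypothetical alternative, not something proposed in this thread or taken from any particular product. The names tokenize and lt_in_tag_name are illustrative only.

```python
from __future__ import annotations


def tokenize(text: str, lt_in_tag_name: str = "append") -> list[tuple[str, str]]:
    """Toy tokenizer: returns ("start", name) and ("chars", data) tokens.

    Assumption: tags contain no whitespace or attributes, and end tags,
    comments, etc. are not recognised (they fall through as character data).

    lt_in_tag_name selects what happens to "<" inside a tag name:
      "append"  - HTML5 behaviour: "<" becomes part of the tag name
      "restart" - hypothetical: end the current tag and open a new one
    """
    tokens: list[tuple[str, str]] = []
    data: list[str] = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        # Tag open: "<" followed by an ASCII letter starts a tag name.
        if c == "<" and i + 1 < n and text[i + 1].isascii() and text[i + 1].isalpha():
            if data:
                tokens.append(("chars", "".join(data)))
                data = []
            i += 1
            name: list[str] = []
            closed = False
            while i < n:
                c = text[i]
                if c == ">":
                    i += 1
                    closed = True
                    break
                if c == "<" and lt_in_tag_name == "restart":
                    # Emit the tag so far; leave "<" for the outer loop.
                    closed = True
                    break
                # HTML5 lowercases only ASCII upper-case letters here.
                name.append(c.lower() if "A" <= c <= "Z" else c)
                i += 1
            if closed:
                tokens.append(("start", "".join(name)))
            # Otherwise EOF was hit inside the tag; like HTML5, drop it.
        else:
            # A "<" not followed by a letter is kept as character data.
            data.append(c)
            i += 1
    if data:
        tokens.append(("chars", "".join(data)))
    return tokens


print(tokenize("<foo<bar>"))                           # HTML5-style
print(tokenize("<foo<bar>", lt_in_tag_name="restart"))  # hypothetical
```

Under the "append" policy the first call gives [("start", "foo<bar")], and the "restart" policy gives two start tags, "foo" and "bar". Only one branch in one state differs between the two, which is why the harder question is which behaviour to choose and on what evidence, not how to implement it.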

-- 
Jeni Tennison
http://www.jenitennison.com

Received on Tuesday, 28 February 2012 22:22:23 UTC