
Re: David's less simple example

From: Jeni Tennison <jeni@jenitennison.com>
Date: Tue, 28 Feb 2012 22:21:55 +0000
Cc: "David Carlisle" <davidc@nag.co.uk>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-Id: <824B6289-6B42-4F9E-BB2E-1B7B338AA090@jenitennison.com>
To: Anne van Kesteren <annevk@opera.com>

I suggest that we generally try to look at what other existing products do. As I said to David C., I think it would be really useful if XML-ER could cover the editing and ingesting cases as well as the browser case.

So when there is an issue we could look at what's done by:

  - HTML5
  - Oxygen
  - XMetaL
  - MarkLogic
  - XML5?
  - tagsoup?
  - htmlparse?
  - others?

with a preference for matching the behaviour of deployed products.


On 28 Feb 2012, at 21:54, Anne van Kesteren wrote:

> On Tue, 28 Feb 2012 21:09:31 +0100, David Carlisle <davidc@nag.co.uk> wrote:
>>> Does that throw everything else in Anne's algorithm out somehow?
>> Anne?
> No, you can change individual character handling in each tokenizer state quite easily.
> The question is whether divergence from HTML for tokenizing <foo<bar> is desirable. Is it our gut feeling that this is likely better, or is there some data to back that up? In the end we want deterministic error handling. Making as few decisions as possible about how that should go, and deferring to what went before us, seems like a nice way out. There's still plenty of room for that around colon and namespace handling.
> So overall I do not feel too strongly about what to do in each tokenizer state, but if we are going to change things around in a way that diverges from HTML we might want to have a system for it (such as data).
> -- 
> Anne van Kesteren
> http://annevankesteren.nl/

Jeni Tennison
Received on Tuesday, 28 February 2012 22:22:23 UTC
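
Anne's remark that individual character handling can be changed per tokenizer state, and the <foo<bar> question, can be illustrated with a small sketch. This is not Anne's algorithm or any proposed XML-ER behaviour: the function name tokenize, the lt_in_tag_name parameter, its "append" and "restart" values, and the token shapes are all hypothetical. The "append" mode mirrors the HTML tag name state, where a "<" is simply added to the tag name; the "restart" mode is one possible divergence, assumed here only for contrast.

```python
# Hypothetical two-state tokenizer sketch; names and token shapes are assumptions.
DATA, TAG_NAME = "data", "tag_name"


def tokenize(text, lt_in_tag_name="append"):
    """Tokenize text into ("text", s) and ("start_tag", name) tuples.

    lt_in_tag_name selects how "<" is handled inside a tag name:
      "append"  - add it to the tag name (as the HTML tag name state does)
      "restart" - close the current tag and begin a new one (assumed divergence)
    """
    tokens = []
    state = DATA
    buf = []
    for ch in text:
        if state == DATA:
            if ch == "<":
                # Flush any pending character data, then start a tag name.
                if buf:
                    tokens.append(("text", "".join(buf)))
                    buf = []
                state = TAG_NAME
            else:
                buf.append(ch)
        elif state == TAG_NAME:
            if ch == ">":
                tokens.append(("start_tag", "".join(buf)))
                buf = []
                state = DATA
            elif ch == "<" and lt_in_tag_name == "restart":
                # The only per-state change: emit the tag so far, stay in TAG_NAME.
                tokens.append(("start_tag", "".join(buf)))
                buf = []
            else:
                buf.append(ch)
    # End of input: flush trailing text. An unfinished tag is dropped in this sketch.
    if state == DATA and buf:
        tokens.append(("text", "".join(buf)))
    return tokens


print(tokenize("<foo<bar>"))
print(tokenize("<foo<bar>", lt_in_tag_name="restart"))
```

In this sketch the two modes differ by a single branch in one state, and both are deterministic; which result is preferable is exactly the question the thread leaves open.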
