
Re: David's less simple example

From: Jeni Tennison <jeni@jenitennison.com>
Date: Tue, 28 Feb 2012 19:27:05 +0000
Cc: David Lee <David.Lee@marklogic.com>, Derek Read <derek.read@justsystems.com>, David Carlisle <davidc@nag.co.uk>
Message-Id: <BD8CBB70-89CB-430D-8533-9B9EFEC93ECC@jenitennison.com>
To: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>

Yes, we discussed this briefly in the lunch queue at XML Prague. I think that specifying schema-less error recovery that simply generates a tree should be the first step. It might be that it's then possible to specify a further schema-aware stage that performs better error recovery for nodes that are (somehow) marked with parse errors. Or it might be that it needs a whole other specification.
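
To make the two stages concrete, here is a rough sketch in Python. It is not a proposal for the actual algorithm. Every name in it (Node, recover, hoist_empty, serialize, the "recovered" flag) is made up for illustration, and the tokenizer ignores attributes, comments, entities and escaping. Stage one never consults a schema: it always builds a tree and flags any element whose end tag it had to guess. Stage two takes a set of element names declared empty (standing in for a schema) and repairs only the flagged nodes.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # Node or str items
    recovered: bool = False  # True if the end tag had to be guessed

# Deliberately tiny tokenizer: start/end/empty tags without attributes, or text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")

def recover(text):
    """Stage 1 (schema-less): always return a tree, flagging guessed closes."""
    root = Node("#document")
    stack = [root]
    for closing, name, selfclose, chars in TOKEN.findall(text):
        if chars:
            stack[-1].children.append(chars)
        elif closing:
            if name not in [n.name for n in stack[1:]]:
                continue  # stray end tag: drop it
            while stack[-1].name != name:
                stack.pop().recovered = True  # implied end tag
            stack.pop()
        else:
            node = Node(name)
            stack[-1].children.append(node)
            if not selfclose:
                stack.append(node)
    for node in stack[1:]:  # still open at end of input
        node.recovered = True
    return root

def hoist_empty(node, empty_names):
    """Stage 2 (schema-aware): a flagged element that the schema declares
    empty gives its children back to its parent as following siblings."""
    fixed = []
    for child in node.children:
        if isinstance(child, Node):
            hoist_empty(child, empty_names)
            if child.recovered and child.name in empty_names:
                fixed.append(child)
                fixed.extend(child.children)
                child.children = []
                child.recovered = False
                continue
        fixed.append(child)
    node.children = fixed

def serialize(node):
    """Write the tree back out (no escaping; illustration only)."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(c) for c in node.children)
    if node.name == "#document":
        return inner
    return f"<{node.name}>{inner}</{node.name}>" if inner else f"<{node.name}/>"

tree = recover("<p>one<br>two</p>")
print(serialize(tree))       # stage 1 only
hoist_empty(tree, {"br"})
print(serialize(tree))       # after the schema-aware pass
```

Without the set of empty names, stage one can only guess that "two" belongs inside the unclosed br. The second pass is where knowledge of the vocabulary changes the result, and it only has to look at nodes the first pass flagged.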


On 28 Feb 2012, at 19:02, Derek Read wrote:

> Strongly agree: "I suspect that an XML version of fixup cannot do nearly
> as well as HTML5 without a schema."
> I think if we agree on that then the spec will basically fork at this
> juncture:
> 1) When a schema is available the following assumptions and logic can be
> followed...
> 2) When a document is well-formed (no schema available) the following
> /different/ logic applies...
> Derek Read
> Program Manager, XMetaL
> -----Original Message-----
> From: David Lee [mailto:David.Lee@marklogic.com] 
> Sent: Tuesday, February 28, 2012 10:56 AM
> To: Jeni Tennison; David Carlisle
> Cc: public-xml-er@w3.org Community Group
> Subject: RE: David's less simple example
>> I am told that, similarly, MarkLogic (and I assume other ingesters) perform
>> fixup (in their case based on the DTD/schema for the XML). I know that John
>> Cowan has similarly worked on similar algorithms in the past.
> I'd like to comment on the above assumption about MarkLogic but probably
> shouldn't ... 
> But ... 
> I suggest that a primary reason that HTML5 and Tidy etc. can do as good
> a job as they do is precisely because they have the equivalent of a
> schema. So they 'know' that, say, <br> should be <br/> and other such
> niceties. I suspect that an XML version of fixup cannot do nearly as
> well as HTML5 without a schema.
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> dlee@marklogic.com
> Phone: +1 650-287-2531
> Cell:  +1 812-630-7622
> www.marklogic.com

Jeni Tennison
Received on Tuesday, 28 February 2012 19:27:30 UTC
