Re: David's less simple example

Yes, we discussed this briefly in the lunch queue at XML Prague. I think that specifying schema-less error recovery that simply generates a tree should be the first step. It might be that it's then possible to specify a further schema-aware stage that performs better error recovery for nodes that are (somehow) marked with parse errors. Or it might be that it needs a whole other specification.
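
To make the "schema-less stage that simply generates a tree" concrete, here is a minimal sketch in Python. It is not a proposed algorithm for the spec. The `Node` class, the `recover` function, and the error strings are all hypothetical names chosen for illustration, and the tokenizer deliberately ignores attributes, comments, and entities. It closes unclosed elements when an enclosing end tag arrives and records a parse error on each node it had to repair, which is the kind of marking a later schema-aware stage could pick up.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy tokenizer: start tags, end tags, empty-element tags, text.
# Attributes, comments, and entities are deliberately not handled.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    errors: list = field(default_factory=list)  # parse errors recorded on this node


def recover(text):
    """Build a tree from possibly ill-formed markup without any schema."""
    root = Node("#document")
    stack = [root]  # currently open elements, innermost last
    for match in TOKEN.finditer(text):
        closing, name, empty, chars = match.groups()
        if chars is not None:
            stack[-1].children.append(chars)
        elif not closing:
            node = Node(name)
            stack[-1].children.append(node)
            if not empty:
                stack.append(node)
        else:
            if name not in [n.name for n in stack[1:]]:
                # No matching open element: note the error and drop the tag.
                stack[-1].errors.append(f"stray </{name}> ignored")
                continue
            # Close everything left open inside the matching element.
            while stack[-1].name != name:
                stack.pop().errors.append("end tag missing")
            stack.pop()
    # Anything still open at end of input is also closed and marked.
    while len(stack) > 1:
        stack.pop().errors.append("end tag missing")
    return root
```

Given `<p>one<br>two</p>`, this sketch makes `two` a child of `br` and marks `br` with "end tag missing", because nothing tells it that `br` is meant to be empty. That is the `<br>` case from the quoted message below: the tree is well-defined, but only a schema-aware stage could move `two` back out.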

Jeni

On 28 Feb 2012, at 19:02, Derek Read wrote:

> Strongly agree: "I suspect that an XML version of fixup cannot do
> nearly as well as HTML5 without a schema."
> 
> I think if we agree on that then the spec will basically fork at this
> juncture:
> 
> 1) When a schema is available the following assumptions and logic can be
> followed...
> 2) When a document is well-formed (no schema available) the following
> /different/ logic applies...
> 
> Derek Read
> Program Manager, XMetaL
> 
> 
> -----Original Message-----
> From: David Lee [mailto:David.Lee@marklogic.com] 
> Sent: Tuesday, February 28, 2012 10:56 AM
> To: Jeni Tennison; David Carlisle
> Cc: public-xml-er@w3.org Community Group
> Subject: RE: David's less simple example
> 
> 
>> 
>> I am told that, similarly, MarkLogic (and I assume other ingesters)
>> perform fixup (in their case based on the DTD/schema for the XML). I
>> know that John Cowan has similarly worked on similar algorithms in
>> the past.
>> 
> 
> I'd like to comment on the above assumption about MarkLogic but probably
> shouldn't ... 
> 
> But ... 
> I suggest that a primary reason that HTML5, Tidy, etc. can do as good
> a job as they do is precisely because they have the equivalent of a
> schema. So they 'know' that, say, <br> should be <br/> and other such
> niceties. I suspect that an XML version of fixup cannot do nearly as
> well as HTML5 without a schema.
> 
> ------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> dlee@marklogic.com
> Phone: +1 650-287-2531
> Cell:  +1 812-630-7622
> www.marklogic.com
> 

-- 
Jeni Tennison
http://www.jenitennison.com

Received on Tuesday, 28 February 2012 19:27:30 UTC