RE: David's less simple example

From: Derek Read <derek.read@justsystems.com>
Date: Tue, 28 Feb 2012 11:02:11 -0800
Message-ID: <BECDDDED92C3B949A38F5BC4BF56D21F04B205A4@van-mail.jena.local>
To: "David Lee" <David.Lee@marklogic.com>, "Jeni Tennison" <jeni@jenitennison.com>, "David Carlisle" <davidc@nag.co.uk>
Cc: <public-xml-er@w3.org>
Strongly agree: "I suspect that an XML version of fixup cannot do nearly
as well as HTML5 without a schema."

I think if we agree on that then the spec will basically fork at this
point:

1) When a schema is available, the following assumptions and logic can be
applied...
2) When a document is well-formed (no schema available), the following
/different/ logic applies...

Derek Read
Program Manager, XMetaL

-----Original Message-----
From: David Lee [mailto:David.Lee@marklogic.com] 
Sent: Tuesday, February 28, 2012 10:56 AM
To: Jeni Tennison; David Carlisle
Cc: public-xml-er@w3.org Community Group
Subject: RE: David's less simple example

> I am told that, similarly, MarkLogic (and I assume other ingesters)
> fixup (in their case based on the DTD/schema for the XML). I know that
> Cowan has similarly worked on similar algorithms in the past.

I'd like to comment on the above assumption about MarkLogic but probably
shouldn't ... 

But ... 
I suggest that a primary reason that HTML5, Tidy, etc. can do as good
a job as they do is precisely because they have the equivalent of a
schema. So they 'know' that, say, <br> should be <br/> and other such
niceties. I suspect that an XML version of fixup cannot do nearly as
well as HTML5 without a schema.
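
To make the <br> point concrete, here is a minimal sketch in Python. It
is only an illustration, not a proposed algorithm: the names fixup,
is_well_formed and EMPTY_ELEMENTS are hypothetical, and the hard-coded
set of element names merely stands in for what a real schema or DTD
would declare as EMPTY. The regular expression is deliberately naive
(no namespaces, comments or CDATA).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema knowledge: elements declared EMPTY.
EMPTY_ELEMENTS = frozenset({"br", "hr", "img"})

# Naive start-tag matcher: name, optional attributes, optional "/".
_START_TAG = re.compile(r"<([A-Za-z][\w.-]*)((?:\s[^<>]*?)?)\s*(/?)>")


def fixup(markup, empty_elements=frozenset()):
    """Self-close start tags whose element is known to be empty."""
    def repair(match):
        name, attrs, slash = match.groups()
        if slash or name not in empty_elements:
            return match.group(0)  # already closed, or nothing known
        return "<" + name + attrs + "/>"
    return _START_TAG.sub(repair, markup)


def is_well_formed(markup):
    """Report whether the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


broken = "<p>one<br>two</p>"
with_schema = fixup(broken, EMPTY_ELEMENTS)  # "<p>one<br/>two</p>"
without_schema = fixup(broken)               # unchanged: nothing is known
```

With the element list supplied, the repaired text parses. Without it,
the same routine has nothing to go on and the input stays broken, which
is the gap I mean.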

David Lee
Lead Engineer
MarkLogic Corporation
Phone: +1 650-287-2531
Cell:  +1 812-630-7622

Received on Tuesday, 28 February 2012 19:03:00 UTC
