
RE: David's less simple example

From: David Lee <David.Lee@marklogic.com>
Date: Tue, 28 Feb 2012 10:55:48 -0800
To: Jeni Tennison <jeni@jenitennison.com>, David Carlisle <davidc@nag.co.uk>
CC: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADDAA842@EXCHG-BE.marklogic.com>

> I am told that, similarly, MarkLogic (and I assume other ingesters) perform
> fixup (in their case based on the DTD/schema for the XML). I know that John
> Cowan has similarly worked on similar algorithms in the past.

I'd like to comment on the above assumption about MarkLogic but probably shouldn't ... 

But ... 
I suggest that a primary reason that HTML5, Tidy, etc. can do as good a job as they do is precisely that they have the equivalent of a schema. So they 'know' that, say, <br> should be <br/>, and other such niceties. I suspect that an XML version of fixup cannot do nearly as well as HTML5 without a schema.
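
To illustrate the point, here is a minimal Python sketch of a fixup step that can only repair <br> because it is told which elements are empty. The EMPTY_ELEMENTS set and the fix_empty_elements function are hypothetical names made up for this example; they stand in for the knowledge a schema or DTD would supply, and the regex is a rough approximation rather than a real parser (it does not handle a '>' inside an attribute value, for instance).

```python
import re

# Hypothetical "schema knowledge": element names known to be empty.
# An HTML parser has such a list built in; for arbitrary XML it would
# have to come from a DTD or schema.
EMPTY_ELEMENTS = {"br", "hr", "img"}

# Rough match for a start tag: name, optional attributes, optional "/".
_START_TAG = re.compile(r"<([A-Za-z][\w.-]*)((?:\s[^<>]*?)?)\s*(/?)>")


def fix_empty_elements(text, empty_elements=EMPTY_ELEMENTS):
    """Self-close start tags whose element is known to be empty."""
    def repl(match):
        name, attrs, slash = match.groups()
        if slash or name.lower() not in empty_elements:
            # Already self-closed, or nothing is known about this element.
            return match.group(0)
        return "<{}{}/>".format(name, attrs)

    return _START_TAG.sub(repl, text)


if __name__ == "__main__":
    sample = "<p>one<br>two</p>"
    print(fix_empty_elements(sample))                        # with the list
    print(fix_empty_elements(sample, empty_elements=set()))  # without it
```

With the list, the <br> is rewritten as <br/>; with an empty set the input comes back unchanged, because nothing tells the fixup whether <br> is an empty element or an unclosed container.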

David Lee
Lead Engineer
MarkLogic Corporation
Phone: +1 650-287-2531
Cell:  +1 812-630-7622

Received on Tuesday, 28 February 2012 18:56:18 UTC
