RE: Subset Data Model

Thanks for the explanation. I have a follow-up question.

> John Cowan
> In the case of MicroXML, keeping the syntax a subset means that in order
> to parse MicroXML and return an XML data model, you just use an XML
> parser, which already exists.  It would also be easy to write a parser
> that accepted XML and returned the MicroXML data model: for example,
> a SAX handler that provided MicroLark events to a MicroLark handler,
> dropping all other events on the floor, or a MicroLark tree builder that
> walked an XML DOM. 
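
A minimal sketch of the SAX-filter idea quoted above, assuming Python's standard xml.sax module rather than MicroLark itself. The MicroEvents class and its start/text/end methods are hypothetical stand-ins for a MicroLark-style handler, not MicroLark's actual API; the filter forwards only element, attribute, and text events and lets everything else fall on the floor.

```python
import xml.sax


class MicroEvents:
    """Hypothetical stand-in for a MicroLark-style event consumer."""

    def __init__(self):
        self.events = []

    def start(self, name, attrs):
        self.events.append(("start", name, attrs))

    def text(self, data):
        self.events.append(("text", data))

    def end(self, name):
        self.events.append(("end", name))


class SubsetFilter(xml.sax.ContentHandler):
    """Forwards elements, attributes, and text; ignores all other SAX events."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def startElement(self, name, attrs):
        # Copy the SAX attributes object into a plain dict.
        self.sink.start(name, dict(attrs.items()))

    def endElement(self, name):
        self.sink.end(name)

    def characters(self, content):
        self.sink.text(content)

    def processingInstruction(self, target, data):
        # Not forwarded: processing instructions are dropped in this sketch.
        pass


def micro_events(xml_bytes):
    """Parse full XML with an ordinary XML parser and return only the reduced events."""
    sink = MicroEvents()
    xml.sax.parseString(xml_bytes, SubsetFilter(sink))
    return sink.events
```

Calling micro_events(b'<?pi x?><a b="1">hi</a>') yields events for the element, its attribute, and its text only; the processing instruction never reaches the sink.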

So: if the XML and uXML data models are not intended to be subsets or interchangeable, where do you see the value in the syntax being a subset? Is this purely a familiarity issue? (For example, xmlsh syntax is largely compatible with bash, for familiarity reasons only.)

In the worst case, a user receiving a document needs to know what the document format is in order to get the intended meaning from it.
I would think that, at a bare minimum, a processor (and a human) should have a way of distinguishing an XML document from a uXML document, so that there is no confusion about which data model is encoded and hence which tools can be used to extract the intended data representation. How is this intended to be conveyed?
Out of band (say, through the HTTP Content-Type)? A marker in the document itself? A file extension?
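
To make two of those options concrete, here is a rough sketch of what a receiver might do with out-of-band hints. The media type "application/x-microxml" and the ".uxml" extension are invented purely for illustration; nothing in this thread defines either, and the in-document marker option is left out because no such marker has been proposed here.

```python
import os

# Hypothetical mappings, for illustration only: no media type or file
# extension for uXML has been specified in this discussion.
HYPOTHETICAL_MEDIA_TYPES = {
    "application/xml": "xml",
    "application/x-microxml": "uxml",
}
HYPOTHETICAL_EXTENSIONS = {
    ".xml": "xml",
    ".uxml": "uxml",
}


def declared_model(content_type=None, filename=None):
    """Return "xml", "uxml", or None when the out-of-band hints say nothing."""
    if content_type:
        # Strip parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type in HYPOTHETICAL_MEDIA_TYPES:
            return HYPOTHETICAL_MEDIA_TYPES[media_type]
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext in HYPOTHETICAL_EXTENSIONS:
            return HYPOTHETICAL_EXTENSIONS[ext]
    return None
```

The None result is the case I am worried about: with no hint at all, the receiver cannot tell which data model the sender intended.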

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com

Received on Monday, 13 August 2012 15:38:07 UTC