- From: Norman Walsh <ndw@nwalsh.com>
- Date: Tue, 21 Feb 2012 11:30:39 -0500
- To: W3C XML-ER Community Group <public-xml-er@w3.org>
- Message-ID: <m239a49mj4.fsf@nwalsh.com>
Shane McCarron <shane@aptest.com> writes:
> On the other hand, I actually don't think it is a great idea to
> transform any input, regardless of how broken. Somethings are just NOT
> XML. Those things are probably NOT XML-ER either.

The things that are not XML are well defined. We get to decide what
things are not XML-ER. I'm not sure what the right answer is.

Some things seem clearly not to be XML-ER. For example, if I feed a
JPEG image to the XML-ER parser, it's hard to imagine any value coming
from any "document" produced by parsing that "successfully".

OTOH, a plain text document is less clearly "not XML-ER" to me. This is
one place where a schema-agnostic parser is at a disadvantage. If you
hand

  The quick brown fox

to an HTML parser, it can manufacture a bunch of wrapper elements.

I was just thinking about this the other day. I wonder if XML-ER
"documents" that don't have a clear root element should get one:

  <er:document xmlns:er="whateverwedecide">The quick brown fox</er:document>

> that is enclosed in an anonymous element node. I would prefer that an
> XML-ER parser that was handed something really broken fail predictably.
> Encouraging the parsing of stuff that is really broken is how HTML got
> so messed up in the first place.

Indeed. The two extremes: "only WF XML" and "everything" are easy to
describe. The trick will be finding the right middle ground.

Be seeing you,
norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
Received on Tuesday, 21 February 2012 16:31:11 UTC