- From: Norman Walsh <ndw@nwalsh.com>
- Date: Tue, 21 Feb 2012 11:30:39 -0500
- To: W3C XML-ER Community Group <public-xml-er@w3.org>
- Message-ID: <m239a49mj4.fsf@nwalsh.com>
Shane McCarron <shane@aptest.com> writes:
> On the other hand, I actually don't think it is a great idea to
> transform any input, regardless of how broken. Some things are just NOT
> XML. Those things are probably NOT XML-ER either.
The things that are not XML are well defined. We get to decide what
things are not XML-ER.
I'm not sure what the right answer is. Some things seem clearly not to
be XML-ER. For example, if I feed a JPEG image to the XML-ER parser,
it's hard to imagine any value coming from any "document" produced by
parsing that "successfully".
OTOH, a plain text document is less clearly "not XML-ER" to me. This is
one place where a schema-agnostic parser is at a disadvantage. If you hand
    The quick brown fox
to an HTML parser, it knows the HTML vocabulary, so it can manufacture
the missing html, head, and body wrapper elements.
I was just thinking about this the other day. I wonder if XML-ER
"documents" that don't have a clear root element should get one:
<er:document xmlns:er="whateverwedecide">The quick brown fox</er:document>
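A minimal sketch of that wrapping rule, assuming Python's standard-library ElementTree. The function name parse_er is hypothetical, and "whateverwedecide" is only the placeholder namespace from the line above. The wrapper is added only when the input fails to parse as-is. Anything that is still not well-formed after wrapping raises an error instead of being repaired, which keeps the failure predictable.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace copied from the message; the real URI was undecided.
ER_NS = "whateverwedecide"


def parse_er(text: str) -> ET.Element:
    """Hypothetical helper: parse text, adding an er:document root if needed."""
    try:
        # Ordinary well-formed XML with a single root: return it unchanged.
        return ET.fromstring(text)
    except ET.ParseError:
        pass
    # Retry inside a synthetic root. This succeeds for plain text or for
    # several sibling elements; input that is still broken (for example an
    # unclosed tag) raises ET.ParseError rather than being patched up.
    wrapped = f'<er:document xmlns:er="{ER_NS}">{text}</er:document>'
    return ET.fromstring(wrapped)
```

For example, parse_er("The quick brown fox") returns an element whose tag is "{whateverwedecide}document" and whose text is the original string, while parse_er("<a>") still raises ET.ParseError.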
> that is enclosed in an anonymous element node. I would prefer that an
> XML-ER parser that was handed something really broken fail predictably.
> Encouraging the parsing of stuff that is really broken is how HTML got
> so messed up in the first place.
Indeed. The two extremes, "only well-formed XML" and "everything", are easy to
describe. The trick will be finding the right middle ground.
Be seeing you,
norm
--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
Received on Tuesday, 21 February 2012 16:31:11 UTC