Re: Draft from Norman Walsh on 2012-02-21 (public-xml-er@w3.org from February 2012)

From: Norman Walsh <ndw@nwalsh.com>
Date: Tue, 21 Feb 2012 11:30:39 -0500
To: W3C XML-ER Community Group <public-xml-er@w3.org>
Message-ID: <m239a49mj4.fsf@nwalsh.com>

Shane McCarron <shane@aptest.com> writes:
> On the other hand, I actually don't think it is a great idea to
> transform any input, regardless of how broken.  Some things are just NOT
> XML.  Those things are probably NOT XML-ER either.

The things that are not XML are well defined. We get to decide what
things are not XML-ER.

I'm not sure what the right answer is. Some things seem clearly not to
be XML-ER. For example, if I feed a JPEG image to the XML-ER parser,
it's hard to imagine any value coming from any "document" produced by
parsing that "successfully".

OTOH, a plain text document is less clearly "not XML-ER" to me. This is
one place where a schema-agnostic parser is at a disadvantage. If you hand

  The quick brown fox

to an HTML parser, it can manufacture a bunch of wrapper elements (html,
head, body) around it.

I was just thinking about this the other day. I wonder if XML-ER
"documents" that don't have a clear root element should get one:

  <er:document xmlns:er="whateverwedecide">The quick brown fox</er:document>
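
As a rough illustration of that idea only (not a proposal for the actual
recovery algorithm), here is a minimal Python sketch. It assumes the
placeholder namespace "whateverwedecide" from the example above, and the
function name parse_with_fallback_root is made up for this sketch. It is
also much cruder than a real XML-ER parser would be: any input that is not
well-formed is treated as plain text, with no attempt at recovering markup.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the example above; not a real URI.
ER_NS = "whateverwedecide"
ET.register_namespace("er", ER_NS)


def parse_with_fallback_root(text):
    """Parse text as XML; if that fails, wrap it in a synthetic root.

    Hypothetical helper: input that is not well-formed is kept verbatim
    as the text content of an er:document element.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        root = ET.Element("{%s}document" % ER_NS)
        root.text = text
        return root


if __name__ == "__main__":
    doc = parse_with_fallback_root("The quick brown fox")
    print(ET.tostring(doc, encoding="unicode"))
```

Well-formed input comes back unchanged, so the wrapper only appears when
there is no usable root element to begin with.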

> that is enclosed in an anonymous element node.  I would prefer that an 
> XML-ER parser that was handed something really broken fail predictably.  
> Encouraging the parsing of stuff that is really broken is how HTML got 
> so messed up in the first place.

Indeed. The two extremes, "only WF XML" and "everything", are easy to
describe. The trick will be finding the right middle ground.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com

Received on Tuesday, 21 February 2012 16:31:11 UTC