- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Wed, 22 Feb 2012 13:51:54 +0000
- To: Norman Walsh <ndw@nwalsh.com>
- Cc: W3C XML-ER Community Group <public-xml-er@w3.org>
On 21 Feb 2012, at 16:30, Norman Walsh wrote:

> The things that are not XML are well defined. We get to decide what
> things are not XML-ER.
>
> I'm not sure what the right answer is. Some things seem clearly not to
> be XML-ER. For example, if I feed a JPEG image to the XML-ER parser,
> it's hard to imagine any value coming from any "document" produced by
> parsing that "successfully".
>
> OTOH, a plain text document is less clearly "not XML-ER" to me. This is
> one place where a schema-agnostic parser is at a disadvantage. If you hand
>
>   The quick brown fox
>
> to an HTML parser, it can manufacture a bunch of wrapper elements.
>
> I was just thinking about this the other day. I wonder if XML-ER
> "documents" that don't have a clear root element should get one:
>
>   <er:document xmlns:er="whateverwedecide">The quick brown fox</er:document>

I'd suggest that in cases where the input really doesn't look anything like XML (i.e. its first non-whitespace character isn't a <), an XML-ER parser should do whatever an HTML parser does. HTML is as good a vocabulary as any for representing such content, and the rules are already defined and implemented, particularly in the key places where we expect XML-ER to be used. That would effectively limit the scope of what we have to define for XML-ER parsing, which is a good thing.

The side-effect, of course, is that something like:

    I forgot my document element but I'll still have a
    <table><p>containing a paragraph!</p>
    <tr><td>just because I can</td></tr></table>

would lead to all sorts of strange HTML-specific fix-up taking place, but any documents that are that badly munged are almost bound to actually be HTML anyway :)

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
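For illustration, the dispatch rule proposed above (fall back to HTML parsing when the first non-whitespace character isn't a <) might be sketched in Python roughly as follows. The function name `choose_parser` and its string return values are hypothetical; treating only XML's four whitespace characters (space, tab, CR, LF) plus a leading byte-order mark as skippable is an assumption, and the actual XML-ER and HTML parsing steps are not shown.

```python
# Hypothetical helper: decide which parsing rules to apply to raw input,
# following the "first non-whitespace character" heuristic from the post.
XML_WHITESPACE = " \t\r\n"  # assumption: only XML's S characters are skipped
BOM = "\ufeff"


def choose_parser(text: str) -> str:
    """Return "xml-er" if the input starts like markup, else "html"."""
    # Assumption: a leading byte-order mark is ignored before the check.
    if text.startswith(BOM):
        text = text[1:]
    stripped = text.lstrip(XML_WHITESPACE)
    # Only the first significant character matters; anything else
    # (including empty input) is handed to the HTML parsing rules.
    return "xml-er" if stripped.startswith("<") else "html"


if __name__ == "__main__":
    samples = [
        "<doc>fine</doc>",
        "  \n<er:document/>",
        "The quick brown fox",
        "I forgot my document element but I'll still have a <table>...",
    ]
    for s in samples:
        print(repr(s[:30]), "->", choose_parser(s))
```

Under this rule both plain text and the munged "forgot my document element" example are routed to the HTML rules, which is exactly where the HTML-specific fix-up side-effect comes from.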
Received on Wednesday, 22 February 2012 13:52:21 UTC