Re: It's probably me..

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Tue, 28 Feb 2012 18:35:29 -0500
Message-ID: <4F4D64C1.3050906@arcanedomain.com>
To: Jirka Kosek <jirka@kosek.cz>
CC: David Lee <David.Lee@marklogic.com>, Robin Berjon <robin@berjon.com>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>


On 2/28/2012 9:27 AM, Jirka Kosek wrote:
> The deliverable is an implementable specification that defines how to turn
> a stream of content that looks like XML but can contain errors such as
> mismatched tags, unquoted attributes, etc. into well-formed (in the XML
> sense) output.
>
> Does it make sense to you?

Yes, modulo recent discussions of whether what is specified as output is
actually well-formed XML (as you specifically propose above), vs. some
other tree form (DOM, etc.).

That said, the official community group description at [1] says:

"This group's purpose is the discussion of applying error recovery parsing
methods inspired from HTML to XML."

So, officially, all this group is supposed to do is "discuss". :-(

I suspect that official description might not have been crafted with much 
care, and I'd be glad to see it changed to include some combination of 
draft specification development and/or experimental implementation.

Even then, I think it would be a very good thing to characterize the 
intended uses a bit better. When I say uses, I don't mean examples of 
particular tricky input (though that's important too), I mean things like:

* Making XML more practical to use in browsing scenarios
* ...your favorite other use of non-well-formed XML here...

Specifically, I think there are fixups that are perfectly sensible when the
results will be formatted for review by a human, but that would not be
sufficiently reliable for automated processing. So, I think the answer to
which fixups we want to implement will depend in part on the intended uses
of that output.
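
As a rough illustration of that point (a sketch only, not anything the group
has specified), the Python below applies two hypothetical fixups and gates
them on the intended use. The names quote_attributes, close_open_tags and
recover, the "display" / "automation" labels, and the choice of which fixup
counts as safe for automation are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Attribute with a quoted or an unquoted value (simplified, illustrative).
_ATTR = re.compile(r'''(\s+[\w:.-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>=]+)''')
# Start, end, or self-closing tag (simplified: no '>' inside attribute values).
_TAG = re.compile(r'<(/?)([\w:.-]+)[^<>]*?(/?)>')


def quote_attributes(text):
    """Hypothetical fixup: wrap unquoted attribute values in double quotes."""
    def fix_attr(match):
        name, value = match.group(1), match.group(2)
        if value[0] in '"\'':
            return match.group(0)  # already quoted, leave untouched
        return f'{name}="{value}"'

    def fix_tag(match):
        return _ATTR.sub(fix_attr, match.group(0))

    return re.sub(r'<[^<>]+>', fix_tag, text)


def close_open_tags(text):
    """Hypothetical fixup: close elements still open at end of input."""
    stack = []
    for closing, name, self_closing in _TAG.findall(text):
        if self_closing:
            continue
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    return text + ''.join(f'</{name}>' for name in reversed(stack))


# (fixup, assumed safe for automated processing?) -- the flags are a guess.
FIXUPS = [
    (quote_attributes, True),
    (close_open_tags, False),  # guesses where elements end
]


def recover(text, purpose):
    """Apply the fixups allowed for `purpose` ("display" or "automation")."""
    for fixup, safe_for_automation in FIXUPS:
        if purpose == "display" or safe_for_automation:
            text = fixup(text)
    try:
        ET.fromstring(text)  # confirm the result is well-formed
    except ET.ParseError as err:
        raise ValueError(f"not recoverable for {purpose}: {err}") from err
    return text
```

With this sketch, recover('<p class=note>Hello <b>world', 'display') repairs
the input for a human to look at, while the same call with 'automation'
raises instead of guessing where the elements end.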

Noah

[1] http://www.w3.org/community/xml-er/
Received on Tuesday, 28 February 2012 23:36:01 GMT
