- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Tue, 28 Feb 2012 18:35:29 -0500
- To: Jirka Kosek <jirka@kosek.cz>
- CC: David Lee <David.Lee@marklogic.com>, Robin Berjon <robin@berjon.com>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
On 2/28/2012 9:27 AM, Jirka Kosek wrote:
> Deliverable is implementable specification which defines how to turn
> stream of content which looks like XML but can contain errors like
> mismatched tags, unquoted attributes, etc into well-formed (in XML
> sense) output.
>
> Does it makes sense to you?

Yes, modulo recent discussions of whether what is specified as output is
actually well formed XML (as you specifically propose above), vs. some
other tree form (DOM, etc.).

That said, the official community group description at [1] says:

"This group's purpose is the discussion of applying error recovery
parsing methods inspired from HTML to XML."

So, officially, all this group is supposed to do is "discuss". :-(

I suspect that official description might not have been crafted with much
care, and I'd be glad to see it changed to include some combination of
draft specification development and/or experimental implementation.

Even then, I think it would be a very good thing to characterize the
intended uses a bit better. When I say uses, I don't mean examples of
particular tricky input (though that's important too), I mean things like:

* Making XML more practical to use in browsing scenarios
* ...your favorite other use of non-well formed XML here...

Specifically, I think there are fixups that are perfectly sensible when
the results will be formatted for review by a human, that would not be
sufficiently reliable for automated processing. So, I think the answer to
which fixups we want to implement will depend in part on the intended
uses of that output.

Noah

[1] http://www.w3.org/community/xml-er/
Received on Tuesday, 28 February 2012 23:36:01 UTC