- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Tue, 21 Feb 2012 18:28:35 -0500
- To: Norman Walsh <ndw@nwalsh.com>
- CC: W3C XML-ER Community Group <public-xml-er@w3.org>
I think there's an important difference between the way the mapping from XML-ER input to a tree is documented in the spec and how any particular implementation optimizes that mapping. I certainly agree that nothing in the spec should >require< an implementation to produce or serialize a well-formed string of characters as an intermediate step.

I continue to think that documenting the transformation from input to output declaratively is preferable, for all the reasons set out in [1]: a declarative description is easier to process, easier to generate tooling from automatically, easier to generate test cases from automatically, etc. Of course, how practical a declarative exposition is depends in part on the mappings from input to output, including the so-called "fix-ups", that we want to do.

One way, though not necessarily the best way, to document such mappings would be at the source level. For example, one could easily imagine a start-tag mapping that would operate at the point where other cleanup had been done (e.g. poorly nested end tags and missing ">" characters unscrambled), and that would map unquoted attributes to some quoted equivalent. I think even Perl- or Ruby-grade regexp machinery is up to doing that. If we take that route, then the mappings we would document would be from non-well-formed to well-formed source. The rest of the tree building would follow from existing specs, with the nice result that your choice of Infoset, XPath-DM or whatever would fall out for free.

As I say, I would not expect implementations to actually produce the well-formed source or any other intermediate representation; rather, they would implement an optimized path from input source to output API. Still, declarative exposition is better when possible, and documenting some mappings at the source level does have some advantages. I don't think we should rule it out as an option.
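As a rough illustration of the kind of source-level start-tag mapping described above, here is a minimal Python sketch. The names fix_start_tags and quote_attrs and both regular expressions are hypothetical, not anything proposed for the spec. It assumes the earlier cleanup (nesting, missing ">") has already happened, and that the input has no comments or CDATA sections containing tag-like text and no valueless attributes. The repaired source is then handed to an ordinary XML parser (Python's xml.etree.ElementTree), so the tree building itself comes from existing machinery.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fix-up patterns (illustrative only, not from any spec).
# A start tag: "<", a name, an optional attribute region, optional "/", ">".
# End tags, comments and PIs do not match because "/", "!" and "?" cannot
# start a name. Assumes no "<" or ">" inside attribute values.
START_TAG = re.compile(r"<([A-Za-z_][\w.:-]*)(\s[^<>]*?)?(/?)>")

# One attribute: name = value, where the value is double-quoted,
# single-quoted, or unquoted (no whitespace, quotes, "<", ">", "=", "`").
ATTR = re.compile(r"""([A-Za-z_][\w.:-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>=`]+)""")


def quote_attrs(region: str) -> str:
    """Rewrite unquoted attribute values in a start tag's attribute region."""
    def fix(m: re.Match) -> str:
        name, value = m.group(1), m.group(2)
        if value.startswith(('"', "'")):
            return m.group(0)  # already quoted: leave untouched
        return f'{name}="{value}"'
    return ATTR.sub(fix, region)


def fix_start_tags(text: str) -> str:
    """Apply the unquoted-attribute mapping to every start tag in text.

    Assumes earlier cleanup (nesting, missing ">") is done and that the
    input contains no comments or CDATA sections with tag-like content.
    """
    def fix(m: re.Match) -> str:
        name, attrs, slash = m.group(1), m.group(2) or "", m.group(3)
        return f"<{name}{quote_attrs(attrs)}{slash}>"
    return START_TAG.sub(fix, text)


if __name__ == "__main__":
    fixed = fix_start_tags("<doc><p class=note id=p1>Hello</p><br clear=all/></doc>")
    print(fixed)
    # The repaired source is now well-formed, so an existing parser builds the tree.
    root = ET.fromstring(fixed)
    print(root.find("p").get("class"))
```

Matching quoted and unquoted values in one alternation lets the scanner consume a quoted value whole, so text such as a=b inside a quoted attribute is not rewritten. A real mapping would need more rules (valueless attributes, entity escaping), which is exactly where a declarative write-up would have to be careful.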
BTW: I think that one of the reasons HTML5 found an algorithmic exposition more practical was the need to support asynchronous scripting that operates in parallel with the parse(s). We don't have that requirement for XML-ER, I don't think (or if we do, we should state it explicitly). With XML-ER, all we've said we need is a mapping from each (potentially not-well-formed) input to a corresponding result tree. I think much or all of that can and probably should be set out declaratively.

Noah

[1] http://www.w3.org/2001/tag/doc/leastPower.html

On 2/21/2012 5:03 PM, Norman Walsh wrote:
> David Lee <David.Lee@marklogic.com> writes:
>> Norm, what's your opinion on the use case of using an ER parser as a
>> front-end to an existing parser.
>> To me that seems the simplest and most useful case. (although almost
>> certainly not the most *efficient*).
>
> It seems to me that by the time the ER parser has figured out how to do
> the fixup, it could just generate the tree more easily than turning
> it back into characters for a second parser to read.
>
> Be seeing you,
> norm
Received on Tuesday, 21 February 2012 23:29:08 UTC