- From: David Carlisle <davidc@nag.co.uk>
- Date: Thu, 23 Dec 2010 13:30:18 +0000
- To: Kurt Cagle <kurt.cagle@gmail.com>
- CC: public-html-xml@w3.org
On 23/12/2010 03:18, Kurt Cagle wrote:
> s to the data. Such heuristics might include the following:
>
> 1) If a default namespace is not defined globally but a an explicit
> namespace is, and the child elements of that namespaces are in the
> default namespace, then put them into the explicit namespace:
>
> <ns1:foo xmlns:ns="myFooNS">
> <bar/>
> <bat/>
> </ns1:foo>
>
> would map to
>
> <ns1:foo xmlns:ns="myFooNS">
> <ns1:bar/>
> <ns1:bat/>
> </ns1:foo>

That's taking something that is already namespace well formed and
transforming it into another document. That is a job for XSLT, not a
parser.

>
> 2) if you have an element that repeats without being terminated between
> repeats, then that element will be considered a sibling:
>
> <foo>
> <bar>ABC
> <bar>123
> </foo>
>
> becomes:
>
> <foo>
> <bar>ABC</bar>
> <bar>123</bar>
> </foo>

I'd be very worried about suggesting any such fixup in the absence of
schema-driven rules. I think the only generic fixup for non-well-formed
XML for general elements would be to close any open elements on the
stack when encountering a close tag, until a matching name is found.
That would match HTML5 foreign content parsing and XML5, and produce

<foo>
<bar>ABC
<bar>123
</bar></bar></foo>

SGML could do more, as it always had a DTD to hand to specify for
individual elements what the rules were.

>
> 3) An element with mixed content will be considered to contain that
> mixed content until another element of the same name is encountered:

In the absence of a schema you can't tell whether it is mixed content or
not.

>
> 4) Entities would be matched to the HTML core set and converted into
> their equivalent numeric entity codes.

Perhaps.

>
> And so forth. as the parser works through these cases, it assigns a
> weight that indicates the likelihood that a given heuristic rule
> determines the correct configuration. After the parsing is done, these
> are used to calculate a confidence level for the XML document - the
> likelihood that the document that is reproduced in the parsing
> corresponds to the intent of the creator of this content. In the case of
> well-formed XML this confidence is 1.

But your first rule took well-formed content and changed it.

David
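
The generic fixup described above (on a close tag, close any open elements on the stack until a matching name is found) can be sketched roughly as follows. This is only an illustration, not the HTML5 or XML5 algorithm itself: the function name `close_open_elements` and the regex-based tag scanner are made up for the example, and two behaviours are assumptions not stated above, namely that a close tag with no matching open element is dropped and that anything still open at end of input is closed.

```python
import re

# Crude tag scanner for illustration only; it ignores comments, CDATA,
# processing instructions and '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)([^<>]*?)(/?)>")


def close_open_elements(markup):
    """Return markup with implied end tags inserted (hypothetical helper)."""
    out = []      # pieces of the repaired document
    stack = []    # names of currently open elements
    pos = 0
    for m in TAG.finditer(markup):
        out.append(markup[pos:m.start()])   # text before this tag
        pos = m.end()
        closing, name, _attrs, self_closing = m.groups()
        if self_closing:
            out.append(m.group(0))          # <bar/> opens nothing
        elif not closing:
            stack.append(name)              # start tag: push and emit
            out.append(m.group(0))
        elif name in stack:
            # Close tag: pop and close until the matching name is found.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Assumption: a close tag with no matching open element is dropped.
    out.append(markup[pos:])
    # Assumption: anything still open at end of input is closed.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)


print(close_open_elements("<foo><bar>ABC<bar>123</foo>"))
```

With the second example from the quoted message as input, the repeated bar elements come out nested rather than as siblings, which is the point: without a schema there is nothing to say they should be siblings.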
Received on Thursday, 23 December 2010 13:30:47 UTC