Re: tag name state from Jeni Tennison on 2012-03-04 (public-xml-er@w3.org from March 2012)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 4 Mar 2012 20:12:39 +0000
To: Robin Berjon <robin@berjon.com>
Cc: Noah Mendelsohn <nrm@arcanedomain.com>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-Id: <5F68D530-7917-4BC4-B69A-266A932C2DD9@jenitennison.com>

Robin,

On 4 Mar 2012, at 18:44, Robin Berjon wrote:
> The first and foremost use case that prompted this work was the ability to use XML in user-facing scenarios in such a manner that users are not the ones being punished for well-formedness (WF) errors. The fact that users get a terrible experience whenever there's a WF error makes XML nothing short of a terrible format for user-facing content. It would be helpful to fix that.
> 
> But all that that requires is parsing into a DOM. It does not require the ability to serialise to XML, and it does not require compatibility with the XML DM.
> 
> That is not to say that the latter two are not important, or useful. They're actually pretty nice things to have around. But, to say this yet again, it would be most useful to have access to the latter two *also* when the input is not XML but anything else that produces a DOM, especially if it is HTML.
> 
> The only viable way of addressing the latter case is with a DOM to XML conversion algorithm. Assuming we have that, all that XML-ER needs to do is output a DOM, which can then be converted.
> 
> This has advantages that none of the alternatives have:
> 
>    • We already have a lot of the specification work done.
>    • It takes the "HTML at the front of an XML pipeline" case into account.
>    • It uses the DOM, which is the simplest and loosest model.
>    • It is more user(-agent)-friendly.
> 
> In general it also seems (to me) a lot closer to the sort of things that people in the HTML/XML TF or at XML Prague have indicated they were interested in doing.

I think you've argued successfully for having a defined method of taking a non-well-formed DOM and creating a well-formed DOM which can be serialised straightforwardly to XML. Is that a separate specification?

Even if we assume a defined error recovery transformation at the DOM level, it does not follow that the text-to-DOM parsing process must perform none of the error recovery that the transformation would perform.

I suppose what concerns me about the two-step process is the potential loss of (a) information in the original text that could help with the DOM fix-up and (b) information from the original (parsing) error recovery about where and how errors occurred. But those are just potential problems; I don't think they're hard arguments against the approach.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Sunday, 4 March 2012 20:12:54 UTC