Re: tag name state from George Cristian Bina on 2012-03-05 (public-xml-er@w3.org from March 2012)

From: George Cristian Bina <george@oxygenxml.com>
Date: Mon, 05 Mar 2012 11:37:46 +0200
To: Jeni Tennison <jeni@jenitennison.com>
CC: Robin Berjon <robin@berjon.com>, Noah Mendelsohn <nrm@arcanedomain.com>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <4F54896A.8050906@oxygenxml.com>
Hi,

I think it will be easier to finalize a first version if we focus 
on the browser use case, which mainly means getting from XML that is 
not well-formed to a DOM.

When we implemented the recovery for the oXygen Outline view, the only 
XML-related constraint we had was getting a tree out of whatever is in 
the editor at a given moment. Our main constraints came from our main 
use case: being able to react to changes in the document and 
reconstruct the tree very quickly, re-parsing as little as possible.

Let's take an example. During editing people will pass through invalid 
QName states when they want to rename a prefix:

<x:element xmlns:x="...">
   other content
</x:element>

changing "x" to "y" may pass through these states:

(not well-formed)
<:element xmlns:x="...">
   other content
</:element>

(not namespace-well-formed)
<y:element xmlns:x="...">
   other content
</y:element>

(well-formed)
<y:element xmlns:y="...">
   other content
</y:element>

In the oXygen Outline view the tree structure is the same in all of 
these cases, with nodes labeled x:element, :element, y:element and 
y:element respectively. In our case, however, checking that the 
document is well-formed or valid is a separate process that marks the 
corresponding problems in the document.
Still, I cannot find a better (more intuitive for the user) way to 
represent in the Outline view the fragment
<:element xmlns:x="...">
   other content
</:element>
than by accepting ":element" as a valid label for the element name.
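
To make the idea concrete, here is a minimal sketch in Python of a tree 
builder that accepts any label as an element name. It is only an 
illustration under stated assumptions, not the oXygen implementation: 
the names TAG, Node and build_outline are hypothetical, and the sketch 
ignores comments, CDATA sections, processing instructions and attribute 
values that contain ">".

```python
import re
from dataclasses import dataclass, field

# Deliberately loose: a tag name is any run of characters up to
# whitespace, "/" or ">", so ":element" is accepted as a label.
TAG = re.compile(r"<(/?)([^\s/>]*)([^>]*?)(/?)>")


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def build_outline(text):
    """Build a label tree from markup without enforcing QName rules."""
    root = Node("#document")
    stack = [root]
    for match in TAG.finditer(text):
        closing, name, _attrs, self_closing = match.groups()
        if closing:
            # Close the nearest open element with the same label;
            # an unmatched end tag is simply ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].label == name:
                    del stack[i:]
                    break
        else:
            node = Node(name)
            stack[-1].children.append(node)
            if not self_closing:
                stack.append(node)
    return root
```

With this sketch, each of the four fragments above gives a tree with a 
single element node, and only the label differs.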

Similarly, when recovering in order to build a DOM, there may be 
situations where it makes more sense to accept something that 
contradicts the XML constraints but is allowed in the output format 
that we build.

The idea that was expressed many times during that panel at XML Prague 
was that people are ok with using such recovery mainly at the end of a 
process, not as an entry point or somewhere in the middle of an 
XML-based system.

The main use case that I think we should focus on now is being able to 
accept, as part of an XML document, data that is meant to be XML and 
comes from a third party you cannot control, and to create a DOM from 
it, so that a browser presenting your XML document does not break the 
whole page when that third-party data is not well-formed.

As a user, I would like to be informed that the data I am looking at was 
modified by error recovery, but that can be implemented in different 
ways, not necessarily as part of this DOM building process (for example, 
in the oXygen Outline view we run the well-formedness and validation 
checks as a separate process).
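
As a rough illustration of keeping the reporting separate from the tree 
building, the following Python sketch checks one element label against 
the prefixes in scope and returns a list of problems. The names NCNAME 
and check_element_name are hypothetical, the NCName pattern is an 
ASCII-only simplification of the real production, and how each problem 
is classified (well-formedness or namespace error) is left to the 
caller.

```python
import re

# Assumption: ASCII-only approximation of an NCName. The real production
# in Namespaces in XML allows many more Unicode characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def check_element_name(label, declared_prefixes):
    """Return a list of problem descriptions for one element label."""
    problems = []
    if ":" in label:
        prefix, _, local = label.partition(":")
    else:
        prefix, local = None, label
    if prefix is not None and not NCNAME.match(prefix):
        problems.append("invalid QName: malformed prefix %r" % prefix)
    elif prefix is not None and prefix not in declared_prefixes:
        problems.append("undeclared prefix %r" % prefix)
    if not NCNAME.match(local):
        problems.append("invalid QName: malformed local name %r" % local)
    return problems
```

The tree builder can run first and accept everything, and a check like 
this one can run afterwards to mark the problems for the user.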

Best Regards,
George
--
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 3/4/12 10:12 PM, Jeni Tennison wrote:
> Robin,
>
> On 4 Mar 2012, at 18:44, Robin Berjon wrote:
>> The first and foremost use case that prompted this work was the ability to use XML in user-facing scenarios in such a manner that users are not the ones being punished for WF errors. The fact that users get a terrible experience whenever there's a WF error makes XML nothing short of a terrible format for user-facing content. It would be helpful to fix that.
>>
>> But all that that requires is parsing into a DOM. It does not require the ability to serialise to XML, and it does not require compatibility with the XML DM.
>>
>> That is not to say that the two latter are not important, or useful. They're actually pretty nice things to have around.  But, to say this yet again, it would be most useful to have access to the two latter *also* when the input is not XML but anything else that produces a DOM — especially if it is HTML.
>>
>> The only viable manner of addressing the latter case is with a DOM to XML conversion algorithm. Assuming we have that, all that XML-ER needs to do is output a DOM, which can then be converted.
>>
>> This has advantages that none of the alternatives have:
>>
>>     • We already have a lot of the specification work done.
>>     • It takes the "HTML at the front of an XML pipeline" case into account.
>>     • It uses the DOM, which is the simplest and loosest model.
>>     • It is more user(-agent)-friendly.
>>
>> In general it also seems (to me) a lot closer to the sort of things that people in the HTML/XML TF or at XML Prague have indicated they were interested in doing.
>
>
> I think you've argued successfully for having a defined method of taking a non-well-formed DOM and creating a well-formed DOM which can be serialised straightforwardly to XML. Is that a separate specification?
>
> Even if we assume a defined error recovery transformation at the DOM level, it does not follow that a text-to-DOM parsing process must not perform any of the error recovery that would be performed in that transformation.
>
> I suppose what concerns me about the two-step process is the potential loss of information (a) from the original text which could help with the DOM fix-up and (b) from the original (parsing) error recovery about where and how errors occurred. But those are just potential problems, I don't think they're hard arguments against the approach.
>
> Cheers,
>
> Jeni
Received on Monday, 5 March 2012 09:38:17 UTC