RE: tag name state from David Lee on 2012-03-08 (public-xml-er@w3.org from March 2012)

From: David Lee <David.Lee@marklogic.com>
Date: Wed, 7 Mar 2012 18:42:42 -0800
To: George Cristian Bina <george@oxygenxml.com>, Jeni Tennison <jeni@jenitennison.com>
CC: Robin Berjon <robin@berjon.com>, Noah Mendelsohn <nrm@arcanedomain.com>, "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADF66123@EXCHG-BE.marklogic.com>
Q: When you talk about the "browser use case", are you talking about requiring the browsers' code to change? Or is this something that can be implemented in JavaScript or as an add-on?
I ask because, if the browsers need to change, I'm pretty certain this is a dead end unless browser vendors are involved in this discussion.
But if it's something that can plug in to existing browsers then we don't need them :)


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com


-----Original Message-----
From: George Cristian Bina [mailto:george@oxygenxml.com] 
Sent: Monday, March 05, 2012 4:38 AM
To: Jeni Tennison
Cc: Robin Berjon; Noah Mendelsohn; public-xml-er@w3.org Community Group
Subject: Re: tag name state

Hi,

I think that it will be easier to get a first form finalized if we focus on the browser use case, and that mainly means getting from not well-formed XML to a DOM.

When we implemented the recovery for the oXygen Outline view, the only XML-related constraint we had was getting a tree out of whatever is in the editor at a given point. The main constraints for us came from our main use case: being able to react to changes in the document and reconstruct the tree very quickly, re-parsing as little as possible.

Let's take an example. During editing people will pass through invalid QName states when they want to rename a prefix:

<x:element xmlns:x="...">
   other content
</x:element>

changing "x" to "y" may get through:

(not well-formed)
<:element xmlns:x="...">
   other content
</:element>

(not namespace well-formed)
<y:element xmlns:x="...">
   other content
</y:element>

(well-formed)
<y:element xmlns:y="...">
   other content
</y:element>

In the oXygen Outline view the tree structure for all these cases is the same, creating nodes labeled with x:element, :element, y:element and y:element respectively. However, in our case checking that the document is well-formed or valid is a separate process that will mark the corresponding problems in the document.
But I cannot find a better (more intuitive for the user) way to represent in the Outline view the fragment <:element xmlns:x="...">
   other content
</:element>
than by accepting ":element" as a valid label for the element name.
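
To make the idea concrete, here is a minimal sketch of such a lenient outline builder. It is not the oXygen implementation; the names TAG and outline are hypothetical, and the scanner deliberately ignores comments, CDATA sections and ">" inside attribute values. It only shows that a tree can be built while accepting any label, ":element" included, and leaving well-formedness checks to a separate step.

```python
import re

# Hypothetical lenient tag scanner: any run of characters up to whitespace,
# "/" or ">" is accepted as an element label, even if it is not a valid QName.
TAG = re.compile(r"<(/?)([^\s/>]*)[^>]*?(/?)>")

def outline(text):
    """Return (depth, label) pairs for the element tree, with no WF checks."""
    nodes, depth = [], 0
    for match in TAG.finditer(text):
        closing, label, self_closing = match.groups()
        if closing:
            # An end tag only closes the current level; its name is not checked.
            depth = max(depth - 1, 0)
            continue
        nodes.append((depth, label))
        if not self_closing:
            depth += 1
    return nodes

if __name__ == "__main__":
    fragment = '<:element xmlns:x="...">\n   other content\n</:element>'
    for depth, label in outline(fragment):
        print("  " * depth + label)
```

All intermediate states of the prefix rename go through the same code path, so the tree shape stays stable while the user is typing.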

Similarly, when recovering in order to create a DOM, there may be situations where it makes more sense to accept something that contradicts the XML constraints but is allowed in the output format that we build.

The idea that was expressed many times during that panel at XML Prague was that people are ok with using such recovery mainly at the end of a process, not as an entry point or somewhere in the middle of an XML-based system.

The main use case that I think we should focus on now is being able to accept, as part of an XML document, data that wants to be XML from a 3rd party that you cannot control, and to create a DOM from it, so that a browser presenting your XML document does not break the whole page when that 3rd-party data is not well-formed.

As a user, I would like to be informed that the data I am looking at was modified by error recovery - but that can be implemented in different ways, not necessarily as part of this DOM building process (for example, in the oXygen Outline view case we have the well-formedness and validation checks as a separate process).
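
As a rough illustration of the last two points, the sketch below embeds a 3rd-party fragment into a host document using Python's standard xml.dom.minidom. The function name embed and the fallback policy are assumptions made for this example only: real error recovery would repair the fragment, whereas this version simply keeps the raw text when the strict parse fails. What it does show is that the host page survives and that the caller gets a flag saying recovery took place.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def embed(host_xml, fragment):
    """Append a 3rd-party fragment to the host root; return (doc, recovered)."""
    doc = minidom.parseString(host_xml)
    try:
        # Strict path: the fragment is well-formed, so import it as elements.
        part = minidom.parseString(fragment).documentElement
        node = doc.importNode(part, True)
        recovered = False
    except ExpatError:
        # Crude stand-in for real recovery (an assumption of this sketch):
        # keep the raw text so the rest of the page is still usable.
        node = doc.createTextNode(fragment)
        recovered = True
    doc.documentElement.appendChild(node)
    return doc, recovered

if __name__ == "__main__":
    doc, recovered = embed("<page/>", "<b><i>unclosed</b>")
    print(recovered)
    print(doc.documentElement.toxml())
```

The recovered flag is reported outside the DOM itself, in the same spirit as keeping the well-formedness report separate from the tree building.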

Best Regards,
George
--
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger http://www.oxygenxml.com

On 3/4/12 10:12 PM, Jeni Tennison wrote:
> Robin,
>
> On 4 Mar 2012, at 18:44, Robin Berjon wrote:
>> The first and foremost use case that prompted this work was the ability to use XML in user-facing scenarios in such a manner that users are not the ones being punished for WF errors. The fact that users get a terrible experience whenever there's a WF error makes XML nothing short of a terrible format for user-facing content. It would be helpful to fix that.
>>
>> But all that that requires is parsing into a DOM. It does not require the ability to serialise to XML, and it does not require compatibility with the XML DM.
>>
>> That is not to say that the two latter are not important, or useful. They're actually pretty nice things to have around.  But, to say this yet again, it would be most useful to have access to the two latter *also* when the input is not XML but anything else that produces a DOM - especially if it is HTML.
>>
>> The only viable manner of addressing the latter case is with a DOM to XML conversion algorithm. Assuming we have that, all that XML-ER needs to do is output a DOM, which can then be converted.
>>
>> This has advantages that none of the alternatives have:
>>
>>     * We already have a lot of the specification work done.
>>     * It takes the "HTML at the front of an XML pipeline" case into account.
>>     * It uses the DOM, which is the simplest and loosest model.
>>     * It is more user(-agent)-friendly.
>>
>> In general it also seems (to me) a lot closer to the sort of things that people in the HTML/XML TF or at XML Prague have indicated they were interested in doing.
>
>
> I think you've argued successfully for having a defined method of taking a non-well-formed DOM and creating a well-formed DOM which can be serialised straightforwardly to XML. Is that a separate specification?
>
> Even if we assume a defined error recovery transformation at the DOM level, it does not follow that a text-to-DOM parsing process must not perform any of the error recovery that would be performed in that transformation.
>
> I suppose what concerns me about the two-step process is the potential loss of information (a) from the original text which could help with the DOM fix-up and (b) from the original (parsing) error recovery about where and how errors occurred. But those are just potential problems, I don't think they're hard arguments against the approach.
>
> Cheers,
>
> Jeni
Received on Thursday, 8 March 2012 02:43:09 UTC