RE: David's less simple example from Derek Read on 2012-02-28 (public-xml-er@w3.org from February 2012)

From: Derek Read <derek.read@justsystems.com>
Date: Tue, 28 Feb 2012 10:05:43 -0800
To: "Innovimax W3C" <innovimax+w3c@gmail.com>, "George Cristian Bina" <george@oxygenxml.com>
Cc: "David Carlisle" <davidc@nag.co.uk>, <public-xml-er@w3.org>
Message-ID: <BECDDDED92C3B949A38F5BC4BF56D21F04B20571@van-mail.jena.local>
For interest's sake, given the original problem...

<math><one<two<three</one><__two></tree></math>

 

XMetaL will "fix" it as follows when opening the document in "well formed" mode (no DTD/XSD provided):

<math><one><two><three/></two></one><__two/></math>

 

It also displays the following in the "validation log" (note that each of the errors is clickable and takes you to that node so there is context here without it actually being included as text in the error):

* Bad start tag. Expected ">".

* Bad start tag. Expected ">".

* Bad start tag. Expected ">".

* Implied missing end-tag </three>

* Implied missing end-tag </two>

* Ignoring end-tag </tree>

* Implied missing end-tag </__two>
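
The kind of recovery the log describes can be approximated with a stack of open elements. The following is only a minimal Python sketch and not XMetaL's actual algorithm: the `TAG` pattern and the `recover` and `serialize` functions are hypothetical names, and the sketch handles tag-only input (no text, attributes or comments). It treats a start tag that runs into another "<" as missing its ">", closes open elements when an enclosing end tag arrives, and ignores end tags that match nothing on the stack.

```python
import re

# Hypothetical tokenizer: "<", an optional "/", a name, and an optional ">".
# A name ends at "<", ">", "/" or whitespace.
TAG = re.compile(r"<(/?)([^<>/\s]+)(>?)")


def serialize(node):
    """Serialize a node, using the empty-element form for leaves."""
    if not node["children"]:
        return f"<{node['name']}/>"
    inner = "".join(serialize(child) for child in node["children"])
    return f"<{node['name']}>{inner}</{node['name']}>"


def recover(source):
    """Return (serialized_tree, log) for a tag-only fragment."""
    root = {"name": None, "children": []}  # sentinel, never serialized
    stack = [root]                         # stack of open elements
    log = []
    for closing, name, gt in TAG.findall(source):
        if not closing:
            if not gt:
                log.append('Bad start tag. Expected ">".')
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif any(open_node["name"] == name for open_node in stack[1:]):
            # Close everything opened after the matching element, then the match.
            while stack[-1]["name"] != name:
                log.append(f"Implied missing end-tag </{stack.pop()['name']}>")
            stack.pop()
        else:
            log.append(f"Ignoring end-tag </{name}>")
    # Close whatever is still open at end of input.
    while len(stack) > 1:
        log.append(f"Implied missing end-tag </{stack.pop()['name']}>")
    return "".join(serialize(child) for child in root["children"]), log


tree, log = recover("<math><one<two<three</one><__two></tree></math>")
print(tree)
for message in log:
    print("*", message)
```

On the input above this sketch happens to produce the same tree and the same seven messages in the same order, which suggests the behaviour is consistent with a plain "close the open elements" strategy, though that says nothing about how the product is implemented.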

 

Note that when a DTD or XSD Schema is available the results will be different because a lot of things can be implied from the schema's rules.

 

Derek Read

Program Manager, XMetaL

 

 

From: innovimax@gmail.com [mailto:innovimax@gmail.com] On Behalf Of Innovimax W3C
Sent: Tuesday, February 28, 2012 9:14 AM
To: George Cristian Bina
Cc: David Carlisle; public-xml-er@w3.org Community Group
Subject: Re: David's less simple example

 

George,

 

That's not exactly what I got with Oxygen 13.1. How can we double-check this?

 

Mohamed

On Tue, Feb 28, 2012 at 5:33 PM, George Cristian Bina <george@oxygenxml.com> wrote:

In the oXygen Outline view the fragment

<math><one<two<three</one><two></tree></math>

will be equivalent to

<math><one><two><three></three></two></one><two></two></math>

Formatted for readability that will be:

<math>
 <one>
   <two>
     <three/>
   </two>
 </one>
 <two></two>
</math>

The </tree> tag is actually ignored, but it still separates any text nodes that come before and after it.

Best Regards,
George
--
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com




On 2/28/12 6:09 PM, Innovimax W3C wrote:

 David,
 
 It looks like XML5 gives a slightly different result (the name of the
 tag contains illegal "<")
 
 http://quuz.org/xml5/play?source=%3Cmath%3E%3Cone%3Ctwo%3Cthree%3C%2Fone%3E%3Ctwo%3E%3C%2Ftree%3E%3C%2Fmath%3E
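
To check exactly which input that link submits, the `source` query parameter can be percent-decoded. A small Python sketch using only the standard library (the variable names are illustrative):

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://quuz.org/xml5/play?source="
    "%3Cmath%3E%3Cone%3Ctwo%3Cthree%3C%2Fone%3E%3Ctwo%3E%3C%2Ftree%3E%3C%2Fmath%3E"
)

# parse_qs percent-decodes each value; "source" is the only parameter here.
source = parse_qs(urlsplit(url).query)["source"][0]
print(source)  # <math><one<two<three</one><two></tree></math>
```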

 
 Mohamed
 
 On Tue, Feb 28, 2012 at 4:49 PM, David Carlisle <davidc@nag.co.uk> wrote:
 
 
    I think the simple example won't really distinguish systems that "fix
    up" markup as they will all pretty much just close the stack of open
    elements and give the same result.
 
    To distinguish things a bit it's worth looking at something a bit
    less like well formed XML, say
 
    <math><one<two<three</one><two></tree></math>
 
    Using <math> as an outer element has the advantage that you can test
    with an html5 parser (the <math> puts html5 in its "foreign content"
    xml-like mode, where /> means what it is supposed to mean). One desirable
    property of XML-ER would be that it isn't totally unlike the behaviour
    of HTML5 on such content.
 
    Using V.nu's parser you can see the result of parsing the above:
 
    http://livedom.validator.nu/?%3C!DOCTYPE%20html%3E%0A%3Cmath%3E%3Cone%3Ctwo%3Cthree%3C%2Fone%3E%3Ctwo%3E%3C%2Ftree%3E%3C%2Fmath%3E
 
    removing the html head and body implied in the html context results in a
    parse tree of
 
    <math><oneU00003CtwoU00003CthreeU00003C
    one=""><two></two></oneU00003CtwoU00003CthreeU00003C></math>
 
 
    which is what it is. I don't think it matters too much what the parse
    tree is. That is, I don't think it's worth trying to argue about any
    meaning implied by the original markup. The important thing is that
    html5 specifies a deterministic algorithm that returns a tree. Unless
    there is some overwhelming objection, I think XML-ER should return the
    same tree. (To be honest I haven't checked what Anne's draft spec would
    make of this yet).
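
    For reading the parse tree above: the element name appears to be
    serialized with each character that is not legal in an XML name written
    as "U" plus six hexadecimal digits, so U00003C stands for "<". The Python
    sketch below reverses that under exactly this assumption; unescape_name is
    a hypothetical helper, and it would wrongly decode a genuine name that
    happens to contain "U" followed by six hex digits.

```python
import re


def unescape_name(name):
    """Turn each assumed 'U' + six-hex-digit escape back into its character."""
    return re.sub(
        r"U([0-9A-F]{6})",
        lambda match: chr(int(match.group(1), 16)),
        name,
    )


print(unescape_name("oneU00003CtwoU00003CthreeU00003C"))  # one<two<three<
```

    So the whole run "one<two<three<" became a single element name, and the
    "one" of the would-be end tag ended up as an attribute with an empty value.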
 
    David
 
 
 
 
 
 --
 Innovimax SARL
 Consulting, Training & XML Development
 9, impasse des Orteaux
 75020 Paris
 Tel : +33 9 52 475787
 Fax : +33 1 4356 1746
 http://www.innovimax.fr
 RCS Paris 488.018.631
 SARL au capital de 10.000 €





 

Received on Tuesday, 28 February 2012 18:06:34 UTC