Re: [SVGMobile12] A.7.19 parseXML from Robin Berjon on 2006-05-09 (www-svg@w3.org from May 2006)

From: Robin Berjon <robin.berjon@expway.fr>
Date: Tue, 9 May 2006 17:18:51 +0200
To: Jonathan Watt <jwatt@jwatt.org>
Cc: www-svg@w3.org
Message-Id: <66817848-96D4-444C-8418-9F5BBEE206AD@expway.fr>

Hi Jonathan,

On Jan 27, 2006, at 18:23, Jonathan Watt wrote:
> 1) There is no means to specify the MIME type the string should be  
> parsed as.
>
> Mozilla decides on the type of document object to create depending  
> on the MIME type served with markup. (I understand the same has to  
> be true for Safari if the content has no DOCTYPE declaration, which  
> will be the case for SVG 1.2.) Although I don't know this part of  
> Mozilla/Safari's code, I'm told that rearchitecting would be a  
> *huge* and messy task and the owners of this code aren't even sure  
> if the end result is desirable. If waiting until you've parsed the  
> root element is too late to create it's ownerDocument, I think the  
> burden would be better placed on content authors to change parseXML  
> in their code to a better spec'ed parsing function. That is if they  
> want a built-in parseXML any time in the next three years or more.
>
> I assume the SVG implementations that established the parseXML  
> extension were able to bypass this problem because they only dealt  
> with SVG and therefore assumed image/svg+xml.

Given that parseXML can only parse a string that is passed to it  
directly, as opposed to getting a representation off the network as  
XMLHttpRequest does, there seems to be no value in passing a media  
type to it since it can only parse XML. What would passing
"text/plain" do? In XMLHttpRequest this could have the effect of setting
responseXML to null and responseText to the value, but since parseXML  
only does XML, I'm unsure what this would achieve.

It is not expected that implementations would only deal with SVG; it
is expected, however, that implementations (of parseXML) would only
deal with XML. If the input is not XML, the method returns null. We
have therefore not changed the specification to add a media type.

> 2) By default, this function assumes you want to create a new  
> document.
>
> However SVG Tiny 1.2 doesn't support having more than one document  
> as far as I can tell. Certainly createDocument is not part of SVG  
> 1.2 Tiny's cut down DOMImplementation interface, and there may be  
> issues that allowing document objects to be created via this route  
> could raise.
>
> I also think the assumption that you want to create a new document  
> is false - most people will want to parse markup and insert it into  
> the *current* document, not have to specify the current document  
> explicitly or importNode() where it's available. Either that, or  
> they don't care about the ownerDocument of the new nodes so it  
> would be better to make it the current document and save the  
> overhead of creating a new document.

There is no restriction in SVG Tiny 1.2 on having multiple documents.
You could well use parseXML to parse XML into as many Documents as
your heart desires, until you run out of memory.

There is no assumption that you want to create a new Document: if you  
want a new Document, from which you indeed won't be able to copy  
nodes (but you can still do a bunch of useful stuff with it, e.g.  
parse some Atom off the network and display something based on it),  
you don't pass the second argument; if you want an Element that  
belongs to another Document (typically the SVG), then you pass that  
argument. You would do that for nodes that you intend to insert into  
that tree.
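
As a minimal sketch of the two call patterns described above: the
helper names and parameter names below are illustrative assumptions,
not taken from the draft, and parseXML is passed in as an argument
because it only exists inside an SVG Tiny 1.2 user agent.

```javascript
// Sketch only. `parseXML` is supplied by the caller; its
// (data, contextDoc) shape follows the description in this message.

// Case 1: no second argument, so the result is a new, standalone
// Document (e.g. an Atom feed you only want to read from).
function parseStandalone(parseXML, xml) {
  return parseXML(xml);
}

// Case 2: pass the Document that should own the result (typically the
// SVG document), then insert the returned node into that tree.
function parseAndInsert(parseXML, xml, targetDoc, parent) {
  const node = parseXML(xml, targetDoc);
  if (node === null) {
    return null; // input was not well-formed: nothing to insert
  }
  parent.appendChild(node);
  return node;
}
```

In a real user agent, the first argument would be the parseXML method
of the global object, and targetDoc would usually be the current
document.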

We find that both use cases are equally important, and both are  
addressed by the draft.

> 3) The function is incompatible with existing implementations anyway.
>
> When a document is passed in, current implementations return a  
> DocumentFragment. This allows the markup string to have multiple  
> sibling tags at it's root, whereas the version here has a different  
> return type that would disallow this.

The ability to parse well-balanced but not well-formed XML is an
issue for implementations that have to rely on an external XML parser
that oftentimes will not expose that functionality (even though it
may support it if it supports fetching external entities, which
mobile XML parsers often don't). Given that, returning a Node that
can be treated as/cast to an Element was the lesser
incompatibility, especially given that DocumentFragment is not in the
uDOM.
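
For content that relied on multiple sibling elements at the root, one
possible workaround (an assumption on my part, not something the
draft prescribes) is to wrap the fragment in a single container
element before parsing, so that the string is well-formed. The helper
name wrapInGroup is hypothetical:

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Wrap a well-balanced fragment (several sibling elements) in a single
// <g> so that the resulting string has exactly one root element.
// Plain string concatenation: the fragment must not contain an XML
// declaration, and any prefixes it uses must be declared inside it.
function wrapInGroup(fragment) {
  return '<g xmlns="' + SVG_NS + '">' + fragment + "</g>";
}
```

The wrapped string can then be handed to parseXML, at the cost of one
extra g element in the resulting tree.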

> 4) Behavior for invalid markup is unspecified.
>
> When you pass invalid markup to DOMParser in Mozilla it will create  
> <parsererror> elements in place of the invalid markup. I don't know  
> if Opera and Safari do the same with their implementations of  
> DOMParser, but if they do I think it would be better to spec  
> DOMParser rather than this broken function parseXML. At least  
> specify that it throws or something.

I assume that when you say invalid you mean non-well-formed (sorry to
be pedantic, but those terms are loaded in XML). The draft currently
specifies that if the XML is not well-formed (WF) or not
namespace-well-formed (NSWF), parseXML returns null, which clearly
indicates that there was an error. We don't believe that this is
incorrect or "broken" behaviour, and it seems to us to be better than
returning a Document describing the error, since such a Document is
harder to distinguish from a perfectly correct one. We therefore have
not changed the specification in this regard.
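
Authors who prefer an exception can layer one on top of the null
return. A minimal sketch, assuming the null-on-error behaviour
described above; parseOrThrow is a hypothetical helper, and parseXML
is again passed in rather than assumed to be a global:

```javascript
// Turn the null error signal into an exception.
// `contextDoc` is optional; when it is omitted, parseXML is called
// with a single argument so that a new Document is requested.
function parseOrThrow(parseXML, xml, contextDoc) {
  const result =
    contextDoc === undefined ? parseXML(xml) : parseXML(xml, contextDoc);
  if (result === null) {
    throw new Error("parseXML: input is not well-formed XML");
  }
  return result;
}
```

A plain null check at the call site works just as well; the point is
only that null is unambiguous, whereas an error Document would have
to be inspected to be told apart from a successful parse.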

Thank you kindly for your comments; please let us know shortly if
this does not satisfy your concerns.

-- 
Robin Berjon
    Senior Research Scientist
    Expway, http://expway.com/
Received on Tuesday, 9 May 2006 15:18:51 UTC