- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Wed, 06 Sep 2006 01:04:30 +1000
- To: Dave Raggett <dsr@w3.org>
- CC: www-forms@w3.org
Dave Raggett wrote:
> For text/html, IE applies special parsing rules for HTML elements
> because unfortunately there are many websites with malformed markup. For
> non HTML elements it honors the /> syntax, unlike Firefox and
> Opera. I don't have a Mac and hence wasn't able to test Safari.

I had thought this behaviour, which only applies to elements within an <xml> element or those with namespace prefixes, was all called XML Data Islands, and I had been using that term to refer to both. It seems I was wrong about the terminology. XML Data Islands [1] are the markup that appears within an <xml> element, whereas elements with namespace prefixes and xmlns attributes are called Custom Tags [2]. However, that's just terminology, and my statements against them still apply. From now on, I'll be referring to both collectively as IE's pseudo-XML (or equivalent).

> <?xml version="1.0" encoding="utf-8"?>

That triggers quirks mode in IE.

> <html xmlns="http://www.w3.org/1999/xhtml"
> xmlns:f="http://example.com/eforms" xml:lang="en" lang="en">

The way IE handles the xmlns:f attribute is insane. It actually removes the xmlns:f attribute from the DOM, generates an SGML PI (as opposed to an XML PI) that looks like the following, and inserts it immediately before the first usage of the prefix.

<?xml:namespace prefix = f ns = "http://example.com/eforms" />

This was taken from the .innerHTML representation of your sample document [3]. Strangely, however, the PI is not visible in the DOM. You get exactly the same result if you insert that PI into the markup yourself, immediately before the first usage of the prefix, instead of using an xmlns:f attribute. In other words, if you were to copy the innerHTML representation into a file, the result would be identical.

> <bind id="b1" name="fred"/>

IE treats any unknown element without a namespace prefix as an empty element. In this case, it makes no difference whether you include the slash or not; it simply gets ignored.
For unknown elements that aren't pseudo-XML, like this:

<foo>content</foo>

"FOO" and "/FOO" are both treated as distinct empty elements, rather than as the start and end tags of the same element. The DOM looks like this:

FOO
#text: content
/FOO

> <f:model id="form">hello</f:model>

That's a "Custom Tag".

> <f:field ref="form/name1">Given Name</f:field>
> <f:field ref="form/name2">Family Name</f:field>
>
> <f:submit id="submit" ref="form"/>
>
> This form is written in custom XML.

No, it's not XML when handled by IE as text/html. XML in HTML is *undefined*, and this is nothing more than a useless proprietary extension that happens to share some syntactic similarities with XML. Do not make the mistake of thinking that it is XML; there are many significant differences, particularly in relation to well-formedness (see below) and the handling of namespaces (see above).

> IE6 gives the following DOM for the body element:
>
> [snip - sample DOM]

IE's DOM is often significantly broken. Well-formedness errors are not fatal for pseudo-XML; they're treated much as such errors are treated in HTML. With badly nested elements, it produces a DOM where a node doesn't even appear in its parent's childNodes list. It's rather confusing; Hixie explains it in more detail [4].

> The original case is preserved if the element has a namespace, e.g.
>
> <h:bind id="b1" name="fred"/>
>
> where h has to have been bound to a namespace URI.

It doesn't have to have been bound to a namespace for IE to treat it as pseudo-XML; it just has to have a prefix. Where the prefix is not defined, IE just generates a PI like this:

<?xml:namespace prefix = h />

> Firefox and Opera treat "/>" as if it was ">". Firefox also forces all
> elements to uppercase. Opera doesn't. It seems that browser developers
> aren't particularly thorough in reverse engineering IE's behavior in
> parsing well formed markup.

Why should they copy IE in this case?
IE's pseudo-XML nonsense is not widely used or *depended upon* for anything in the real world (even though it sneaks into the garbage generated by MS Office), and it is not defined anywhere.

> My point is that if all browsers honored the /> syntax for non HTML
> elements delivered as text/html and preserved the case, it would make it
> that much easier to deploy mixed markup documents.

My point is that the whole idea of embedding XML in HTML is nonsense and should have no part in any transition from HTML to XML. I'll explain this last point further in a future post.

[1] http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/xml.asp
[2] http://msdn.microsoft.com/workshop/author/dhtml/overview/customtags.asp
[3] http://software.hixie.ch/utilities/js/live-dom-viewer/
[4] http://ln.hixie.ch/?start=1037910467&count=1

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Tuesday, 5 September 2006 15:05:15 UTC