- From: Robert Burns <rob@robburns.com>
- Date: Mon, 9 Jul 2007 11:50:45 -0500
- To: Andrew Sidwell <takkaria@gmail.com>
- Cc: James Graham <jg307@cam.ac.uk>, public-html@w3.org
On Jul 9, 2007, at 11:12 AM, Andrew Sidwell wrote:

> Robert Burns wrote:
>> On Jul 9, 2007, at 9:34 AM, James Graham wrote:
>>> Robert Burns wrote:
>>>> Despite some confusion on these issues, there isn't a single
>>>> right way to do these things and the sooner we can acknowledge
>>>> that the easier our task will be.
>>>
>>> If you're talking about XML parsing there really is only one way
>>> to do it; the DOM you get is determined by the XML spec. Any
>>> browser that does something different has a bug.
>>
>> I've been working with primarily XML for nearly a year now (CSS
>> and DOM and translation). And I can tell you it's not as
>> unambiguous as you might think. There's definitely ambiguity and
>> there's room to clear up ambiguity. The XML spec is most clear on
>> well-formedness. After that, there's wiggle room.
>
> Instead of just stating "there's wiggle room", please could you give
> examples of where such room exists? It's very hard to understand
> any of the issues involved based on such vague statements.

Sure, sorry for the ambiguity. I've often written at great length on a topic only to have my words dismissed with a turn of phrase. I'll try to provide a couple of examples, off the top of my head, of things that have been changing and continue to change in XML parsing.

First is the treatment of named character references (or character entity references in SGML nomenclature). Early XHTML UAs would throw fatal errors when encountering these, just as they throw fatal errors for ill-formed elements. I imagine this has been a significant frustration for authors trying to move seemingly well-formed code over to XML processing. Over time, Mozilla (and I think WebKit is moving in this direction too) has added support for them: basically hard-wiring its knowledge of HTML. XML makes a distinction between DTD-retrieving UAs and non-DTD-retrieving UAs.
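
To make the named character reference case concrete, here is a minimal sketch in Python, assuming the standard library's non-DTD-retrieving parser (`xml.etree.ElementTree`) as a stand-in for a UA. The helper `hardwire_entities` is a hypothetical name, and rewriting references before parsing is only one way to hard-wire HTML knowledge; it is not a description of how Gecko or WebKit do it.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself predefines; these never need a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def hardwire_entities(markup: str) -> str:
    """Rewrite known HTML named references as numeric character references.

    Hypothetical helper: a plain regex pass like this also touches CDATA
    sections and comments, which a real UA would have to leave alone.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown names are left to fail in the parser
        return "&#%d;" % name2codepoint[name]

    return _NAMED_REF.sub(replace, markup)


source = "<p>caf&eacute; &amp; bar</p>"

# Without the DTD, the parser has no definition for &eacute; and must stop.
try:
    ET.fromstring(source)
    strict_result = "parsed"
except ET.ParseError:
    strict_result = "fatal error"

# With the HTML entity table hard-wired in, the same text parses.
repaired = ET.fromstring(hardwire_entities(source))
```

The point of the sketch is that the same document is either a fatal error or perfectly usable depending only on how much HTML knowledge the UA carries with it.
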
Most UAs do not retrieve a DTD; however, that hasn't stopped them from adding knowledge from those DTDs to the processing of XHTML.

The same situation arises with WebKit's treatment of XHTML and the inferred tbody element. At some point the WebKit team decided to infer an actual tbody element and insert it into the DOM based on its knowledge of the HTML namespace (separate from XML requirements). These are decisions UA developers have to make all the time. Sometimes it breaks interoperability. Sometimes it actually fixes interoperability. However, from our point of view, we should be willing to consider such measures and not simply dismiss them out of hand, because we're in a unique position to promote such measures to improve interoperability and help users, authors and UA developers alike.

Do named character references belong in XHTML (i.e., are they even in the DTD)? I don't even recall off the top of my head. However, I'm still running into tools that obliterate my Unicode characters, and so maybe it's too soon to drop named character references from the HTML namespace (I know Dan reminded me they are not technically part of the HTML namespace, but that's how we tend to think of it). Should WebKit be inserting an inferred tbody element into the DOM? Not per the current spec, but since we're developing the next spec, it's a possibility we shouldn't dismiss just because it's not what XHTML1 did.

XML requires fatal errors on ill-formedness errors. It does not require failures on invalidity errors. Perhaps someone will cite a passage to prove me wrong, but I don't recall reading anything in XML that would prohibit a UA with hard-wired knowledge from repairing invalid text by, for example, adding in a missing tbody element (presuming that conformance required it).

I'm sure if I did a little research I could come up with some other examples. The important thing to keep in mind is that XML separates validity from well-formedness.
It requires fatal errors on ill-formedness and not on invalidity. Certainly any DTD that includes named character references would potentially lead to ill-formedness errors for non-DTD-retrieving UAs. But there's no reason that even those UAs can't implement those named character entities by hard-wiring them (like Gecko).

From what I've witnessed over the last year, the XML UAs are still figuring out what XML and XHTML conformance is. We could certainly weigh in on that, particularly regarding HTML5/XML.

Take care,
Rob
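
As an illustration of the well-formedness versus validity split discussed in this message, here is a minimal sketch, again assuming Python's `xml.etree.ElementTree` as a stand-in for a UA. The function `infer_tbody` is a hypothetical name, and the rule it applies (every `tr` must sit inside a `tbody`) is assumed only for the sake of the example, just as the message presumes "that conformance required it". It is not WebKit's actual algorithm.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def infer_tbody(table):
    """Wrap each run of direct tr children of `table` in a new tbody."""
    tr_tag = "{%s}tr" % XHTML_NS
    tbody_tag = "{%s}tbody" % XHTML_NS
    current = None
    for child in list(table):  # iterate over a copy while mutating
        if child.tag == tr_tag:
            if current is None:
                current = ET.Element(tbody_tag)
                table.insert(list(table).index(child), current)
            table.remove(child)
            current.append(child)
        else:
            current = None  # a non-tr child ends the current run
    return table


# Well-formed input: the parser succeeds, and the UA is then free to
# apply its hard-wired knowledge of the HTML vocabulary to the tree.
well_formed = (
    '<table xmlns="http://www.w3.org/1999/xhtml">'
    "<tr><td>a</td></tr><tr><td>b</td></tr></table>"
)
table = infer_tbody(ET.fromstring(well_formed))
child_tags = [child.tag.split("}")[1] for child in table]

# Ill-formed input (unclosed td): the parser must stop, so there is
# no tree to repair in the first place.
try:
    ET.fromstring("<table><tr><td>a</tr></table>")
    ill_formed_result = "parsed"
except ET.ParseError:
    ill_formed_result = "fatal error"
```

The repair happens strictly after a successful parse, which is why it does not conflict with XML's fatal-error rule: that rule only governs the ill-formed case.
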
Received on Monday, 9 July 2007 16:50:59 UTC