- From: <noah_mendelsohn@us.ibm.com>
- Date: Sat, 15 Nov 2008 11:53:33 -0500
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, public-html <public-html@w3.org>, www-tag@w3.org
Boris Zbarsky writes:

> Try loading that in your favorite browsers and seeing what happens.
> Note that some of them display some bold text, while others do not.
> This is because the XML specification _does_ say that this document is
> invalid (that is not XML), but _doesn't_ say that this means you can't
> process it and _doesn't_ specify the error handling other than saying
> that processing of things after the error needs to be aborted.

Yes. The XML Recommendation says what is and what isn't XML; what a given piece of software, or some class of software should do when confronted with a document that is not XML is, for the most part, not the province of the XML Recommendation. So, we can write specifications for, e.g. a databinding tool that accepts documents purported to be XML, and in the specification for such binding tools we should indicate whether they SHOULD/MAY/MUST/MUST NOT extract data from a document that is not XML after all. We might specify different rules for some other class of document processing software.

Of course, insofar as off the shelf XML parsers tend to have as their purpose to accept only well formed XML, such parsers won't likely be usable in software that wants to accept other input. As you point out, there are some browsers that work in such a flexible mode.

To be clear, a lot of what I've argued for is a matter of taste. I prefer the layering of the XML stack, in which one document sets out what the correct (well formed, in the case of XML) language is, and other documents describe the construction of certain classes of software that consume XML, and sometimes also extract useful data from documents that are asserted to be XML, but in fact are not. Indeed, the XML Recommendation probably says a bit more about processors than I would prefer. Anyway, to reiterate, this is somewhat a matter of taste.

Several people who are very knowledgeable have claimed that the HTML 5 drafts as written do answer the question: what is a legal HTML 5 document and what is its interpretation. I certainly believe them. I as a new reader find it much harder to identify that important information in the HTML 5 drafts than I find it to be when reading, say, the XML Recommendation or the C++ Annotated Reference Manual, or the Java Language Specification, to pick some examples. That's why I'm very glad to see that Michael Smith is experimenting with writing a document that would be focussed specifically on conveying that information.

For what it's worth, if I were writing the HTML 5 drafts from scratch, and having to satisfy only my own tastes, I would probably have tried writing Michael's document first, and where possible referring to it from the larger specification (I.e. the one that describes error handling). It could be that if I were ever to try I would find that to be impractical, and in any case I accept that it's most likely not practical to attempt such a radical refactoring at this point, if it would have had some advantages earlier.

Again, I very much appreciate the attention everyone has given to my concerns. And again, I'm quite satisfied and willing to let this discussion drop if everyone would like to get on to other things. I suggest that we see how Michael does with his draft, and whether it turns out to be a good thing, as I suspect it might.

Thank you.

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Saturday, 15 November 2008 16:54:35 UTC