Re: Comments on HTML WG face to face meetings in France Oct 08 from noah_mendelsohn@us.ibm.com on 2008-11-15 (www-tag@w3.org from November 2008)

From: <noah_mendelsohn@us.ibm.com>
Date: Sat, 15 Nov 2008 11:53:33 -0500
To: Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, public-html <public-html@w3.org>, www-tag@w3.org
Message-ID: <OFD75F4E6F.FDE5B8D4-ON85257501.007D506F-85257502.005CCB69@lotus.com>

Boris Zbarsky writes:

> Try loading that in your favorite browsers and seeing what happens. 
> Note that some of them display some bold text, while others do not. 
> This is because the XML specification _does_ say that this document is 
> invalid (that is not XML), but _doesn't_ say that this means you can't 
> process it and _doesn't_ specify the error handling other than saying 
> that processing of things after the error needs to be aborted.

Yes.  The XML Recommendation says what is and what isn't XML;  what a 
given piece of software, or some class of software should do when 
confronted with a document that is not XML is, for the most part, not the 
province of the XML Recommendation.  So, we can write specifications for, 
e.g. a databinding tool that accepts documents purported to be XML, and in 
the specification for such binding tools we should indicate whether they 
SHOULD/MAY/MUST/MUST NOT extract data from a document that is not XML 
after all.  We might specify different rules for some other class of 
document processing software.  Of course, insofar as off-the-shelf XML 
parsers are designed to accept only well-formed XML, such parsers are 
unlikely to be usable in software that wants to accept other input.  As 
you point out, there are some browsers that work in such a flexible mode. 
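
A minimal sketch of that distinction, assuming Python's standard-library 
parsers and a hypothetical malformed document (the names BROKEN, 
is_well_formed and text_before_error are illustrative only).  The first 
function plays the role of the XML Recommendation's question, "is this 
XML?"; the second stands in for a separately specified class of software 
that chooses to keep whatever data it saw before the error:

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Hypothetical input: <p> is never closed, so this is not well-formed XML.
BROKEN = "<doc><b>bold</b><p>unclosed</doc>"


def is_well_formed(text):
    """Strict view: the document either is XML or it is not."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def text_before_error(text):
    """Lenient view: keep the character data reported before parsing stops.

    Whether a tool SHOULD/MAY/MUST NOT do this is a policy for that tool's
    own specification; it is only an assumption made for this sketch.
    """
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        pass  # stop at the first well-formedness error, keep earlier data
    return "".join(chunks)
```

Here is_well_formed(BROKEN) is False, yet text_before_error(BROKEN) still 
yields text beginning with "bold", which is roughly the difference 
between browsers that show the bold text and those that do not.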

To be clear, a lot of what I've argued for is a matter of taste.  I 
prefer the layering of the XML stack, in which one document sets out what 
the correct (well-formed, in the case of XML) language is, and other 
documents describe the construction of certain classes of software that 
consume XML, and sometimes also extract useful data from documents that 
are asserted to be XML, but in fact are not.  Indeed, the XML 
Recommendation probably says a bit more about processors than I would 
prefer.  Anyway, to reiterate, this is somewhat a matter of taste. Several 
people who are very knowledgeable have claimed that the HTML 5 drafts as 
written do answer the question:  what is a legal HTML 5 document and what 
is its interpretation.  I certainly believe them.  I, as a new reader, find 
it much harder to identify that important information in the HTML 5 drafts 
than I find it to be when reading, say, the XML Recommendation or the C++ 
Annotated Reference Manual, or the Java Language Specification, to pick 
some examples.  That's why I'm very glad to see that Michael Smith is 
experimenting with writing a document that would be focussed specifically 
on conveying that information.

For what it's worth, if I were writing the HTML 5 drafts from scratch, and 
having to satisfy only my own tastes, I would probably have tried writing 
Michael's document first, and where possible referring to it from the 
larger specification (i.e., the one that describes error handling).  It 
could be that if I were ever to try I would find that to be impractical, 
and in any case I accept that it's most likely not practical to attempt 
such a radical refactoring at this point, even if it would have had some 
advantages earlier.  Again, I very much appreciate the attention everyone 
has given to my concerns.  And again, I'm quite satisfied and willing to 
let this discussion drop if everyone would like to get on to other things. 
I suggest that we see how Michael does with his draft, and whether it 
turns out to be a good thing, as I suspect it might.  Thank you.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Saturday, 15 November 2008 16:54:37 UTC