Re: Comments on HTML WG face to face meetings in France Oct 08 from noah_mendelsohn@us.ibm.com on 2009-01-20 (www-tag@w3.org from January 2009)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 20 Jan 2009 15:17:12 -0500
To: elharo@metalab.unc.edu
Cc: www-tag <www-tag@w3.org>
Message-ID: <OF000CDDDB.CFD62E03-ON85257544.006E8AD3-85257544.006F706D@lotus.com>
Elliotte Harold wrote:

> Henri Sivonen wrote:
> 
> > If you consider black box-distinguishable conformance, what's the 
> > difference between the XML parser signaling an error and handing the 
> > rest of the stream to the application which hands it to another non-XML 
> > parser to continue and a parser signaling the first WF error and 
> > continuing with the rest of the stream itself?
> 
> The application knows more about what it's finding than the XML parser 
> does. The XML parser only knows XML. If you hand it something that isn't 
> XML, it knows not what to do with it.

I think it's valuable to consider some use cases that go beyond: "user 
browses to HTML page".  While there are many pros and cons to trying to 
make an XHTML, i.e. a variant of HTML that's conforming XML, one of the 
advantages is the possibility of using general purpose XML tooling on the 
same documents.  So, Henri is right, I think, that if your black box is for 
the simple use case of browsing a page, there's little if any difference 
for users in how the parsing and error recovery are layered inside the 
browser. 

Consider, though, a different use case, in which some of the same XHTML 
documents are to be stored in an XML database and their attributes and 
other data used as the subjects of queries.  Now you have an interesting 
tension.  The database will presumably deal only with well formed XML 
documents, which means that the messier content that browsers deal with 
won't work in the database, at least not in the obvious way.  On the other 
hand, the positive value of the layering becomes a bit clearer.  The XML 
specification describes the subset of the documents that will work in 
tools like the XML database.  Conforming XML parsers will accept those 
documents and reject others (though, as Elliotte points out, nothing 
prevents those parsers from handing the input text up to a browser that 
may still decide to render it).
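
To make the layering concrete, here is a minimal sketch in Python, 
using only the standard library.  It is an illustration under stated 
assumptions, not a description of how any browser or XML database is 
actually built: the names load_document and TagCollector are 
hypothetical, and the lenient fallback merely lists start tags rather 
than implementing real HTML error recovery.  The strict XML parser 
either accepts the text or reports a well-formedness error, and the 
application, not the parser, decides what to do next.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical lenient fallback: records start tags, never rejects input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def load_document(text):
    """Hypothetical application layer sitting above a strict XML parser.

    Returns ("xml", tags) when the text is well-formed XML, which is the
    subset an XML database or other generic XML tool could also consume.
    Otherwise returns ("tag-soup", tags) from the lenient fallback.
    """
    try:
        root = ET.fromstring(text)  # strict: raises on the first WF error
    except ET.ParseError:
        # The XML parser has done its job by rejecting the input; the
        # application chooses to hand the same text to a non-XML parser.
        fallback = TagCollector()
        fallback.feed(text)
        fallback.close()
        return "tag-soup", fallback.tags
    return "xml", [element.tag for element in root.iter()]
```

In this sketch only documents that come back labelled "xml" would be 
candidates for the XML database; the others can still be rendered, but 
only by a consumer that opts in to the lenient path.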

So, the value of the layering is not primarily for the browsing-only 
scenario.  It's to give you the opportunity of using HTML documents with a 
lot of additional tools and in additional scenarios.  Now, whether that's 
worth designing for is a good debate, and I won't be surprised if Henri 
takes the position: no, I'd rather do without that capability and focus 
mainly on making HTML work for browsing.  I do think this is the right way 
to ask the question though.  XML may add some sorts of value to HTML in 
small ways when all you're doing is browsing, but I don't think that's the 
right "black box" to consider.  The question is: how much trouble is it 
worth to design a language that works with a wide range of existing XML 
tools?

I'm not taking a strong position on what the answer should be, as I can 
really see both sides.  I do think it's probably the right question.  (And 
yes, one can also debate whether the world would have been a better place 
if HTML error handling had been stricter from the start, and less junky 
HTML were out there, but that train has mostly left the station, I think.)

Noah


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








Elliotte Harold <elharo@metalab.unc.edu>
Sent by: www-tag-request@w3.org
01/11/2009 06:08 PM
Please respond to elharo
 
        To:     www-tag <www-tag@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Re: Comments on HTML WG face to face meetings in France Oct 08



Henri Sivonen wrote:

> If you consider black box-distinguishable conformance, what's the 
> difference between the XML parser signaling an error and handing the 
> rest of the stream to the application which hands it to another non-XML 
> parser to continue and a parser signaling the first WF error and 
> continuing with the rest of the stream itself?

The application knows more about what it's finding than the XML parser 
does. The XML parser only knows XML. If you hand it something that isn't 
XML, it knows not what to do with it.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
Received on Tuesday, 20 January 2009 20:17:57 UTC