- From: <noah_mendelsohn@us.ibm.com>
- Date: Tue, 20 Jan 2009 15:17:12 -0500
- To: elharo@metalab.unc.edu
- Cc: www-tag <www-tag@w3.org>
Elliotte Harold wrote:
> Henri Sivonen wrote:
>
> > If you consider black box-distinguishable conformance, what's the
> > difference between the XML parser signaling an error and handing the
> > rest of the stream to the application which hands it to
> another non-XML
> > parser to continue and a parser signaling the first WF error and
> > continuing with the rest of the stream itself?
>
> The application knows more about what it's finding than the XML parser
> does. The XML parser only knows XML. If you hand it something that isn't
> XML, it knows not what to do with it.
I think it's valuable to consider some use cases that go beyond "user
browses to HTML page". While there are many pros and cons to trying to
make an XHTML, i.e. a variant of HTML that's conforming XML, one of the
advantages is the possibility of using general-purpose XML tooling on the
same documents. So, Henri is right, I think, that if your black box is the
simple use case of browsing a page, there's little if any difference
for users in how the parsing and error recovery is layered inside the
browser.
Consider, though, a different use case, in which some of the same XHTML
documents are to be stored in an XML database and their attributes and
other data used as the subjects of queries. Now you have an interesting
tension. The database will presumably deal only with well-formed XML
documents, which means that the messier content that browsers deal with
won't work in the database, at least not in the obvious way. On the other
hand, the positive value of the layering becomes a bit clearer. The XML
specification describes the subset of the documents that will work in
tools like the XML database. Conforming XML parsers will accept those
documents and reject others (though, as Elliotte points out, nothing
prevents those parsers from handing the input text up to a browser, which
may still decide to render it).
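[Editor's note: the tension above can be seen concretely. This is a minimal sketch, not from the original thread, using Python's standard-library parsers: a conforming XML parser signals a well-formedness error on mis-nested markup, while an HTML parser recovers and keeps going.]

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Mis-nested tags: typical "messy" browser content, not well-formed XML.
markup = "<p>unclosed <b>bold</p>"

# A conforming XML parser must signal the well-formedness error,
# so this document would be rejected by XML tooling such as a database.
try:
    ET.fromstring(markup)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

# An HTML parser, by contrast, recovers and continues with the stream.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

collector = TextCollector()
collector.feed(markup)

print(xml_accepted)             # the XML layer rejects the document
print("".join(collector.text))  # the HTML layer still recovers the text
```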
So, the value of the layering is not primarily for the browsing-only
scenario. It's to give you the opportunity of using HTML documents with a
lot of additional tools and in additional scenarios. Now, whether that's
worth designing for is a good debate, and I won't be surprised if Henri
takes the position: no, I'd rather do without that capability and focus
mainly on making HTML work for browsing. I do think this is the right way
to ask the question though. XML may add some sorts of value to HTML in
small ways when all you're doing is browsing, but I don't think that's the
right "black box" to consider. The question is: how much trouble is it
worth to design a language that works with a wide range of existing XML
tools?
I'm not taking a strong position on what the answer should be, as I can
really see both sides: I do think it's probably the right question. (And
yes, one can also debate whether the world would have been a better place
if HTML error handling had been stricter from the start, and less junky
HTML were out there, but that train has mostly left the station, I think.)
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Elliotte Harold <elharo@metalab.unc.edu>
Sent by: www-tag-request@w3.org
01/11/2009 06:08 PM
Please respond to elharo
To: www-tag <www-tag@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: Re: Comments on HTML WG face to face meetings in France Oct 08
Received on Tuesday, 20 January 2009 20:17:57 UTC