- From: Innovimax W3C <innovimax+w3c@gmail.com>
- Date: Thu, 19 Nov 2009 17:10:21 +0100
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: John Cowan <cowan@ccil.org>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, Liam Quin <liam@w3.org>, public-html@w3.org, public-xml-core-wg@w3.org
On Thu, Nov 19, 2009 at 1:47 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Nov 18, 2009, at 23:55, John Cowan wrote:
>
>> This turns out not to be the case: the algorithm doesn't come close to
>> XML 1.0 conformance. For example, it accepts
>>
>> <root less"<">
>> </root>
>>
>> without reporting a parse error, but this is not well-formed XML because
>> it violates a well-formedness constraint. In order to be an XML parser,
>> it has to accept what an XML parser accepts, reject what an XML parser
>> MUST reject, and report what an XML parser MUST report.
>
> Previously, XML advocates have been trying to explain away the Draconianness by saying that the *Application* is free to perform additional processing with the rest of the document text after the XML Processor has reported a fatal error to the Application (or that additional processing is OK if the input isn't claimed to have been XML).[1,2,3]
>
> Consider an XML5 Parser that's an amalgamation of an XML 1.0 Processor and an Application as follows:
> 1) The XML 1.0 Processor part of the XML5 Parser parses until the first fatal error and reports it to the Application part of the XML5 Parser.
> 2) The Application part of the XML5 Parser intercepts the fatal error reported by the XML 1.0 Processor and doesn't echo it anywhere further.
> 3) The Application part of the XML5 Parser obtains the remainder of the unparsed byte stream from the XML 1.0 Processor.
> 4) The Application part of the XML5 Parser obtains the internal buffers and variables of the XML 1.0 Processor.
> 5) Having initialized its own state based on the data obtained, the Application part of the XML5 Parser parses the rest of the stream roughly as outlined by Anne.
>
> Now, let's optimize away the boundaries within the XML5 box that aren't black-box-testably distinguishable from the outside. The result: an XML5 parser that reports no errors, that parses any byte stream to completion and that black-box-testably contains a conforming XML 1.0 Processor and a pre-canned part of the Application.
>
> I believe this construction completely subverts the intent of the XML 1.0 spec and the vote that the group that defined XML took[4].
>
> Now, I'd like to ask everyone who has argued the position that the Application may continue processing the stream after the XML 1.0 Processor has signaled a fatal error:
> * Do you believe the above construction black-box-testably constitutes an XML 1.0 Processor and (a part of) an Application? (If not, why not?)
> * Do you believe the construction subverts the intent of the XML 1.0 spec? (If not, why not?)

It does subvert it, because a plain XML 1.0 processor will not be able to stay interoperable with the XML 5.0 (I-can-read-everything) box. The good point of having fatal errors is that there is no "MORE COMPLIANT THAN" problem; using an XML 1.0 processor while obviously telling the user something else will create exactly that distortion.

The second obvious side effect is that once more and more applications pretend to be what they are not (in this case, an XML 1.0 Processor), users will understand less and less what is really compliant and what is not. Why not introduce, then, an automatic spell checker in order to eradicate typos? Why not try to give meaning to every octet stream (by content-sniffing everything)?
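A minimal sketch of the construction quoted above, assuming Python's xml.parsers.expat as the conforming XML 1.0 Processor part and John Cowan's non-well-formed example as input; the recovery step at the end is only a placeholder, not Anne's actual XML5 algorithm:

    import xml.parsers.expat

    DOC = b'<root less"<">\n</root>\n'   # John Cowan's non-well-formed example

    # XML 1.0 Processor part: a conforming parser that stops at the first
    # fatal error and reports where it gave up.
    processor = xml.parsers.expat.ParserCreate()
    try:
        processor.Parse(DOC, True)
        print("well-formed")
    except xml.parsers.expat.ExpatError as err:
        # Application part: intercept the fatal error instead of echoing it,
        # then take over the remainder of the byte stream.
        stop = processor.ErrorByteIndex           # offset where parsing stopped
        print("fatal error intercepted:", err)
        remainder = DOC[stop:]
        # ... an XML5-style recovery pass over `remainder` would go here;
        # this placeholder just keeps consuming the bytes silently.
        _ = remainder.decode("utf-8", errors="replace")

Seen as a black box, that loop never reports an error and consumes any input to completion, which is exactly the optimisation Henri describes.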
In a lot of projects I have worked on in the publishing world, the main problem is exactly that: over-engineering, and trying to make one step of the process smarter than the rest of the world (and trust me, they have decades of experience in this field). The result is always:
1) Fuzzing the responsibilities (in terms of people AND applications)
2) Generating more and more complicated "recovery mechanisms" (that nobody will succeed in teaching or explaining)
3) Fighting with the recovery mechanisms, because people end up relying on them in an even trickier way than anyone ever thought of (do you really think you will make only one version of your XML 5.0 stuff?)
4) Doing the same thing one layer up

In this situation, 4) will lead the HTML WG to revamp HTTP, TCP, IP and probably the electrical layer at some point. The web is what it is BECAUSE each layer does its job AND NOTHING MORE (sometimes even a bit less).

And why not take the opposite way:
a) Make things simpler and interoperable
b) Since it is simpler, you can teach it
c) When something is not compliant, flag it (we have dozens of services like that now that help fight phishing, and Firefox has a report button); just find a way for that information to flow back to the source.

Again, if all this is because of RSS, then the answer is easy: make a special RSS parser. From the ground up, RSS was never constructed as an XML dialect. That is why it is so hard.

Mohamed

> [1] http://lists.w3.org/Archives/Public/public-html/2008Dec/0250.html
> [2] http://lists.w3.org/Archives/Public/www-tag/2008Dec/0048.html
> [3] http://www.balisage.net/Proceedings/vol3/html/Quin01/BalisageVol3-Quin01.html
> [4] http://lists.w3.org/Archives/Public/w3c-sgml-wg/1997May/0079.html
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
>

--
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 €
Received on Thursday, 19 November 2009 16:11:03 UTC