Re: XML namespaces on the Web

From: Innovimax W3C <innovimax+w3c@gmail.com>
Date: Thu, 19 Nov 2009 17:10:21 +0100
Message-ID: <546c6c1c0911190810g70cd8516l1920208c24b80490@mail.gmail.com>
To: Henri Sivonen <hsivonen@iki.fi>
Cc: John Cowan <cowan@ccil.org>, Lachlan Hunt <lachlan.hunt@lachy.id.au>, Liam Quin <liam@w3.org>, public-html@w3.org, public-xml-core-wg@w3.org
On Thu, Nov 19, 2009 at 1:47 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Nov 18, 2009, at 23:55, John Cowan wrote:
>> This turns out not to be the case: the algorithm doesn't come close to
>> XML 1.0 conformance. For example, it accepts
>>    <root less="<">
>>    </root>
>> without reporting a parse error, but this is not well-formed XML because
>> it violates a well-formedness constraint. In order to be an XML parser,
>> it has to accept what an XML parser accepts, reject what an XML parser
>> MUST reject, and report what an XML parser MUST report.
> Previously, XML advocates have been trying to explain away the Draconianness by saying that the *Application* is free to perform additional processing with the rest of the document text after the XML Processor has reported a fatal error to the Application (or that additional processing is OK if the input isn't claimed to have been XML).[1,2,3]
> Consider an XML5 Parser that's an amalgamation of an XML 1.0 Processor and an Application as follows:
> 1) XML 1.0 Processor parser part of the XML5 Parser parses until the first fatal error and reports it to the application part of the XML5 Parser.
> 2) The Application part of the XML5 parser intercepts the fatal error reported by the XML 1.0 Processor and doesn't further echo it anywhere.
> 3) The Application part of the XML5 parser obtains the remainder of the unparsed byte stream from the XML 1.0 Processor.
> 4) The Application part of the XML5 parser obtains the internal buffers and variables of the XML 1.0 Processor.
> 5) Having initialized its own state based on the data obtained, the Application part of the XML5 parser parses the rest of the stream roughly as outlined by Anne.
> Now, let's optimize away the boundaries within the XML5 box that aren't black-box-testably distinguishable from the outside. The result: an XML5 parser that reports no errors, that parses any byte stream to completion and that black-box-testably contains a conforming XML 1.0 Processor and a pre-canned part of the Application.
> I believe this construction completely subverts the intent of the XML 1.0 spec and the vote that the group that defined XML took[4].
> Now, I'd like to ask from everyone who has argued the position that the Application may continue processing the stream after the XML 1.0 Processor has signaled a fatal error:
> * Do you believe the above construction black-box-testably constitutes an XML 1.0 Processor and (a part of) an Application? (If not, why not?)
> * Do you believe the construction subverts the intent of the XML 1.0 spec? (If not, why not?)
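
To make the construction in steps 1) to 5) above concrete, here is a
minimal sketch. It assumes Python's expat binding stands in for the
XML 1.0 Processor. The function name and the "recovered-text" event are
hypothetical placeholders: the tolerant parsing "roughly as outlined by
Anne" is not specified in this thread, so the sketch only hands over the
unparsed remainder instead of inventing recovery rules.

```python
import xml.parsers.expat


def xml5_like_parse(data: bytes):
    """Strict XML 1.0 parsing first, then silent continuation (steps 1-5)."""
    events = []  # stands in for the processor's "internal buffers and variables"
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, dict(attrs)))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))

    try:
        # Step 1: the XML 1.0 Processor parses until the first fatal error.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        # Step 2: the Application part intercepts the fatal error and
        # does not echo it anywhere.
        # Steps 3-4: it takes the unparsed remainder and the state so far.
        remainder = data[parser.ErrorByteIndex:]
        # Step 5: placeholder for the tolerant parse of the remainder
        # (hypothetical; the real recovery rules are not given here).
        events.append(
            ("recovered-text", remainder.decode("utf-8", errors="replace"))
        )
    return events
```

From the outside, the caller only ever sees a list of events and never
an error, which is exactly the black-box property being asked about.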

It does subvert the intent, because a plain XML 1.0 processor will not
be able to stay interoperable with the XML 5.0 (I-can-read-everything)
box. The good thing about fatal errors is that there is no "MORE
COMPLIANT THAN" problem. Calling something an XML 1.0 processor when it
obviously tells the user something else creates exactly that
distortion. The second obvious side effect is that as more and more
applications pretend to be what they are not (in this case, an XML 1.0
Processor), users will understand less and less what is really
compliant and what is not.

Why not introduce, then, an automatic spell checker in order to
eradicate typos?
Why not try to give meaning to every octet stream (by content
sniffing everything)?

In a lot of projects I have worked on in the publishing world, the main
problem is exactly that: over-engineering, and trying to make one step
of the process smarter than the rest of the world (and trust me, they
have decades of experience in this field).

The result is always:
1) Blurring the responsibilities (in terms of people AND applications)
2) Generating more and more complicated "recovery mechanisms" (that
nobody will succeed in teaching or explaining)
3) Fighting with the recovery mechanisms, because people end up relying
on them in even trickier ways than anyone ever thought of (do you
really think that you will make only one version of your XML 5.0 stuff?)
4) Doing the same thing one layer up

In this situation, 4) will lead the HTML WG to revamp HTTP, TCP, IP and
probably the electrical layer at some point.

The web is what it is BECAUSE each layer does its job
AND NOTHING MORE (sometimes even a bit less).

And why not go the opposite way:
a) Make things simpler and interoperable
b) Since it is simpler, you can teach it
c) When something is not compliant, flag it (we now have dozens of
services like that which help fight phishing, and Firefox has a
report button); just find a way for the information to get
back to the source.
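
As an illustration of point c), a minimal sketch of "flag it and send
it back to the source", assuming Python's standard ElementTree parser
as the strict XML 1.0 processor. The function name, the report fields
and the example URL are hypothetical; how the report actually reaches
the producer is left open.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str, source: str):
    """Return None if text is well-formed XML, else a report for the producer."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # No recovery is attempted: the fatal error is turned into a
        # report that can be sent back to whoever produced the document.
        line, column = err.position
        return {
            "source": source,
            "line": line,
            "column": column,
            "message": str(err),
        }
    return None


# John Cowan's example from the quoted text is rejected and flagged.
report = check_well_formed('<root less="<">\n</root>', "http://example.org/feed")
```

The consumer stays simple and strict, and the only extra machinery is
the channel that carries the report back.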

Again, if all this is because of RSS, then the answer is easy: make a
special RSS parser. RSS was never built from the ground up
as an XML dialect. That's why it is so hard.

> [1] http://lists.w3.org/Archives/Public/public-html/2008Dec/0250.html
> [2] http://lists.w3.org/Archives/Public/www-tag/2008Dec/0048.html
> [3] http://www.balisage.net/Proceedings/vol3/html/Quin01/BalisageVol3-Quin01.html
> [4] http://lists.w3.org/Archives/Public/w3c-sgml-wg/1997May/0079.html
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
RCS Paris 488.018.631
SARL au capital de 10.000 
Received on Thursday, 19 November 2009 16:11:03 UTC