Re: XHTML validation bug (false pass)

Terje Bless wrote:
> 
> On 20.02.00 at 08:22, David Brownell <david-b@pacbell.net> wrote:
> 
> >the following isn't reported as a fatal error:
> >   Line 1:
> >   Line 2:     <?xml version="1.0"?>
> >   Lines 3-N:  irrelevant
> 
> Ok, so it's a blank line before the XML PI(?) that isn't flagged by the
> validator, but which should be?

Well, an "XML PI" is the fatal error:  grammar rule [16] explicitly precludes
them, regardless of the case used.  What's required is either:

	- XMLDecl [23], which appears only at the beginnings of
	  full documents (which may optionally have a DTD);

	- TextDecl [77], which appears only at the beginnings of
	  external parsed entities (both parameter and general).

And it turns out that the parent productions [1] [30/79] [78] are where
the "no whitespace before them" effect comes from.

For diagnostic purposes, it's clearest to always treat "<?xml" as
an XML or text decl.  Erase "XML PI" from your vocabulary.
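
As a rough illustration of that diagnostic rule, here is a minimal Python
sketch.  The function name and the returned labels are made up for this
example; it only inspects the start of the input, and it is not how the
validator or any particular parser implements the check.

```python
import re

XML_WHITESPACE = " \t\r\n"

def classify_start(text):
    """Classify how a document or external entity begins.

    Hypothetical helper: the labels returned are illustrative only.
    """
    stripped = text.lstrip(XML_WHITESPACE)
    # "<?xml" followed by whitespace is always treated as a declaration.
    # Targets such as "xml-stylesheet" are ordinary PIs and do not match.
    match = re.match(r"<\?(xml)[ \t\r\n]", stripped, re.IGNORECASE)
    if match is None:
        return "no-declaration"
    if match.group(1) != "xml":
        # e.g. "<?XML ...": neither a declaration nor a legal PI target.
        return "wrong-case"
    if len(stripped) != len(text):
        # Whitespace (such as a blank line) precedes the declaration.
        return "misplaced-declaration"
    return "ok"
```

With the document from the original report, a blank line followed by the
declaration, this returns "misplaced-declaration".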


>	 We can catch that manually if our parser
> won't do it on its own. Is this the only place this should be taken into
> account or are there pitfalls like this anywhere else in an XML document?

See above -- TextDecl is similar.  XML decls require a version, encoding
is optional, 'standalone' is permitted.  Text decls require encoding, and
an optional version is the only other thing permitted.
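
The difference between the two can be sketched with a pair of regular
expressions.  This is an approximation written for illustration: the
names are invented here, the patterns follow my reading of productions
[23] and [77], and a real parser would not necessarily work this way.

```python
import re

S = r"[ \t\r\n]+"            # required whitespace
EQ = r"[ \t\r\n]*=[ \t\r\n]*"  # "=" with optional whitespace around it
END = r"[ \t\r\n]*\?>"

def quoted(body):
    # A value in matching double or single quotes.
    return "(?:\"" + body + "\"|'" + body + "')"

VERSION = S + "version" + EQ + quoted(r"[A-Za-z0-9_.:-]+")
ENCODING = S + "encoding" + EQ + quoted(r"[A-Za-z][A-Za-z0-9._-]*")
STANDALONE = S + "standalone" + EQ + quoted("(?:yes|no)")

# XML declaration: version required, then optional encoding and standalone.
XML_DECL = re.compile(
    r"<\?xml" + VERSION + "(?:" + ENCODING + ")?(?:" + STANDALONE + ")?" + END
)
# Text declaration: optional version, required encoding, nothing else.
TEXT_DECL = re.compile(r"<\?xml(?:" + VERSION + ")?" + ENCODING + END)

def is_xml_decl(decl):
    return XML_DECL.fullmatch(decl) is not None

def is_text_decl(decl):
    return TEXT_DECL.fullmatch(decl) is not None
```

For example, '<?xml encoding="UTF-8"?>' passes is_text_decl() but not
is_xml_decl(), and '<?xml version="1.0"?>' is the other way around.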


> 	 XML support in the Validator is
> still labelled as Experimental.

Not from the official W3C validation service it isn't.  I just
ran it seconds ago, and there was no mention of that at all
on the resulting "XHTML brand of approval" page.


> >I've got an updated copy, which a few folk have sanity checked.
> 
> Can this be got from the usual suspects (oasis-open etc.)?

No, but I sent it along to the WG chair.  I think I'll send it to
the general list too.  You can grab:

	ftp://ftp.brownell.org/pub/xml/xmlconf-feb05.tar.gz

There's a README there explaining the status as of when I packaged
it up.  A few things came up since then (re nondeterminism in content
models, some fixes got lost).


> >>Well, in general, throwing fatal exceptions isn't really useful
> >>behaviour for a validation tool. Is there some reason this should be
> >>changed in this case?
> >
> >To report the error?  Absolutely -- it's telling folk that seriously
> >broken XML is valid, when it's not even well formed. [...] The XML spec
> >actually demands that you stop reporting anything except errors ...
> 
> IIRC, the XML spec sez that any well formedness error is fatal (as in dead
> stop!), but that any validity errors can be reported if normal processing
> is stopped.

See the definition of "fatal error" in the spec.  What I said is accurate:
it's permissible to report _errors_ (only) after the first fatal error.
Not just validity errors.


>		 Do you think the Validator should
> stop processing after reporting the first Well Formedness error or should
> it attempt to keep going and report as many errors as possible?

For the record, essentially every XML parser I've worked with does the
former.  So WFness errors in documents get fixed -- all but immediately.
Which IMHO is the way to deal with them.
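
To show what "stop at the first one" looks like in practice, here is a
small sketch using the expat binding from Python's standard library.  It
is only an example of one parser's behaviour, not a description of the
validator; the helper name is invented for this illustration.

```python
import xml.parsers.expat

def first_fatal_error(document):
    """Return (line, column, message) for the first well-formedness
    error expat reports, or None if the document parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        # expat raises on the first error and parsing does not continue.
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None

# The case from the original report: a blank line before the declaration.
broken = b'\n<?xml version="1.0"?>\n<doc/>'
print(first_fatal_error(broken))
```

The same document without the leading blank line returns None.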

At the moment I've got a mild preference to do it that way.  After all,
wasn't _not_ doing that the reason we got into the HTML mess?  Not just
with a validator, but with _every_ piece of software touching HTML?

- Dave

Received on Sunday, 20 February 2000 19:31:25 UTC