Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation from Leif Halvard Silli on 2014-02-10 (xml-editor@w3.org from January to March 2014)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 10 Feb 2014 23:26:51 +0100
To: Paul Grosso <paul@paulgrosso.name>
Cc: xml-editor@w3.org
Message-ID: <20140210232651454671.99cae8d6@xn--mlform-iua.no>
Paul, 

we might agree about your point 2:

> 2.  It is certainly the case that nothing in the document
>     (e.g., the existence or non-existence of any form of
>     doctype declaration) has been defined by the XML spec
>     to indicate which processing mode to use.

I could live well with point 2 going into the spec. But I would
suggest clarifying, in the spec, that your statement means that:

A) 
   1) If failing to find DTDs to validate against, validating
      processors are not permitted to slip into non-validating 
      processing mode and they must, unless reporting is disabled,
      report such violations for e.g. HTML5 documents.
B)    Non-validating processors are not permitted to slip into
      validating processing mode based on presence of a doctype.
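
To illustrate A) and B), here is a minimal sketch in Python. The names
(Mode, select_mode, validity_report) are hypothetical, made up for this
illustration, and not taken from the XML spec or from any real processor.
The only point is that the mode comes from the user, never from the document:

```python
from enum import Enum
from typing import List


class Mode(Enum):
    VALIDATING = "validating"
    NON_VALIDATING = "non-validating"


def select_mode(user_choice: Mode, has_doctype: bool, dtd_found: bool) -> Mode:
    """Return the processing mode (hypothetical helper).

    has_doctype and dtd_found are deliberately ignored: nothing in the
    document may switch the mode, in either direction (points A and B).
    """
    return user_choice


def validity_report(mode: Mode, dtd_found: bool,
                    reporting_enabled: bool = True) -> List[str]:
    """Hypothetical reporting step for a document whose DTD may be missing."""
    if mode is Mode.NON_VALIDATING or not reporting_enabled:
        # No validity reporting; the processing mode itself is unchanged.
        return []
    if not dtd_found:
        # A validating processor stays validating and reports the problem.
        return ["validity violation: no DTD found to validate against"]
    return []
```

With this sketch, a validating processor given an HTML5 document (doctype
present, no DTD found) stays in validating mode and, unless reporting is
disabled, reports one violation.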

However, as a kind of compromise, I wonder what you think about 
allowing, as a part 2) of A), ”double validation”:

A)
   2) As long as it is *clear* to the user that the processor is a
      validating one, validating processors *could* issue validity
      violations *and* the results of ”conformance checking” based
      on XSD or some other non-validating schema/option.

This way, a validating processor could report e.g. an HTML5 document as 
violating validity, but as conforming per (e.g.) XSD. Thus two parallel 
reports.
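
As a rough sketch of what two such parallel reports could look like (Python;
DoubleReport and its field names are hypothetical, not an existing API, and
the findings are filled in by hand rather than produced by a real validator):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DoubleReport:
    """Hypothetical container keeping the two reports strictly separate."""
    validity_violations: List[str] = field(default_factory=list)   # DTD-based
    conformance_findings: List[str] = field(default_factory=list)  # e.g. XSD-based

    def summary(self) -> str:
        validity = "valid" if not self.validity_violations else "not valid"
        conformance = ("conforming" if not self.conformance_findings
                       else "not conforming")
        return (f"validity (DTD): {validity}; "
                f"conformance (other schema): {conformance}")


# An HTML5-like case: no DTD to validate against, but no schema complaints.
report = DoubleReport(validity_violations=["no DTD found"])
print(report.summary())
```

Keeping the two lists apart is the design choice that matters here: the
conformance result never overwrites or hides the validity result.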

Leif Halvard Silli

Leif Halvard Silli, Sat, 8 Feb 2014 14:23:26 +0100:
> Hi Paul! Some comments to things in your point 3 and 4. 
> 
> But first: Much of what you say is good. But I also sense the attitude, 
> which I have seen elsewhere, that we can somehow save ourselves from 
> various XML dilemmas by making (or appearing to make) the validation 
> mode stricter and stricter. I think, instead, we need some analysis of 
> what is going on and of whether we - any more - understand XML the way 
> it was intended.
> 
> (In reply to Paul Grosso, Thu, 06 Feb 2014 11:09:03 -0600.)
> 
> Regarding this, from your point 3:
> 
>> (Short of a tool uniquely designed to be a validator, I would expect 
>> any well-designed tool to have a "non-validating mode" and a way to 
>> put that tool into that mode regardless of anything in the document.)
> 
> Do you also expect, ”at user option”, to *decide* the mode?
> 
> Why do you exempt validators from your ’well-designed tool’ 
> expectation? After all, we have validating[1], and non-validating[2] 
> conformance checkers. Why not both kinds in one product?  The issue at 
> hand  - namely, auto-magic shifts from one parser mode to the other - 
> might then have been clarified earlier!
> 
> As I make clear below, XML presupposes that the user of a validating 
> processor knows that the tool runs a validating processor. This is not 
> as simple as it might sound, because we seem today to have forgotten 
> that XML requires validating tools to have *two* modes: a mode in which 
> validity violations are reported and a mode in which validity violation 
> reporting is disabled. The choice of mode is at user option. But when 
> reporting is disabled, validating mode and non-validating mode 
> become, to the user, more or less identical.
> 
> So we should be able to expect from tools that they tell us, before 
> parsing, whether they are going to use validation mode or 
> non-validation mode! 
> 
> Another reason to have both in one product is the parsing differences 
> between validating and non-validating processing.[3] These differences 
> prevail whether or not the validating software ”at user option” has 
> been set to run with or without reporting of validity violations.[4]
> 
> Validator.w3.org has no option to disable validity violation reporting. 
> This is thus a violation of the XML 1.0 requirement that validity 
> violation reporting in validating processors should be ”at user 
> option”. Another tool that fails that test is Xmllint. Try this:
>   $ xmllint --nowarning --valid validity-violating-doc
> 
> A validating processor should be able to process this document with 
> validity violation reporting disabled:
> 
>    <foo/>
> 
> (Not having that option is a disservice to validating processors.)
> 
> In order to be able to discern “no validity violation reporting” from 
> “non-validation mode”, the user needs to know whether or not (s)he is 
> running a validating processor. This might often be simpler to know if 
> the tool at hand has only a *single* processing mode.
> 
> I therefore don’t think that XML shares the expectation that 
> well-designed software should be able to operate in both processing modes. 
> That ”validation” (in the broad sense) today often happens *without* 
> DTD, supports that view.
> 
> Relating this to my issue: I did clearly have in mind validating 
> processors as such, regardless of whether the user has configured them to 
> report validity violations or not. Because, after all, disabling 
> DTD-based validity violation reporting should of course not cause the 
> tool to switch to XSD - doing that would be to *deprive* the user of 
> the choice to turn validity violation reporting on and off.
> 
> To this, from your point 4:
> 
>>     It should not be amended to
>>     make any distinction between document type declaration
>>     constraints and validity constraints, and it should not
>>     be amended to made a special case out of any particular
>>     document type declaration (e.g., an "empty DTD").
> 
> A rush to tighten a rabbit hole? It is XML - not I - that distinguishes 
> certain sub-features of the validity feature - that discerns between 
> valid per DTD and some validity constraints on top of that. I have 
> not said, however, that there should be more than a single validity 
> violation reporting mode!
> 
> But we could ask: What about this document: <foo/> 
> Or what about this document: <!DOCTYPE f><oo/>
> 
> For both, Xmllint only says ”no DTD found”. A single error message. Why 
> does it not say that the validity constraint that the element type has 
> to be declared, has been broken? If all validity constraints applied 
> (for the validity violation reporter part of the software), then there 
> would be many more messages! And it would then also be non-conformance 
> with XML not to report them! (Since XML requires reporting of 
> validity constraint violations whenever the document fulfills the DTD.)
> 
> So today’s validating processors do seem to think that some documents 
> need no more than a single error message when there is no DTD. And 
> this is clearly in line with XML. Tightening that hole might be to 
> *change* XML.
> 
> At the same time, tool makers today know that there might *still* be 
> more to be said than simply ”there is no DTD”. And it is *then* that they - 
> typically silently! - make the tool shift from validating mode to 
> non-validating mode.
> 
> The shift in a tool from validating processor mode to non-validating 
> processor mode is clearly one that happens when the tool at hand comes 
> to the conclusion that validating mode is no longer of any use.
> 
> What does *that* tell us?
> 
> It tells us that, actually, the tool (and the users) perceives this as 
> a shift not from validation mode to non-validation mode, but as a shift 
> from *one* validation mode, to *another*, more useful, validation mode!
> 
> It also tells us that *something* inside the tool has at the very least 
> performed a pre-validation of the document.
> 
> [1] http://validator.w3.org/
> [2] http://validator.w3.org/nu/
> [3] http://www.w3.org/TR/REC-xml/#dt-validating
> [4] http://www.w3.org/TR/REC-xml/#dt-atuseroption
> 
> -- 
> leif halvard silli
Received on Monday, 10 February 2014 22:27:24 UTC