Re: On Henry's comment about documents with DOCTYPE but without markup declaration

Summary:

Henry asks:

> But I'm curious what the original authors thought they were asking
> for a parser to do, when invoked in validating mode, on a
> well-formed document with no document type definition.

Memory is tricky, but my recollection is that the original authors of
the spec (among whom I would count not just the three people cc'd on
Henry's message but all members of the WG and ERB) were trying as hard
as we could not to define *processing*, but to define a declarative
data format suitable for various kinds of processing.

The authors perhaps had expectations about what behavior would be
desirable in the situation you describe -- but since almost none of us
had ever worked with systems of descriptive markup in which DTDs were
optional, I don't know how useful those would have been.  I guess that
if someone had asked, I would have wanted a validating processor to
behave pretty much like an SGML processor in that situation.  But that
doesn't tell us whether the error in that case is a violation of a
validity constraint or an exception indicating that the requested
action cannot be performed because some prerequisites for the action
are missing.

For what it's worth, my linguistic instinct is with Henry here, in
that if someone tells me the XML document "<doc/>" is invalid, I am
more likely to ask "against what schema?" or "what are you talking
about?" than to agree or disagree.  The term 'invalid' doesn't seem to
me to apply.  

So if the question is "should that be an error or elicit a message of
some kind?" the answer is yes, of course.  And if it's "should it be
classified a validity error?" my answer is "how can it be a validity
error?  No validation can have taken place."  I believe the situation 
is analogous to that applying for XML documents with broken
encoding declarations that render parsing impossible -- it's not
defined as a well-formedness error, because it prevents well-formedness
from being tested properly.  But that doesn't mean documents in EBCDIC
which claim to be in ISO 8859-7 are well formed.

Some further comments on individual points are appended below for
those who seek respite from whatever it is they ought to be doing
right now.

On Jan 28, 2014, at 12:24 PM, John Cowan wrote:

> Henry S. Thompson scripsit:

>>  "The present king of france is bald"

>> is not true, but not that it's false, or untrue.

> Whereas I hold with Quine and others that presupposition-failure
> sentences are just false.

Classing them as false is a convenient way to simplify one's life as
someone responsible for having an answer for everything (and in
particular, responsible for producing a Boolean value for arbitrary
sentences), but it also gives the account of truth and falsehood based
on it a certain artificiality -- even worse than the mismatch between
English 'if' and the material implication of logic.  It may be the
case that in first-order propositional calculus all sentences are true
or false; it's not a plausible claim for English, however, even for
sentences which appear declarative on the surface.

> But even waiving that, I cannot see that a definition of the form
> "g(x) is true if there exists a y and f(y,x) is true" involves a
> presupposition at all.

I believe Henry's point is that on his view (which in this question is
very similar to mine), a document is valid if it has a document type
definition and accords with the constraints expressed in that document
type definition, and a document is *invalid* if it has a document type
definition and violates some constraint in that document type
definition.  On this view, having a DTD is a presupposition for either
the predicate valid(x) or the predicate invalid(x).  Your not seeing
any presupposition appears to be just another way of saying you
believe that "invalid" means "not valid".
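
A minimal Python sketch of the view just described, in which having a
DTD is a presupposition of both predicates.  The names Validity,
Document, check, and toy_conforms are hypothetical, and toy_conforms
is only a stand-in for real DTD validation, which is not implemented
here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_APPLICABLE = "not applicable"  # the presupposition (a DTD) fails


@dataclass
class Document:
    text: str
    dtd: Optional[str] = None  # None models a document with no DTD


def check(doc: Document, conforms: Callable[[str, str], bool]) -> Validity:
    """Classify doc; 'conforms' is a stand-in for real DTD validation."""
    if doc.dtd is None:
        # Neither valid(x) nor invalid(x) applies without a DTD.
        return Validity.NOT_APPLICABLE
    return Validity.VALID if conforms(doc.text, doc.dtd) else Validity.INVALID


def toy_conforms(text: str, dtd: str) -> bool:
    # Toy assumption: the "DTD" is just the required root element name.
    return text.startswith("<" + dtd)
```

On this sketch, check(Document("<doc/>"), toy_conforms) yields neither
VALID nor INVALID; on the "invalid" = "not valid" reading, the third
value would simply be folded into INVALID.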

To say that a document without a DTD is invalid, without first
carefully defining one's terms, will strike some hearers and readers
(me among them, for the little that's worth) not so much as right or
wrong but simply as bizarre -- it is very similar, in this way, to any
statement confidently ascribing this or that property to some object
the speaker assumes must exist, and which I incline to believe does
not exist.  No one of sound mind and normal knowledge of the world
will respond to a claim that the current king of France is bald with a
"yes that's right" or a simple "not so" -- either response would
violate the normal rules of conversational implicature.  The only
plausible response to the claim would be an inquiry as to what the
speaker thinks they are talking about, or more brusquely a statement
that there is no current king of France.  If instead we were to
reply "Not so: the current king of France is not bald; the current
king of France does not exist!", we would seem to be placing the
existence of the current king of France on a par with his
hirsuteness.  But as
an ally of Quine, you must surely be aware that existence [and by the
same token, non-existence] is not a predicate.

If someone claims that a given document is invalid, and we see that it
has no DTD (without first carefully defining the term "invalid" to
mean something slightly different from what I think it normally
means), I think the natural response would be "against what DTD?", or
"against what schema?" -- or more generally "what are you talking
about?"

On the topic of presuppositions: let us assume that your left shoelace
is not a document with a document type declaration whose constraints
it satisfies.  Does it seem natural to you to say that your shoelace
is invalid?  Perhaps so.  Or perhaps it's more natural to say that the
predicates valid and invalid don't apply to shoelaces. (If we have to
force them to have some Boolean value, we can translate "is valid" to
"is a document and has a schema and conforms to that schema", in which
case "your shoelace is valid" and "your shoelace is invalid" will both
be false.)
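
The forced Boolean translation can be sketched in Python as follows.
This is an illustration under toy assumptions only: the function
names are hypothetical, "is a document" is modeled as "is a string",
and the conforms argument stands in for a real validator.

```python
from typing import Any, Callable, Optional


def is_document(x: Any) -> bool:
    # Toy assumption: only strings count as documents.
    return isinstance(x, str)


def is_valid(x: Any, schema: Optional[str],
             conforms: Callable[[str, str], bool]) -> bool:
    # "is a document and has a schema and conforms to that schema"
    return is_document(x) and schema is not None and conforms(x, schema)


def is_invalid(x: Any, schema: Optional[str],
               conforms: Callable[[str, str], bool]) -> bool:
    # "is a document and has a schema and violates that schema"
    return is_document(x) and schema is not None and not conforms(x, schema)


def never_called(text: str, schema: str) -> bool:
    # Placeholder validator; the short-circuiting above never reaches it
    # for a non-document or for a document without a schema.
    raise AssertionError("validator should not be reached")


shoelace = object()  # not a document at all
```

With these definitions, is_valid and is_invalid are both False for the
shoelace, and also for "<doc/>" with no schema, so is_invalid is not
simply the negation of is_valid.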

Part of the problem is that it is not really meaningful to say that a
document is valid without reference to some specific document
type definition (which I will abbreviate in what follows as "schema").
When we say "document D is valid", I believe we are using a short form
for an utterance that in fuller form would be "document D is valid
against schema S", which we can do whenever the identity of S is clear
from context (as it will be if document D has a document type
declaration).  I think the same holds for "invalid", which for my
linguistic instincts definitely means "violates some constraint
imposed by schema S".

> I also don't think that changing "if" to "iff" as Liam suggests will
> help here either: in definitions, we usually treat "if" as "iff"
> anyway.  An object is a natural number if it is either zero or the
> successor of a natural number; we don't normally bother to add that
> nothing else is a natural number.

What you mean 'we', white man?

I would be very surprised if a competent mathematician seeking to give
a definition of the natural numbers failed to ensure that nothing
else is a natural number.  There are various ways of doing so: an
explicit statement "nothing else is a natural number" is one way;
defining the naturals as "the smallest set for which the following
properties hold" is another, and I believe I remember learning that
there is at least one other formulation which I am blanking on at the
moment.

And as a reader of definitions in specs, I don't take 'if' to mean
'iff'.  When there is any doubt, I take it to introduce a sufficient
but not a necessary condition, and when there is no doubt I lower my
opinion of the editor by a notch.

You are certainly right that some editors do treat 'if' as if it meant
'iff'.  That doesn't make them right; formal specification is not the
place for descriptivism.

[I express no opinion on the particular text involved; I am objecting
to the claim that if = iff and that the natural numbers can usefully
be defined in a way that includes John's left shoelace.]

...

>> I note, against my preference, something I've always been perplexed
>> by (at least I'm consistent): There are only three possible
>> categories allowed for a test in the metadata of the XML Test Suite
>> [3]:

>>  valid
>>  invalid
>>  not-wf
...

> I think this is based on "invalid" = "not valid" synonymy.

I agree that that would seem to be its basis.  I think the trichotomy
of XSD (valid, invalid, not known) is more plausible.



-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************

Received on Tuesday, 28 January 2014 21:50:12 UTC