Re: On Henry's comment about documents with DOCTYPE but without markup declaration

Paul Grosso writes:

> Let's agree that by "validating processor" we mean "an XML
> processor parsing in validating mode" to avoid considering
> a tool that can parse in either validating or non-validating
> mode as a single processor.

For sure.

> In section 5.2, the XML spec says:
>
>  The behavior of a validating XML processor is highly
>  predictable; it must read every piece of a document
>  and report all well-formedness and validity violations.
>
> Since the lack of a document type declaration clearly
> implies the violation of several validity constraints,

It doesn't imply that to me!  I read the conformance section as
meaning that validity constraints only operate in the presence of a
document type declaration.  I.e. my 'for sure' above was too glib.  I
think a case can be made that a conformant XML processor should only
"operate in validating mode" in the presence of a document type
declaration.

>>> On the other hand, Henry says that <!DOCTYPE html><html/> is
>>> "invalid", and then that confuses me, since that is well-formed.
>> The example was easy to misread, sorry, but not as you quote it;
>> rather:
>> <!DOCTYPE html>
>> <hmtl/>
>>
>> I should have used
>> <!DOCTYPE html>
>> <xyzzy/>
>>
>
> No, Henry, I did not misread or misunderstand the last example
> in your message (the hmtl one).
>
> My comment above refers to your earlier example and statement,
> to wit:
>
>
>> It would also probably be a good idea to clarify that as things stand
>>
>>    <!DOCTYPE html>
>>    <html/>
>>
>> is, using the usual convention, _invalid_, where
>>
>>    <html/>

Ah, yes, sorry.  That's invalid because it has a document type
declaration but no element declaration for the 'html' element type,
so the root element fails the Element Valid constraint.

>> is neither valid _nor_ invalid...
>
> where--as I said in my previous message--you claim that
> <!DOCTYPE html>
> <html/>
> is "invalid", but I believe it to be well-formed (do you
> disagree that it is well-formed?)

No.  It is well-formed.

> hence my confusion about your definition of "invalid".

So here's the only thing I said which I think you might mean by that:

> [we could provide] a definition of 'invalid' as "given a document
> type declaration, violating one or more of the constraints expressed
> by the declarations in the DTD, and failing to fulfill one or more
> of the validity constraints given in this specification".

So it's perfectly possible to be well-formed and invalid.  I
think the relationship between the two is as follows:

Strings are either XML or they're not.  It's slightly dangerous to
write "not well-formed XML" -- it's okay if understood as meaning (not
(well-formed XML)), but not as meaning ((not well-formed) XML).

If a string _is_ well-formed XML, I think there are two questions,
the second depending on the answer to the first:

Does it have a document type declaration?

  If so, is it valid or not, per the conformance section?  If it isn't
  valid per the conformance section, we might agree to call it
  'invalid'.

  If not, there is nothing more to be said.  In particular, it doesn't
  make sense to label it 'valid' _or_ 'invalid'.
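The two-step test above can be sketched in Python with the standard
library's expat bindings (xml.parsers.expat).  This is a minimal,
partial sketch under stated assumptions, not a validator: the name
`classify` and its result strings are made up for illustration, it
only sees declarations in the internal subset (expat does not read
the external subset here), and it checks just two validity
constraints: Root Element Type, and whether every element type used
has any declaration at all (part of Element Valid).  Anything it
cannot show to be invalid it reports as "undetermined".

```python
import xml.parsers.expat


def classify(text: str) -> str:
    """Hypothetical helper: sketch of the classification above.

    Returns "not XML", "neither valid nor invalid", "invalid" or
    "undetermined".  Only two validity constraints are checked, and
    only against declarations in the internal subset.
    """
    doctype = {}       # filled in only if a DOCTYPE is seen
    declared = set()   # element types declared in the internal subset
    used = []          # element types used, in document order

    def on_doctype(name, system_id, public_id, has_internal_subset):
        doctype.update(name=name, external=system_id is not None)

    def on_element_decl(name, model):
        declared.add(name)

    def on_start(name, attrs):
        used.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return "not XML"  # i.e. (not (well-formed XML))

    if not doctype:
        # No document type declaration: nothing more to be said.
        return "neither valid nor invalid"
    if doctype["external"]:
        # Declarations may live in the unread external subset.
        return "undetermined"
    if used[0] != doctype["name"]:
        return "invalid"  # VC: Root Element Type
    if any(name not in declared for name in used):
        return "invalid"  # VC: Element Valid (no declaration at all)
    # Content models, attributes, IDs etc. are not checked here.
    return "undetermined"
```

Under these assumptions, both <!DOCTYPE html><html/> and
<!DOCTYPE html><hmtl/> come out "invalid" (the first for lack of any
element declaration, the second on Root Element Type), while a bare
<html/> is "neither valid nor invalid".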

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]

Received on Tuesday, 28 January 2014 17:50:29 UTC