[whatwg] Re: Doctype FPI

On Wed, 14 Jul 2004, Terje Bless wrote:
> 
> Do you foresee specifying the deliverables of the WHAT WG in a way that 
> supports machine verification of basic syntactic conformance, analogous 
> to Validation in SGML applications (including XML)?
> 
> If so, in what way and using what tools?

I personally do not intend to spend any time on such matters, but fantasai 
has started writing DTDs to check the (few) parts of the conformance 
criteria in WHATWG specs that DTDs are able to check.

See: http://syntax.whatwg.org/


>> If the author wants entities, then the (otherwise mostly empty) DTD 
>> would be the right place for them.
> 
> So authors are expected to edit the DTD? Do you intend for this to 
> happen in either of the internal or the external subset, or just one? 
> Does this imply that the prose of the specification will define a set of 
> known entities (such as, e.g., &deg;)?

HTML entities are not going to be removed, so those can still be used. 
(But in practice, those don't need DTDs.)

For XML, any entities would have to be in the internal subset.


>>> What SGML Declaration do you intend be in effect?
>>
>> I do not intend to pretend that current UAs even have the concept of an 
>> SGML Declaration. (The only UA that I know of that supports the 
>> concept, in fact, is the validator.)
> 
> I am not familiar with the inner workings of the UAs you probably have 
> in mind, but I would be very much surprised if they do not have a 
> facility that is at least analogous to the function of the SGML 
> Declaration.

They really don't. Pretty much all the parsing of HTML documents is 
hard-coded.


>>>> Then, assuming we don't ever introduce elements with optional tags 
>>>> (which I highly doubt we will), we never need to update the DTD 
>>>> again.
>>>
>>> But assuming you don't -- Will the SGML Declaration reflect this (by 
>>> e.g. removing the corresponding SHORTTAG features)?
>>
>> Uh. If we introduced an element with optional end tags, it would be 
>> pretty stupid of us to then disallow optional end tags, no? Or am I 
>> missing something.
> 
> You've misread the question. Iff you intend to never introduce elements 
> with optional tags, will you alter the SGML Declaration to reflect this 
> intent (by disallowing certain markup minimization features)?

HTML already has optional end tags, so the point is moot (we aren't 
changing that part of HTML).


>>> Will the conformance requirements require document instances be 
>>> fully-tagged? Amply-tagged?
>>
>> Not sure what you mean by "tagged".
> 
> Are you familiar with SGML? WebSGML (Annex K)?

Not really. I read Goldfarb cover-to-cover a few times, but I haven't used 
it much apart from in an HTML context. I haven't read WebSGML.


>   K.2.2 Definitions related to validity assertions
>   K.2.2.1 fully-tagged document instance:
> 
>   A document instance in which a start-tag with a generic identifier,
>   and an end-tag, are present for every element, and the attribute name
>   is present in every attribute specification in the start-tag.
> 
>   Note 1: An SGML declaration requires document instances to be
>   fully-tagged if it specifies OMITTAG NO and SHORTTAG STARTTAG EMPTY NO
>   and ATTRIB OMITNAME NO. A system should offer means, such as a
>   parameter to the invocation of processing, to request validation of
>   whether an instance is fully-tagged even when the SGML declaration
>   does not require it to be.
> -- http://www.sgmlsource.com/8879/n0029.htm
> 
> K.2.2.4 defines amply-tagged document instances as "A document 
> instance whose use of markup minimization does not require access to a 
> document type declaration." (see above URL for the rest)

Ah, ok. Well then no, neither of these features will be used by WHATWG 
specs, since they have to remain compatible with existing parsers and 
HTML4 documents.


>> Yes. People rely on DTDs in a way which has led millions of authors to 
>> have a false sense of having done the right thing, when in fact 
>> their documents are sometimes worse than documents that are 
>> syntactically slightly broken but semantically fine.
> 
> Please try to examine that paragraph in an objective fashion. Your 
> language appears to be designed to evoke an emotional response 
> ("millions of authors", "false sense", "the right thing", etc.). If you 
> would like to make this point, I would appreciate it if you could recast 
> it in more neutral language so I can better understand it.

Ok:

There exist authors who use automated syntax checking tools based on DTDs 
to verify the conformance of their documents, without understanding the 
limitations of such tools. These authors have, on occasion, written 
documents that are marked as valid by these tools, but that are actually 
non-conformant. Such authors and documents are, in my experience, quite 
numerous.

There also exist authors who do not use automated tools, but, through 
their understanding of HTML, create mostly correct documents. While these 
documents often may have many validation errors, simple error handling can 
usually recover the original intent of the author.

It is my opinion that documents of the first kind are usually poorer, in 
terms of general accessibility, than documents of the second kind. From 
this I assert that the limited ability of DTDs to detect conformance 
errors has caused harm.
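
As a rough illustration of the first kind of document, here is a minimal 
Python sketch. It assumes an element whose datetime attribute is declared 
as CDATA (as it is for <ins> and <del> in the HTML4 DTD), so a DTD-based 
check can only say that the attribute is permitted. The function names 
and the simplified date format (UTC "Z" only) are my own assumptions for 
the example, not part of any spec or tool.

```python
from datetime import datetime

# Attributes the (hypothetical, simplified) DTD permits on the element.
DECLARED_ATTRIBUTES = {"cite", "datetime"}


def dtd_style_check(attrs):
    """Mimic a DTD: only checks that each attribute is declared.

    A CDATA attribute accepts any string, so values are not inspected.
    """
    return all(name in DECLARED_ATTRIBUTES for name in attrs)


def value_check(attrs):
    """Check the datetime value itself (simplified: UTC 'Z' form only)."""
    value = attrs.get("datetime")
    if value is None:
        return True
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


bad = {"datetime": "last Tuesday"}
print(dtd_style_check(bad), value_check(bad))
```

The document passes the DTD-style check but fails the value check, which 
is the sort of false reassurance described above.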


>> Schemas aren't much better.
> 
> In what sense? Your arguments above partly focus on DTDs' inability to 
> specify datatypes and provide attribute syntax verification, something 
> which Schema facilities seem to offer. Is your claim based on their 
> inability to provide semantic and stylistic verification?

There are many aspects of even syntax checking that schemas are currently 
unable to describe, although their abilities are indeed better than DTDs' 
in this regard.

For example, to my knowledge it is still not possible to say "if the 
element's type attribute has the value radio, then these attributes may be 
used, otherwise those attributes may be used". Or to say "this element may 
be placed anywhere a <select> element may be placed" or "this element has 
the same content model as either <datalist> elements _or_ <optgroup> 
elements, but not both at the same time". (It is probably possible to 
define similar things, which in simple cases would be equivalent, but in 
multi-namespace cases where the various schemas are not under the control 
of the same person, this becomes intractable.)
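
To make the first of those examples concrete, here is a minimal Python 
sketch of a check written as ordinary code, where the set of permitted 
attributes depends on the value of the type attribute. The two attribute 
sets are hypothetical, chosen only for illustration, and are not taken 
from any WHATWG spec.

```python
# Hypothetical attribute sets, for illustration only.
ALLOWED_WHEN_RADIO = {"type", "name", "value", "checked"}
ALLOWED_OTHERWISE = {"type", "name", "value", "maxlength"}


def disallowed_attributes(attrs):
    """Return the sorted attribute names not permitted for this type.

    `attrs` maps attribute names to values for a single element.
    """
    is_radio = attrs.get("type", "").lower() == "radio"
    allowed = ALLOWED_WHEN_RADIO if is_radio else ALLOWED_OTHERWISE
    return sorted(set(attrs) - allowed)


print(disallowed_attributes({"type": "radio", "maxlength": "5"}))
print(disallowed_attributes({"type": "text", "checked": "checked"}))
```

The branch on the attribute value is trivial in a general-purpose 
language; the difficulty described above is in stating the same rule 
declaratively in a DTD or schema.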

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 17 August 2004 07:23:34 UTC