Re: C.4 Undeclared entities?



Henry Thompson wrote:

> OK, I'll bite.  David is merely the last in a moderately long list
> (i.e. at least three people :-) who have asserted without any argument
> that "users won't include a <!DOCTYPE ...>, so we shouldn't require
> one for well-formedness."  I have to say I just don't get it -- why
> ever not?  They're going to have to do a lot of other, more
> substantial, things differently from what they are used to, if they
> are hope-to-die HTML mavens, who are the only group I can suppose
> David et al. have in mind.  

I won't speak for David, but I for one would like to use XML without a DTD. I can 
imagine doing lots of interesting (perhaps impromptu) work that would be 
difficult if a DTD were required. The proposition that only valid XML documents 
be interchanged precludes this type of work. 

Requiring a "dummy" DOCTYPE declaration raises the question - Why? I doubt that 
8879 conformance will fly as an answer for all XML documents. Let 8879 
conformance apply to *valid* XML documents and have somewhat more relaxed rules 
for well-formed and other XML documents. If an XML document contains a DOCTYPE, 
the receiving application would be expected to locate, obtain, and use the 
referenced DTD. Without the DOCTYPE, a generic XML application should assume 
well-formed input. Specific applications might assume something less than 
well-formed. What's wrong with this picture?
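
A minimal sketch of the dispatch described above, written in Python purely for 
illustration. The function name classify_document and the two mode strings are 
hypothetical, not part of any XML draft. It assumes the expat bindings bundled 
with Python; expat is a non-validating parser, so the "validate" branch only 
reports which mode the receiver would choose and does not fetch or apply a DTD.

```python
import xml.parsers.expat


def classify_document(document: str) -> str:
    """Pick a processing mode for an XML document (hypothetical helper).

    Returns "validate" when a DOCTYPE declaration is present, meaning the
    receiver would be expected to locate and use the referenced DTD, and
    "well-formed" when there is none. Raises ExpatError if the input is
    not even well-formed.
    """
    seen = {"doctype": False}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat once, when it encounters <!DOCTYPE ...>.
        seen["doctype"] = True

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)  # True marks this as the final chunk
    return "validate" if seen["doctype"] else "well-formed"
```

Under these assumptions, a document such as <memo><to>Henry</to></memo> is 
accepted in "well-formed" mode, and the same document preceded by 
<!DOCTYPE memo SYSTEM "memo.dtd"> is routed to "validate".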

Finally, I'll take a perhaps unpopular position and state that I doubt we have 
written the last chapter on document structure. I doubt we ever will. If we 
leave XML open-ended, someone might invent something that goes well beyond what 
is possible with a DTD while still remaining, even if tenuously, within the XML 
framework. To me, that is a very interesting possibility and I see no reason to 
discourage such activity. Requiring DTDs, dummy or otherwise, effectively limits 
XML's potential by stating that DTDs are the only mechanism by which document 
structure can be specified.

> After all, both SGML fans and total
> newbies won't have any problem with following this rule.  Why is it
> likely that HTML fans, who after all have at least HEARD of
> <!DOCTYPE ...>, will ignore this requirement but not, say, the
> requirement to provide explicit end tags?  Or the requirement to quote
> all attribute values?  Seems modest by comparison, and a small price
> to pay for SGML compatibility.

While some HTML fans have heard of <!DOCTYPE ...>, I suspect that only a small 
percentage of HTML documents actually contain a DOCTYPE declaration. I've done a 
statistically insignificant, not even close to random survey of some web sites 
to see who uses <!DOCTYPE ...> on their home page. Here are the results:

Sun         no      Spyglass    yes
JavaSoft    no      Ebt         yes
Netscape    no      Softquad    yes
Microsoft   no      Textuality  yes
Ncsa        no      Passage     yes
Adobe       no      Arbortext   yes
Excite      no      W3c         yes
Yahoo       no      Gca         yes
Lycos       no      Isogen      yes
Verity      no      Fulcrum     yes
                    Sgmlsource  yes

I'll leave a detailed analysis of the pseudo-results to the reader, but at the 
highest level, the SGML literati use DOCTYPE and others don't.

Explicit end tags will be forgotten or omitted. Attribute values won't be 
quoted. Countless "errors" will be found in XML documents, just as they are in 
HTML and SGML documents. In at least some applications, these errors will 
manifest themselves in obvious ways and authors will take corrective action. 

However, the absence of a dummy DOCTYPE in a well-formed XML document probably 
won't be obvious in many applications. Why? If my suspicion is correct, most 
developers will not put in code to check for a condition that, if encountered, 
can cause no harm. In fact, they would normally do something quite different: 
eliminate the condition. We can provide that service to all XML developers 
simply by stating that DOCTYPE is required only for XML documents that purport 
to be valid.

Requiring that all XML documents carry <!DOCTYPE foo SYSTEM> in the name of 8879 
conformance seems quite the hack to me. I and others have argued that it will be 
ignored by XML users. I doubt that existing SGML systems will be able to do much 
more than report an error upon encountering <!DOCTYPE foo SYSTEM>. Of course, 
the void's entity manager could locate foo and return it, but as I've stated 
before, I have doubts about the void.

So what is the practical purpose of requiring a DOCTYPE declaration in 
less-than-valid XML?