Re: XML and required DTDs

> Observation: if XML instances are to be processed without the DTD,
> then it's not possible to know with certainty from an instance what
> the data types (declared values) are for the attributes.  This --
> aside from the fact that some declarations might need to be sent with
> the XML instance to provide information about defaulted attributes,
> notations, etc.  In this sense, an XML document instance is weaker
> semantically than an SGML document with its required DTD.  

The processing application should either (a) already know what it needs to
know (as in the case of HTML), (b) know what external meta-data file to
download to get the information (as in the case of SGML with style sheets),
or (c) get the DTD. But it should have the choice. Right now you CANNOT
parse an SGML document correctly (in general) without the DTD.
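
To make the DTD-less case concrete, here is a minimal sketch, assuming a
well-formed instance and Python's standard xml.etree.ElementTree parser
(which reads no external DTD). The instance itself is hypothetical; its
element and attribute names only echo the quoted example. The tree is
recovered, but every attribute value comes back as a plain string with no
declared type.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: names echo the quoted example, content is dummy text.
INSTANCE = '<doc><qdd ko="clink" uff="a1">xx <qkqr>xx</qkqr></qdd></doc>'


def element_names(xml_text):
    """Parse without any DTD and return element names in document order."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]


def attribute_values(xml_text, tag):
    """Return the attributes of the first descendant <tag> as plain strings."""
    root = ET.fromstring(xml_text)
    el = root.find(f".//{tag}")
    # Without declarations there is no way to tell an ID from CDATA here.
    return dict(el.attrib) if el is not None else {}
```

Calling attribute_values(INSTANCE, "qdd") gives the raw strings for 'ko' and
'uff'; whether 'uff' is an ID is exactly the information that has to come
from somewhere else.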

> If the client assumes (or can know from a MIME type?) that the instance
> is a candidate for meaningful display, what does it do?  (Ignore the
> fact that the content represented by "xx" is dummy text).  The server
> sending the instance will also need to send a stylesheet which says
> that (e.g.,) "qdd is a line-breaking element" and "qkqr is to be
> displayed in italic typeface" and so forth.  Something also has to
> tell my browser that QDD's attribute 'ko' encodes a HyTime clink, and
> that 'uff' is (therefore) for storing an ID.  Etc.

Only if the processing application cares about HyTime links or stylesheets.
Since most Web browsers do NOT care about HyTime links, they would NOT need
the DTD (for now). At some point in the future, they may decide to add
HyTime link support, and they would either have to download the DTD or some
other form of "list semantics sheet". But the application designers should
have the choice.
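
As an illustration of the "some other form of sheet" option, here is a
minimal sketch in which an application that does care about links loads a
small table of semantics instead of the DTD. The SEMANTICS table, its role
names, and the attribute_roles helper are all hypothetical; nothing here is
a real HyTime or XML facility.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantics sheet: (element, attribute) -> role.
SEMANTICS = {("qdd", "ko"): "hytime-clink", ("qdd", "uff"): "id"}


def attribute_roles(xml_text, sheet):
    """Return {(element, attribute): role} for attributes the sheet knows."""
    root = ET.fromstring(xml_text)
    found = {}
    for el in root.iter():
        for name in el.attrib:
            role = sheet.get((el.tag, name))
            if role is not None:
                found[(el.tag, name)] = role
    return found
```

An application that ignores links simply passes an empty sheet (or never
calls the function) and pays nothing for the semantics it does not use.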

> Thus, I still don't understand the point of enabling "DTDless"
> processing of XML instances, as per Tim Bray's posting #172, where
> "search, display, analyze" are examples of such processing, *if* it is
> necessary to have and process, in addition to the instance:
> 
>     * a rendering stylesheet
>
>     * a collection of declarations to account for attribute defaulting,
>          notations, etc)
>     * a set of other specification of semantics like (HyTime) reftype, clink,
>          which express fundamental relational semantics that are not
>          expected to be in a specific stylesheet

Each of these is only needed for some applications. Netscape would use the
first. Alta Vista would use the second and third. Panorama would use all three.
It may be helpful to think of a DTD as a "validation stylesheet". You only
need it if you intend to validate.
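
The "validation stylesheet" idea can be sketched as follows, assuming a
toy stand-in for a DTD: RULES is a hypothetical table of allowed child
elements, not a real DTD parser, and load and violations are illustrative
names. Parsing always happens; validation happens only when the caller
chooses to supply the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: allowed child element names per element.
RULES = {"doc": {"qdd"}, "qdd": {"qkqr"}, "qkqr": set()}


def violations(xml_text, rules):
    """Return (parent, child) pairs that the rules do not allow."""
    root = ET.fromstring(xml_text)
    bad = []
    for parent in root.iter():
        allowed = rules.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                bad.append((parent.tag, child.tag))
    return bad


def load(xml_text, rules=None):
    """Always parse; validate only when the caller supplies rules."""
    root = ET.fromstring(xml_text)
    if rules is not None:
        bad = violations(xml_text, rules)
        if bad:
            raise ValueError(f"invalid content: {bad}")
    return root
```

A browser-like application would call load(text) and never touch RULES; a
validating editor would call load(text, RULES) and reject the instance on
the first disallowed child.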

> Someone says: (1) "No need to ship a stylesheet: the XML document can
> just reference one, and if it's common, it should be on the receiving
> system's local machine."  OK: why not also for the DTD?  Or: (2)
> "Naw, we assume a common tag set, so that we know <p> means
> 'paragraph' and <li> means 'list item'."  OK: how then is XML much
> of an improvement over HTML?

Even if this tragic situation were to come about, an extensible HTML is
much better than what we have now! You probably don't remember, but we
had a discussion about your use of proprietary "meta-data" tags in the SGML
Web Page. You and I agreed that, strictly speaking, HTML doesn't allow them.

I do not think that this group has decided against requiring the existence
and availability of a DTD for each instance. But the goal is that processing
systems should not have to fetch the DTD unless they want to.

I think that an XML document that does not point to its DTD through a URL 
should probably be considered an invalid document. XML is being developed
explicitly for the Web, after all, and we know what a mess it is to try to
work with documents that do not describe their content clearly.  I might 
advocate an explicit exception for documents with a root element with 
GI "HTML".

 Paul Prescod

Received on Tuesday, 17 September 1996 12:47:31 UTC