DTD (and schema) references in XHTML document types.

From: David Carlisle <davidc@nag.co.uk>
Date: Mon, 26 Jul 2004 17:59:29 +0100
Message-Id: <200407261659.RAA04934@penguin.nag.co.uk>
To: www-html-editor@w3.org


This message mainly concerns section 3.1.1 although Issue 4.1 is
related.

3.1.1. Strictly Conforming Documents
http://www.w3.org/TR/xhtml2/conformance.html#s_conform


   1.
   The document must conform to the constraints expressed in the schemas in
   Appendix B - XHTML 2.0 RELAX NG Definition, Appendix D - XHTML 2.0
   Schema and Appendix F - XHTML 2.0 Document Type Definition.

Appendix F is still a shell, but depending on how it is written this may
constrain XHTML elements to be unprefixed. Presumably the DTD _will_
impose this constraint unless it is overridden by a parameter entity,
using the usual namespace prefix trick as in XHTML 1 and MathML. If
unconstrained parameter entity redefinition is allowed, then mandating
DTD conformance is essentially no constraint at all, as any part of the
DTD can be redefined. It wasn't clear to me whether the intention was to
allow or disallow prefixed elements. (I believe that the intent of this
clause is sound, but that, depending on how the DTD module ends up being
defined, it may need careful wording.)



   2.
   The local part of the root element of the document must be html.

This constraint is redundant, as it follows from the RELAX NG part of (1)
above, which can specify the top-level element with its start pattern.
Actually, as noted above, (1) may currently specify more than this: that
the _name_ (rather than just the local name) is html.



   3.
   The start tag of the root element of the document must explicitly
   contain an xmlns declaration for the XHTML 2.0 namespace [XMLNS]. The
   namespace URI for XHTML 2.0 is defined to be
   http://www.w3.org/2002/06/xhtml2. 

Is "xmlns declaration" to be taken to mean exactly xmlns=, which again
implies that XHTML is not prefixed, or is it intended to include
<foo:html xmlns:foo="http://www.w3.org/2002/06/xhtml2" ...>?
If prefixes are intended to be allowed, I would suggest using the
terminology "Namespace Declaration" rather than "xmlns declaration", as
that is what is used in the Namespaces and XPath Recommendations, for
example.
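
To make the two readings concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree as a stand-in for any
namespace-aware parser. The helper names (expanded_name, local_part) and
the tiny sample documents are illustrative assumptions, not anything
taken from the draft. It suggests that, to such a parser, the default
namespace form and the prefixed form yield the same expanded name for the
root element.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

# Default-namespace form: the literal reading of "xmlns declaration".
unprefixed = f'<html xmlns="{XHTML2_NS}"><head/></html>'

# Prefixed form: the "foo" prefix is arbitrary, as in the example above.
prefixed = f'<foo:html xmlns:foo="{XHTML2_NS}"><foo:head/></foo:html>'


def expanded_name(document: str) -> str:
    """Return the root element's name in ElementTree's {uri}local form."""
    return ET.fromstring(document).tag


def local_part(document: str) -> str:
    """Return only the local part of the root element's name."""
    return expanded_name(document).rpartition("}")[2]


# Both spellings resolve to the same namespace URI and local part.
print(expanded_name(unprefixed))
print(expanded_name(prefixed))
print(local_part(prefixed))
```

A DTD, by contrast, only sees the literal element name, so "html" and
"foo:html" are different names to it unless the prefix is parameterised.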

[It's regrettable that the XHTML2 namespace has changed (for all the
 reasons explained in the infamous 3-namespaces-for-xhtml debate) but I
 doubt I can convince you of that. It does however mean that there is
 really no sense in which "xhtml2" is version 2 of any language, since it
 has no elements in common with any previous language.]



  The start tag must also contain an xsi:schemaLocation attribute. The
  schema location for XHTML 2.0 is defined to be TBD.

It would be highly desirable not to force this (it also implies that the
document element needs an xmlns:xsi declaration, although that isn't
mentioned here). It was one of the main aims of XML to allow documents
to be sent over the web without needing a DTD (schema) reference, as was
required by SGML; hence the document tree is made explicit with /> syntax
and no implied close tags. In this case, since an XSD schema validator
isn't even required to respect xsi:schemaLocation, the attribute doesn't
enforce anything and is just a burden on the author. It isn't an
oversight that RELAX NG provides no in-document reference to the schema,
but a benefit. Just because XSD and DTD optionally provide such a
facility doesn't mean that it should always be used, and certainly not
that its use should be mandated. If this use is to be mandated, do you
mean to mandate the xsi: prefix? (This again has a bearing on the DTD
conformance outlined in 1.)
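
A small sketch of the two points above, again using Python's standard
xml.etree.ElementTree as an example of a namespace-aware parser. The
schema location is TBD in the draft, so the URL below is a hypothetical
placeholder, and root_attributes is an illustrative helper. The sketch
assumes only that the attribute lives in the XML Schema instance
namespace, http://www.w3.org/2001/XMLSchema-instance.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Hypothetical placeholder: the draft gives the real schema location as TBD.
SCHEMA_HINT = f"{XHTML2_NS} http://example.org/placeholder/xhtml2.xsd"


def root_attributes(document: str):
    """Return the root's attributes keyed by expanded name.

    Returns None when the document cannot be parsed at all.
    """
    try:
        return dict(ET.fromstring(document).attrib)
    except ET.ParseError:
        return None


# The conventional spelling, with the xsi prefix declared.
with_xsi = (
    f'<html xmlns="{XHTML2_NS}" xmlns:xsi="{XSI_NS}" '
    f'xsi:schemaLocation="{SCHEMA_HINT}"/>'
)
# The same attribute under a different, equally valid prefix.
other_prefix = (
    f'<html xmlns="{XHTML2_NS}" xmlns:s="{XSI_NS}" '
    f's:schemaLocation="{SCHEMA_HINT}"/>'
)
# The attribute without the namespace declaration the draft omits.
undeclared = f'<html xmlns="{XHTML2_NS}" xsi:schemaLocation="{SCHEMA_HINT}"/>'

print(root_attributes(with_xsi) == root_attributes(other_prefix))
print(root_attributes(undeclared))
```

To a namespace-aware tool the prefix is irrelevant, while a missing
declaration makes the document unparseable; a DTD would see
"xsi:schemaLocation" and "s:schemaLocation" as unrelated attribute names.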


   4.
    There must be a DOCTYPE declaration in the document prior to the
    root element. If present, the public identifier included in the
    DOCTYPE declaration must reference the DTD found in Appendix F using
    its Public Identifier. The system identifier may be modified appropriately.

For reasons given in the case of schema this is also regrettable, but in
this case the effects are much worse.
The XHTML DTD can easily be several times larger than a typical web file
and by default an XML application is liable to download this before
starting to render the page. This gets even worse for something like
XHTML+MathML which has more elements and lots more entity definitions.

In order for this to be feasible you are essentially mandating that
every application include a local catalog or otherwise hardwire
"knowledge" of this FPI and avoid downloading the DTD. At least in the
case of xsi:schemaLocation there is not a combinatorial explosion for
mixed namespace documents, but in this case systems would have to
hardwire knowledge of XHTML, XHTML+MathML, XHTML+SVG, XHTML+MathML+SVG ...
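
As a rough sketch of what such hardwiring amounts to, the Python below
models a built-in catalog keyed by public identifier and enumerates the
document types a client would need to know about. The catalog entries use
public identifiers of existing W3C DTDs, but the local paths, the function
names and the "unknown" identifier are hypothetical, and no real
browser's implementation is being described.

```python
from itertools import combinations
from typing import List, Optional, Sequence

# Public identifiers of some existing W3C DTDs; local paths are hypothetical.
LOCAL_CATALOG = {
    "-//W3C//DTD XHTML 1.1//EN": "dtd/xhtml11.dtd",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN": "dtd/xhtml-math11.dtd",
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN":
        "dtd/xhtml-math-svg.dtd",
}


def resolve_public_id(public_id: str) -> Optional[str]:
    """Return a local copy of the DTD, or None if the FPI is not hardwired."""
    return LOCAL_CATALOG.get(public_id)


def document_types(host: str, extensions: Sequence[str]) -> List[str]:
    """List every combination of the host language with optional extensions."""
    result = []
    for size in range(len(extensions) + 1):
        for chosen in combinations(extensions, size):
            result.append("+".join((host,) + tuple(chosen)))
    return result


# A made-up identifier standing in for "any other reasonable combination".
print(resolve_public_id("-//EXAMPLE//DTD XHTML plus Something Else//EN"))
print(document_types("XHTML", ["MathML", "SVG"]))
print(len(document_types("XHTML", ["MathML", "SVG", "A", "B"])))
```

Each additional vocabulary doubles the number of combinations, which is
the explosion referred to above.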

This isn't just a potential problem, it's already a real problem: Mozilla,
for example, already hardwires several of these combinations, so in those
cases it "works", but if you use any other perfectly reasonable-looking
combination Mozilla will use the option provided by the XML spec of
_not_ downloading the referenced DTD, in which case any reference to an
entity defined in that DTD will cause an error. Conversely, IE's XML
component will fetch the DTD, which means that XHTML+MathML pages render
_really_ slowly if they start with a DOCTYPE.
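
The entity failure is easy to reproduce with any parser that takes the
"do not fetch" option. The sketch below uses Python's standard
xml.etree.ElementTree, which, as far as I know, does not retrieve the
external DTD subset; it is only a stand-in for the behaviour described
above and says nothing about Mozilla's actual code. The try_parse helper
and the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" '
    '"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">\n'
)
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Uses an entity that is only defined in the (unfetched) DTD.
WITH_ENTITY = (
    DOCTYPE + f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>'
)
# Same content written as a numeric character reference, which needs no DTD.
WITH_CHAR_REF = (
    DOCTYPE + f'<html xmlns="{XHTML_NS}"><body><p>a&#160;b</p></body></html>'
)


def try_parse(document: str) -> str:
    """Parse without fetching the DTD and report whether it succeeded."""
    try:
        ET.fromstring(document)
    except ET.ParseError as error:
        return f"error: {error}"
    return "ok"


print(try_parse(WITH_ENTITY))
print(try_parse(WITH_CHAR_REF))
```

The DOCTYPE is present in both documents, yet it buys nothing here: the
entity reference still fails, and the only document that parses is the one
that never needed the DTD.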

Sorry about the length of this message.

David

Received on Monday, 26 July 2004 13:03:13 UTC
