Comments on XHTML 1.0 DTD and type enforcement from Philippe Verdy on 2000-05-18 (www-html-editor@w3.org from April to June 2000)

From: Philippe Verdy <pverdy@sofarxmedia.com>
Date: Thu, 18 May 2000 02:12:41 +0200
To: <www-html-editor@w3.org>
Message-ID: <000c01bfc05d$cc2f32c0$8a0a0ac0@sofarxmedia.com>
I am really confused by some rules in the XHTML 1.0 DTD, which do not enforce some requirements of HTML 4.
This lack of enforcement is typical of the "=== Imported Names ===" section of the DTDs, where nearly everything is declared as CDATA instead of as an enumeration or a more restrictive type like NAMES.

For example, to define the value of a frame target name attribute, you use:
       <!ENTITY % FrameTarget "NMTOKEN">
       <!-- render in this frame -->
which I think is a little restrictive, but it could achieve the appropriate result if the point is that a frame name should not include ambiguous separators, because frame names can be used within scripts as variables.

But if that's the reason, I think it should have been a NAME and not an NMTOKEN. (Ideally it would be an IDREF, but the target is not itself part of the document, so it does not reference an ID of the current document.)

If that's not the reason, I think this should really have been CDATA, because frames are named either with the deprecated "name" attribute, which was very permissive, or with the newer "id" attribute, which requires an ID and must be referenced by an attribute of type IDREF (those are much more restricted, since each must be a valid NAME, unique within the frameset, to form an ID).
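
To make the NAME versus NMTOKEN distinction concrete, here is a minimal Python sketch. It assumes ASCII-only approximations of the XML 1.0 Name and Nmtoken productions (the real productions also admit many non-ASCII characters), and the helper names `is_name` and `is_nmtoken` are hypothetical. Note that NAME is an SGML declared value; XML DTDs offer NMTOKEN, ID and IDREF but no plain NAME attribute type.

```python
import re

# ASCII-only approximation of XML 1.0 NameChar (assumption: the real
# production also covers non-ASCII letters, digits and combining marks).
NAME_CHAR = r"[A-Za-z0-9._:-]"

# Nmtoken: one or more name characters, in any order.
NMTOKEN_RE = re.compile(rf"{NAME_CHAR}+")

# Name: like Nmtoken, but the first character must be a letter, '_' or ':'.
NAME_RE = re.compile(rf"[A-Za-z_:]{NAME_CHAR}*")


def is_nmtoken(value: str) -> bool:
    """Return True if value matches the simplified Nmtoken pattern."""
    return NMTOKEN_RE.fullmatch(value) is not None


def is_name(value: str) -> bool:
    """Return True if value matches the simplified Name pattern."""
    return NAME_RE.fullmatch(value) is not None


# Sample frame targets; only the first character rule separates the two types.
for candidate in ("_top", "main", "2ndFrame", "-x", "left frame"):
    print(f"{candidate!r}: NMTOKEN={is_nmtoken(candidate)} NAME={is_name(candidate)}")
```

Under these assumptions "2ndFrame" and "-x" are acceptable as NMTOKEN but not as NAME, while "left frame" is rejected by both because of the space.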


Another example:
     <!-- these are used for image maps -->
     <!ENTITY % Shape "(rect|circle|poly|default)">
Maybe you're a little too restrictive about what an image map shape can be: the only non-polygonal shape is a circle, which leaves no room for an oval in any future browser enhancement, or for a spline or Bezier curve, or for 3D shapes, should client-side maps become usable in other navigable objects or images. For now, you should not have enforced it, and I think CDATA would have been more appropriate (you should only have documented the standard list of shapes without restricting them so much).


Other examples:
       <!ENTITY % Number "CDATA">
       <!-- one or more digits -->
       <!ENTITY % Length "CDATA">
       <!-- nn for pixels or nn% for percentage length -->
       <!ENTITY % Pixels "CDATA">
       <!-- integer representing length in pixels -->
Wow! No restriction at all on what a number can be! Is that possible? Where are the definitions of the format of HTML 4 units?
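
As an illustration of the check that a CDATA declaration leaves to application code, here is a minimal Python sketch. It assumes the lexical forms given in the DTD comments ("one or more digits" for %Number;, "nn for pixels or nn% for percentage length" for %Length;); the function names `is_number` and `parse_length` are hypothetical and not part of any specification.

```python
import re

# "one or more digits" (ASCII digits only, by assumption)
NUMBER_RE = re.compile(r"[0-9]+")

# "nn for pixels or nn% for percentage length"
LENGTH_RE = re.compile(r"([0-9]+)(%?)")


def is_number(value: str) -> bool:
    """Return True if value consists only of one or more ASCII digits."""
    return NUMBER_RE.fullmatch(value) is not None


def parse_length(value: str):
    """Split a length into (amount, unit), unit being 'pixels' or 'percent'."""
    match = LENGTH_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid length: {value!r}")
    unit = "percent" if match.group(2) else "pixels"
    return int(match.group(1)), unit


print(is_number("42"), is_number("4.2"))
print(parse_length("120"), parse_length("50%"))
```

A validating XML parser would accept "12px" or "abc" for such an attribute, since both are legal CDATA; only a check like this one, outside the DTD, rejects them.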


And this one:
      <!ENTITY % Coords "CDATA">
      <!-- comma separated list of lengths -->
There's no enforcement on what coordinate pairs can be!
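
The missing enforcement can be sketched in Python as follows. The value counts per shape follow the HTML 4 definitions (four values for rect, three for circle, x/y pairs for poly); requiring at least three vertices for a polygon is an assumption of this sketch, and the names `parse_coords` and `coords_fit_shape` are hypothetical.

```python
import re

# One length: digits, optionally followed by '%'.
LENGTH_RE = re.compile(r"[0-9]+%?")


def parse_coords(value: str) -> list:
    """Split a comma-separated coords value and check every item is a length."""
    if value.strip() == "":
        return []
    items = [item.strip() for item in value.split(",")]
    for item in items:
        if LENGTH_RE.fullmatch(item) is None:
            raise ValueError(f"not a length: {item!r}")
    return items


def coords_fit_shape(shape: str, coords: list) -> bool:
    """Check that the number of coordinates suits the given shape."""
    count = len(coords)
    if shape == "rect":      # left-x, top-y, right-x, bottom-y
        return count == 4
    if shape == "circle":    # center-x, center-y, radius
        return count == 3
    if shape == "poly":      # x1, y1, ..., xN, yN (assumption: N >= 3)
        return count >= 6 and count % 2 == 0
    if shape == "default":   # the whole region, no coordinates
        return count == 0
    return False


print(coords_fit_shape("rect", parse_coords("0,0,100,50")))
print(coords_fit_shape("circle", parse_coords("10, 10")))
```

None of this can be stated in the DTD itself, which can neither describe the list syntax nor relate the "coords" value to the "shape" value on the same element.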


Another example: to define the value of CSS class name references in element attributes, you write:
      <!-- core attributes common to most elements
        id       document-wide unique id
        class    space separated list of classes
        style    associated style info
        title    advisory title/amplification
      -->
      <!ENTITY % coreattrs
       "id          ID             #IMPLIED
        class       CDATA          #IMPLIED
        style       %StyleSheet;   #IMPLIED
        title       %Text;         #IMPLIED"
        >
This time, you're too lax about what a class attribute value can be! I think it should be a list of NMTOKENs (NMTOKENS), because CSS selectors are not simple strings: they have a composite syntax that combines an optional element NAME, a class NAME, and separators such as @ to select stylesheets based on the capabilities of the browser's rendering engine, the space to form contextual styles, and the comma to select alternative contexts or elements. To make such a decomposition possible, there should be restrictions on what a "class" attribute can contain (for now the only restriction is on element names).
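
A minimal Python sketch of the kind of restriction a tokenized type would impose on "class": the value is split on whitespace and each class name is checked against an ASCII-only approximation of the XML Nmtoken production. The function name `invalid_class_tokens` is hypothetical, and the pattern is a simplifying assumption rather than the full XML 1.0 definition.

```python
import re

# ASCII-only approximation of XML 1.0 Nmtoken (assumption).
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:-]+")


def invalid_class_tokens(value: str) -> list:
    """Return the class names in value that are not valid name tokens."""
    return [token for token in value.split()
            if NMTOKEN_RE.fullmatch(token) is None]


# Characters that act as separators in CSS selectors (',' and '@')
# are flagged; ordinary class names pass.
print(invalid_class_tokens("note warning"))
print(invalid_class_tokens("note a,b @media"))
```

With CDATA, all of these values are valid to an XML parser, so a class name containing a selector separator is only discovered, if at all, when the style sheet fails to match it.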


Because of all that, I don't think that current browsers will really use these DTDs, because their built-in XML parser won't get enough information from them to render the document. They will need additional code to parse many values within the document. Without a normative DTD that defines how to decipher them, there will still be many differences and incompatibilities between browsers in how values are interpreted.


Maybe you're working on it, or maybe there will be new subtypes (profiles) of documents, for which this DTD will only be a core definition. But the lack of specification of attribute values will lead each implementor to build their own DTD instead of using this normative one, so there will be documents that reference other DTDs and work well only in some browsers.

One step in that direction is the XHTML 1.1 DTD framework, but as far as I have seen, this framework still defines attribute values too permissively and gives no way to parse them. The framework still has to be customized to be usable with a pure XML parser that can render true HTML content, or we will depend on custom C code to finish parsing the HTML document.

________________________________________________________________________
 Philippe Verdy, X-Media     Tel: +33 (01) 46 43 90 00
 52 boulevard Vital Bouhot   Fax: +33 (01) 46 43 90 09
 92200 Neuilly Sur Seine     France
 X-Media, Philippe Verdy <pverdy@sofarxmedia.com>
________________________________________________________________________
Received on Wednesday, 17 May 2000 20:14:07 UTC