XHTML Comments from David Brownell on 1999-09-14 (www-html@w3.org from September 1999)

From: David Brownell <david-b@pacbell.net>
Date: Tue, 14 Sep 1999 13:40:35 -0400 (EDT)
To: www-html@w3.org
CC: dsr@w3.org
Message-ID: <37DE886B.20EAB50A@pacbell.net>
Greetings,

The XHTML draft (http://www.w3.org/TR/xhtml1/) doesn't give an address to
use for feedback, so I'll send these to the www-html list instead.

- FIRST, since there's only one HTML vocabulary, it should have only
  one namespace.  This was done correctly in the last public draft; I
  know that folk such as myself, David Megginson, Tim Bray, and others
  had requested that it be done that way.  One vocabulary, one namespace;
  that's the model from the XML Namespaces specification, and it's also
  the operational model people have about HTML already.  One doesn't see
  editors or MIME types clarifying transitional/frameset/strict; one only
  sees "HTML" editors, generally using the full vocabulary.

  There's been a considerable amount of debate about the fact that this
  draft has too many namespaces.  No explanation was provided about why
  it'd be desirable to make such a change.  On the face of it, it's not a
  reasonable change -- it seems to reflect a misunderstanding about the
  fact that words are different from rules about using them.  That is,
  it confuses two fundamentally different parts of the system.

  An analogy that may help:  when teaching any language (English, French,
  HTML, etc) one generally starts with a widely understood subset and
  then masters that subset.  When one "graduates" to a more complete
  understanding, that doesn't invalidate what one knows about that subset.
  By defining three namespaces, one is in effect defining three languages
  rather than accepting the longstanding model of one, with a core (which
  is more like HTML 3.2 than the 'strict' subset) and other components.

  Rather than see three namespaces, there should be one.  Or if this
  is for some reason not acceptable (one hears stories about certain folk
  who've said "over my dead body") then there should be zero namespaces
  for now, and one defined later with the modularization work.

- SECOND, I sent a report about the bug in the definitions of the "amp"
  and "lt" entities, which violate the XML specification because their
  replacement text is not well formed.  (See the XML spec for examples
  where the replacement text _is_ well formed.)  Just noting it here; it
  is a nit that needs fixing, but I know of no XML processor that'd care.

- THIRD, looking at the DTDs I confess I'm puzzled why they didn't get
  written as one DTD using two parameter entities to conditionally
  include declarations for the two categories of non-"strict" names.

  Things would be a lot clearer if the DTD actually reflected its logical
  structure in that way -- or at least commented why it wasn't done.
  (I know there's separate work on modularization, but one can provide
  better DTDs without defining an approach to fine-grained modules.)

- FOURTH, apart from the HUGE (!!) botch with the namespace issue, this
  seems pretty good.

- FIFTH, I would comment in 1.3 that another reason to use XHTML is
  that XML validators are significantly more available than HTML ones,
  so that it can be routine to create valid XHTML.  This means that
  there can be a significant increase in the amount of web content that
  can be viewed on all browsers, AND a corresponding reduction in the
  size of those browsers since they won't need to handle so many cases
  of malformed documents.

- SIXTH, call me paranoid but I'd like to see a non-normative suggestion
  that user agents have mechanisms to cache the XHTML DTDs according to
  their public identifiers, and a normative requirement that if they do
  so, the cached copies be the ones provided by W3C (no modifications
  permitted).

- SEVENTH, about Appendix C.1 ... since US-ASCII is a strict subset of
  UTF-8, and can be parsed (as UTF-8) without an XML declaration, you
  should mention it there.  This is quite significant, since I've observed
  that US-ASCII (the seven bit character set, never to be confused with
  ISO-8859-1) is the safest encoding to use for XHTML documents.  In fact
  that's what I've ended up using for everything.  (I'm sending this to
  the editor alias too.)
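
A rough sketch of the SEVENTH point, in Python: a document kept to seven-bit
US-ASCII has the same bytes under either encoding, so a parser that defaults
to UTF-8 reads it correctly with no XML declaration.  The helper name
to_ascii_xml and the sample paragraph are made up for illustration, and the
sketch assumes all markup (element and attribute names) is already ASCII,
so only character data ends up as numeric character references.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text: str) -> bytes:
    """Encode as US-ASCII, writing every other character as &#N;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical fragment; note there is no XML declaration.
doc = "<p>caf\u00e9 costs 3 \u20ac</p>"
data = to_ascii_xml(doc)

# Every byte is seven-bit, so ASCII and UTF-8 decoding agree.
assert all(b < 0x80 for b in data)
assert data.decode("ascii") == data.decode("utf-8")

# The parser falls back to UTF-8 and recovers the original characters.
root = ET.fromstring(data)
print(data)       # b'<p>caf&#233; costs 3 &#8364;</p>'
print(root.text == "caf\u00e9 costs 3 \u20ac")  # True
```

The cost is readability of the raw file when it has a lot of non-ASCII
text; the benefit is that the bytes survive systems that guess the
encoding wrong.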

That last point (C.1) is one I've run into while writing code to generate
XHTML documents, and testing it on various systems.  As "real world"
feedback I think it's quite significant.
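
On the SECOND point, a small Python sketch of why the double escaping in
the XML spec's own declarations matters.  Character references in an
entity declaration are expanded once when the declaration is read; what is
left (the replacement text) is parsed again at each reference.  The
singly-escaped forms below are an assumption about what a broken
declaration looks like, not a quote from the draft, and both helper
functions are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET


def replacement_text(literal: str) -> str:
    """Expand numeric character references once, as happens when an
    entity declaration is read."""
    def expand(match):
        body = match.group(1)
        code = int(body[1:], 16) if body[0] == "x" else int(body)
        return chr(code)
    return re.sub(r"&#([0-9]+|x[0-9A-Fa-f]+);", expand, literal)


def well_formed_as_content(text: str) -> bool:
    """Check whether text parses as element content."""
    try:
        ET.fromstring("<r>" + text + "</r>")
        return True
    except ET.ParseError:
        return False


# Forms given in the XML 1.0 spec: <!ENTITY lt "&#38;#60;"> and so on.
spec_lt = replacement_text("&#38;#60;")    # "&#60;"
spec_amp = replacement_text("&#38;#38;")   # "&#38;"

# Singly escaped forms (assumed here for illustration).
naive_lt = replacement_text("&#60;")       # "<"
naive_amp = replacement_text("&#38;")      # "&"

print(well_formed_as_content(spec_lt), well_formed_as_content(spec_amp))
# True True
print(well_formed_as_content(naive_lt), well_formed_as_content(naive_amp))
# False False
```

A processor that treats "lt" and "amp" as built in may never look at the
declared text at all, which would fit with none of them complaining.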

- Dave
Received on Wednesday, 15 September 1999 07:26:49 UTC