WD-html-in-xml-19990224 not ready for PR

I would advise strongly against moving XHTML to PR in its current
form, because the WD seems to have fundamentally missed the potential
advantages of namespaces and self-identifying resources.

There are three fundamental flaws that need to be fixed before the
Director should reasonably consider moving this specification to PR:

1. The DOCTYPE declaration should not be required (Section 3.1, item 
   4), and if present, should be used only for (optional) DTD
   validation -- the public ID in the DOCTYPE declaration is simply an
   indirect pointer to an external DTD subset, not a unique identifier
   for a document type.  It was required for HTML 4.0 as a messy
   kludge only because there was nothing better available; we have
   namespaces now.

2. There should not be three separate HTML namespaces (Section 3.1,
   item 3) -- the namespace is part of a unique identifier, not a DTD.
   Unless the semantics (not the content model) change in the future,
   an HTML 'cite' element should always have the same namespace, no
   matter what particular DTD is in use.

3. Strict DTD conformance should be allowed but not required (Section
   3.1, item 1); the use of namespaces will allow a processor to
   determine what is and is not part of XHTML -- requiring strict DTD
   conformance simply makes it impossible to add extensibility later,
   and clean extensibility is really the only justification for XHTML
   in the first place.


Further Discussion
------------------

For #1, I recommend making the DOCTYPE declaration optional because it 
is really not necessary for non-validating XML processing and only
complicates things unnecessarily.
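
As a rough illustration (a sketch only, using Python's standard
xml.etree.ElementTree module, which is non-validating), the same
document yields the same element tree with or without a DOCTYPE
declaration.  The namespace URI and the system identifier below are
placeholders I made up for the example, not values from the WD.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and DTD file name, for illustration only.
BODY = '<html xmlns="http://www.example.org/xhtml"><body><p>Hello</p></body></html>'
WITH_DOCTYPE = '<!DOCTYPE html SYSTEM "xhtml1-strict.dtd">\n' + BODY

def element_names(xml_text):
    """Return the namespace-qualified names of all elements, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

# The non-validating parser never reads the external DTD subset, so the
# DOCTYPE declaration makes no difference to the elements it reports.
same = element_names(BODY) == element_names(WITH_DOCTYPE)
```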

For #2, consider the following use case: I want to ask a search engine
to find every instance of the word "wind" within an HTML <cite>
element (note that I don't want to find "wind" within other <cite>
elements, only within HTML's).  With your current setup, I would have
to instruct the search engine to find every instance of "wind" within
{http://www.w3.org/Profiles/xhtml1-strict.dtd}cite *or*
{http://www.w3.org/Profiles/xhtml1-transitional.dtd}cite *or*
{http://www.w3.org/Profiles/xhtml1-frameset.dtd}cite, and when the next
version comes out, I will need to add three (or more) other namespaces 
to the list, etc.

There should be a single, unique namespace for HTML that is persistent 
across versions to enable search engines and other similar software to 
function efficiently.  To avoid confusion, it would also be best not
to include DTD files as part of the namespace URLs, since namespaces
are not schemas.
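
To make the cost concrete, here is a minimal sketch of the search in
Python, using the standard xml.etree.ElementTree module.  The sample
document, the function name, and the "other" namespace URI are all
invented for the example; only the three per-DTD namespace URIs come
from the WD.

```python
import xml.etree.ElementTree as ET

# The three per-DTD namespaces from the current WD.
WD_NAMESPACES = [
    "http://www.w3.org/Profiles/xhtml1-strict.dtd",
    "http://www.w3.org/Profiles/xhtml1-transitional.dtd",
    "http://www.w3.org/Profiles/xhtml1-frameset.dtd",
]

def cites_containing(root, word, namespaces):
    """Return the text of each 'cite' element, in any of the given
    namespaces, whose content contains the word."""
    hits = []
    for ns in namespaces:  # one pass per namespace the caller must know about
        for cite in root.iter("{%s}cite" % ns):
            text = "".join(cite.itertext())
            if word in text:
                hits.append(text)
    return hits

# Hypothetical document mixing an HTML <cite> with a non-HTML <cite>.
SAMPLE = """<doc xmlns:h="http://www.w3.org/Profiles/xhtml1-strict.dtd"
     xmlns:o="http://www.example.org/other">
  <h:cite>Gone with the wind</h:cite>
  <o:cite>The wind in the willows</o:cite>
  <h:cite>Moby Dick</h:cite>
</doc>"""

root = ET.fromstring(SAMPLE)
result = cites_containing(root, "wind", WD_NAMESPACES)
```

With a single persistent HTML namespace, the list passed to the
function would shrink to one entry and would not have to grow with
each new version of the specification.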

For #3, there is no reason to be arbitrary and restrictive; instead,
you simply need a set of rules governing how processors should react
to non-HTML (or unrecognised) element and attribute types.  One
(hastily-thought-out) suggestion follows.


Suggestion
----------

Current HTML processor behaviour is as follows:

- for unrecognised attribute types, the attribute should be ignored;

- for unrecognised element types, the element's contents are processed
  as part of the parent element.

The attribute rule is fine for XHTML as well; you could refine the
element rule by adding a special attribute (say, 'html:default') with
allowed values along the lines of 'skip' or 'process' ('process' would
be the default) specifying what a processor should do if it does not
recognise an element type.  In other words, if the processor found

  <p>aaa <x:y html:default="process">zzz</x:y> bbb</p>

and it didn't recognise <x:y>, it would treat this as if it read

  <p>aaa zzz bbb</p>

On the other hand, if it found

  <p>aaa <x:y html:default="skip">zzz</x:y> bbb</p>

and it didn't recognise <x:y>, it would treat this as if it read

  <p>aaa  bbb</p>

The default value would be 'process', so the first case would also
apply for

  <p>aaa <x:y>zzz</x:y> bbb</p>
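
For what it's worth, the rule is easy to implement.  The following is
only a sketch in Python (standard xml.etree.ElementTree), under some
assumptions of my own: the single HTML namespace URI and the 'x'
namespace URI are placeholders, the function names are invented, and
"processing" is reduced to collecting text so that the fallback rule
is the only thing on show.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.example.org/xhtml"  # hypothetical single HTML namespace
DEFAULT_ATTR = "{%s}default" % HTML_NS    # the proposed 'html:default' attribute

def render_text(elem, recognised):
    """Collect the text of elem, applying the fallback rule to any
    child element whose type is not in the 'recognised' set."""
    parts = [elem.text or ""]
    for child in elem:
        action = child.get(DEFAULT_ATTR, "process")  # 'process' is the default
        if child.tag not in recognised and action == "skip":
            pass  # drop the unrecognised element together with its contents
        else:
            # recognised, or unrecognised with 'process': treat the
            # contents as part of the parent element
            parts.append(render_text(child, recognised))
        parts.append(child.tail or "")  # text after the child is always kept
    return "".join(parts)

def demo(fragment):
    """Wrap a fragment in an HTML <p> and render it, recognising only <p>."""
    doc = ('<p xmlns="%s" xmlns:html="%s" xmlns:x="http://www.example.org/x">%s</p>'
           % (HTML_NS, HTML_NS, fragment))
    return render_text(ET.fromstring(doc), {"{%s}p" % HTML_NS})

processed = demo('aaa <x:y html:default="process">zzz</x:y> bbb')
skipped = demo('aaa <x:y html:default="skip">zzz</x:y> bbb')
defaulted = demo('aaa <x:y>zzz</x:y> bbb')
```

The sketch treats any value other than 'skip' as 'process'; whether
unknown values should be an error instead is a separate question.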


All the best,


David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/

Received on Monday, 1 March 1999 22:04:37 UTC