[xhtml2] XHTML 2.0 versus conformance tools from Bjoern Hoehrmann on 2003-09-02 (www-html-editor@w3.org from July to September 2003)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 02 Sep 2003 04:47:22 +0200
To: www-html-editor@w3.org
Message-ID: <3faa04b3.226926613@smtp.bjoern.hoehrmann.de>

Dear HTML WG,

  Conformance testing tools are an important service for the web
authoring community, but the latest public XHTML 2.0 draft fails to
clearly define requirements and terminology in order to be supported by
such tools.

In summary, I think that XHTML 2.0 must clearly identify all
programmatically reportable errors in documents, make reporting these a
requirement for a specific class of product, define how to identify such
software and define how to identify documents which do not have
reportable errors.

There is already a section on "Strictly Conforming Documents" but it is
flawed for a number of reasons. It says:

  [...]
  A strictly conforming XHTML 2.0 document is a document that requires
  only the facilities described as mandatory in this specification.
  [...]

The draft does not define what an "XHTML 2.0 document", a "conforming
XHTML 2.0 document" or a "document" is. It is reasonable to expect that
"strictly conforming" is some kind of conformance profile of a more
general notion of conformance and it is thus reasonable to expect the
specification to define this more general concept. If you do not intend
to define what an "XHTML 2.0 document" and/or a "conforming XHTML 2.0
document" is, calling your only notion of document conformance "strict
conformance" is confusing and should be revised.

Now looking at the definition of "strictly conforming XHTML 2.0
documents" I do not understand what it says. I specifically do not
understand what "facilities" are, what it means for a document to
"require" "facilities" or what "facilities" are described as "mandatory"
in the specification.

"Mandatory facilities" sounds like features user agents must implement
and "require facilities" sounds like depending on the implementation of
features. If this is close to what you would like to express, I would
like to say that a definition of document conformance should not be
based on the definition of user agent conformance and that it would
still not be clear what it means for a document to depend on something.

The draft continues:

  [...]
  Such a document must meet all the following criteria:
  [...]

Does this mean that a "document" that meets all these criteria is
considered a "strictly conforming XHTML 2.0 document", or does the
former definition cover aspects not reflected in these criteria?

  [...]
    2. The root element of the document must be html.

    3. The root element of the document must contain an xmlns
       declaration for the XHTML 2.0 namespace [XMLNAMES]. The namespace
       for XHTML 2.0 is defined to be http://www.w3.org/2002/06/xhtml2.
       An example root element might look like:

         <html xmlns="http://www.w3.org/2002/06/xhtml2" xml:lang="en">

    4. There must be a DOCTYPE declaration in the document prior to the
       root element. If present, the public identifier included in the
       DOCTYPE declaration must reference the DTD found in Appendix F
       using its Public Identifier. The system identifier may be
       modified appropriately.
  [...]

As it is reasonable to expect that the DTD implementation provides the
xmlns attribute with a #FIXED value, criterion 3 is covered by criterion
4, and as it is reasonable to expect that all schema implementations
will require <html> to be the root element, criterion 2 is covered by
criterion 1. Listing these explicitly is confusing to readers of the
specification.

  [...]
    1. The document must conform to the constraints expressed in
       Appendix B - XHTML 2.0 RELAX NG Definition, Appendix D - XHTML
       2.0 Schema or Appendix F - XHTML 2.0 Document Type Definition.
  [...]

Please clearly identify what your definition of "strictly conforming
XHTML 2.0 documents" considers an expressed constraint in these
appendices and what it means to conform to them.

Depending on the answer to my previous comment, this either contradicts
the former definition of "strictly conforming XHTML 2.0 documents" or
further subsets the definition of document conformance.

For example, the navindex attribute is constrained to numbers in the
range 0-32767. It is not possible to express such a constraint in an XML
1.0 DTD. If the specification does not consider a document with
navindex="666666" a "strictly conforming XHTML 2.0 document", you would
rather want to say that it must conform to all the constraints in all
enumerated schemas, which is then merely an informative note.
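
A conformance tool can test the navindex range even though a DTD cannot
express it. The following is a minimal Python sketch of such a check,
not part of any existing tool; the function name navindex_errors and the
placement of navindex on a <p> element are assumptions made for
illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace quoted from the draft's criterion 3.
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"


def navindex_errors(source):
    """Return one message per navindex value outside the range 0-32767.

    Hypothetical helper: it checks only the numeric range that an
    XML 1.0 DTD cannot express, nothing else.
    """
    errors = []
    root = ET.fromstring(source)
    for element in root.iter():
        value = element.get("navindex")
        if value is None:
            continue
        is_number = value.isascii() and value.isdigit()
        if not (is_number and 0 <= int(value) <= 32767):
            errors.append(f"navindex={value!r} is outside the range 0-32767")
    return errors


# Illustrative documents; the element carrying navindex is an assumption.
bad_doc = f'<html xmlns="{XHTML2_NS}"><body><p navindex="666666">x</p></body></html>'
good_doc = f'<html xmlns="{XHTML2_NS}"><body><p navindex="1">x</p></body></html>'

print(navindex_errors(bad_doc))
print(navindex_errors(good_doc))
```

A DTD-validating parser would accept both documents, since a DTD can
only declare navindex as character data; only a check like the one above
tells them apart.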

If such a document *is* considered a "strictly conforming XHTML 2.0
document" you have (in theory, as you do not provide the necessary
material to provide evidence)

  * strictly conforming XHTML 2.0 + A. B documents
  * strictly conforming XHTML 2.0 + A. B + A. D documents
  * strictly conforming XHTML 2.0 + A. B + A. D + A. F documents
  * strictly conforming XHTML 2.0 + A. B + A. F documents
  * strictly conforming XHTML 2.0 + A. D documents
  * strictly conforming XHTML 2.0 + A. D + A. F documents
  * strictly conforming XHTML 2.0 + A. F documents

That's seven profiles of strict conformance. Of course, this number gets
multiplied by various factors as the specification has further
requirements for documents that are not expressible in any of the
normative schema languages, for example, <address> is required to
contain "contact information" and <object declare='declare'> is required
to have an id attribute. A document may or may not meet each of these
two requirements, so you get at least 7 * 4 = 28 classes of "strictly
conforming XHTML 2.0 documents" - not to mention the even larger number
of potential classes for a term like "conforming XHTML 2.0 document".
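
The count can be reproduced mechanically. The sketch below, in Python,
enumerates the non-empty combinations of the three schema appendices and
multiplies by the met/not-met combinations of the two extra requirements
named above; the variable names are illustrative assumptions.

```python
from itertools import combinations

# The three normative schema appendices named in criterion 1.
schemas = ["A. B", "A. D", "A. F"]

# Every non-empty subset of schemas a document might conform to.
profiles = [
    combo
    for size in range(1, len(schemas) + 1)
    for combo in combinations(schemas, size)
]

# Two requirements that no schema expresses; each is either met or not.
extra_requirements = [
    "<address> contains contact information",
    "<object declare='declare'> has an id attribute",
]

classes = len(profiles) * 2 ** len(extra_requirements)

for combo in profiles:
    print("strictly conforming XHTML 2.0 + " + " + ".join(combo) + " documents")
print(len(profiles), classes)
```

Every further requirement that the schemas cannot express doubles the
number of classes again.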

This is insane!

There is no need to define document conformance in a way that allows
documents violating requirements of the specification to be identified
as in conformance with the specification, especially not as in "strict
conformance" with the specification.

Nor do I see a need to define "strict conformance". This is a term that
causes nothing but confusion. Conformance is never understood as lax so
how can it be "strict"?

Say I implement XHTML 2.0 support in HTML Tidy. XHTML 2.0 says it is an
error for an <object> element to have a declare='declare' attribute but
no id='...' attribute. Tidy could report this as

  Error: <object> element lacks 'id' attribute

If this is the only error in the document, how would I identify the
status of the document? Would I say

  Congratulations, this document is a strictly conforming XHTML 2.0
  document!

Users would not understand how a document can have errors but still be
strictly conforming.
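
To make the Tidy scenario concrete, here is a minimal Python sketch of
such a check. It is not HTML Tidy's implementation; the function name
object_id_errors and the sample documents are assumptions for
illustration, and only the error message is taken from the example
above.

```python
import xml.etree.ElementTree as ET

# Namespace quoted from the draft's criterion 3.
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"


def object_id_errors(source):
    """Report <object declare='declare'> elements that lack an id.

    Hypothetical helper standing in for a conformance tool's check.
    """
    root = ET.fromstring(source)
    tag = f"{{{XHTML2_NS}}}object"  # ElementTree's {namespace}local form
    return [
        "Error: <object> element lacks 'id' attribute"
        for obj in root.iter(tag)
        if obj.get("declare") == "declare" and obj.get("id") is None
    ]


# Illustrative documents, not taken from the draft.
without_id = (
    f'<html xmlns="{XHTML2_NS}"><body>'
    '<object declare="declare"/></body></html>'
)
with_id = (
    f'<html xmlns="{XHTML2_NS}"><body>'
    '<object declare="declare" id="o1"/></body></html>'
)

for message in object_id_errors(without_id):
    print(message)
```

The check is easy to write; what the draft leaves open is which
conformance status a tool may announce once this list is non-empty.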

Say I want to contribute XHTML 2.0 support to the W3C MarkUp Validator.
This service has a long history of using specific terminology to
identify the conformance status of documents it "validates", e.g.,
"Valid XHTML 1.0 Strict", and as W3C keeps talking about "valid",
"validity", "validators", etc., I would likely have to call a specific
class of data objects "Valid XHTML 2.0". But what would I have to test
for to determine whether a data object is a "Valid XHTML 2.0 document"?

The XHTML 2.0 draft defines neither what requirements a data object has
to meet to be classified as an "XHTML 2.0 document" nor what additional
(if any) requirements have to be met to be considered a "Valid XHTML 2.0
document". It is all too obvious that speaking of "valid XHTML 2.0
documents" is a very good idea especially as the web authoring community
is already used to the respective terminology for HTML and XHTML 1.x -
so why is there no clear definition for this term, one that one can
directly link to from "validation" results, for example?

Please explicitly define the term "valid XHTML 2.0 document" in the
specification. Of course, such a definition must not depend solely on a
definition of "valid" in one of the normative schema languages as long
as the specification has testable requirements for documents that are
not expressible in all normative schemas.

XML 1.0 is a shining counter-example to all XHTML specifications so far
in that it fulfills the requirements I listed at the beginning of this
message:

  * reportable errors are identified through "well-formedness
    constraints" and "validity constraints"

  * documents that meet the well-formedness constraints are identified
    as "XML 1.0 documents" or more specifically "well-formed XML 1.0
    documents"

  * documents that meet the "validity constraints" are identified as
    "valid XML 1.0 documents"

  * software that reports violations of well-formedness constraints is
    identified as "XML 1.0 processor"

  * software that reports violations of validity constraints is
    identified as "Validating XML 1.0 processor"

It would be even better if XML 1.0 had a definition of "XML 1.0
Validator" but it is quite good as it stands.
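
Because these terms are defined, any XML 1.0 processor can be used to
classify a data object. The Python sketch below relies on the standard
library's parser, which is a non-validating processor, so it can only
answer the well-formedness question; the function name is_well_formed is
an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if the parser reports no well-formedness error.

    The underlying parser is non-validating, so validity constraints
    are not checked here.
    """
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b></b></a>"))  # properly nested elements
print(is_well_formed("<a><b></a>"))      # mismatched end tag
```

Checking validity constraints as well would take a validating
processor, which the Python standard library does not include.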

The web authoring community wants to talk about document conformance; it
thus needs good terminology to avoid wasting time trying to interpret
bad or undefined terminology. There is no reason why terms that will
commonly be used when talking about XHTML 2.0 should not have as clear
definitions as the XML 1.0 Recommendation provides for the XML community.

Thanks.
Received on Monday, 1 September 2003 22:47:49 UTC