Re: XML CG comments on XHTML Role Attribute Module last-call draft of 7 April 2008 from Shane McCarron on 2008-10-01 (www-html-editor@w3.org from October to December 2008)

From: Shane McCarron <shane@aptest.com>
Date: Wed, 01 Oct 2008 09:06:16 -0500
To: "www-html-editor@w3.org" <www-html-editor@w3.org>
CC: XHTML WG <public-xhtml2@w3.org>, w3c-xml-cg@w3.org, "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Message-ID: <48E383D8.3080305@aptest.com>
Thanks for your comments.  The XHTML 2 Working Group has discussed
them, and our responses are interleaved below.

C. M. Sperberg-McQueen wrote:
>
> Dear colleagues:
>
>
> Comments (4) through (9), on the other hand, relate to areas where the
> Working Groups of the XML Activity have particular responsibility and
> competence.  If upon consideration you find you disagree with us on
> them, then we have a cross-domain coordination issue that requires
> attention.
>
> (1) First, we congratulate the XHTML Working Group for providing a
> useful and clear namespace document for the namespace
>
>   http://www.w3.org/1999/xhtml/vocab/
>
> We wish more groups responsible for namespaces did so well by the users
> of their namespaces.

Thank you.  However, it should be noted that this is not a namespace.
We know that the document mistakenly uses the string "namespace" in a
couple of places, and have fixed that.  This is a vocabulary definition
document.  It defines a collection of terms that are used in some XHTML
family attributes in conjunction with CURIEs and of course with RDF via
their expanded values (xhv:banner, for example, expands to
http://www.w3.org/1999/xhtml/vocab#banner).   The terms in this
vocabulary are never referenced as QNames.  More about this later.
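
To make the expansion concrete, here is a minimal sketch in Python.  It
is only an illustration: the prefix table and the function name
expand_curie are assumptions made for this example, not anything defined
by the vocabulary document or the CURIE specification.

```python
# Illustrative only: this prefix table is an assumption for the example.
PREFIXES = {"xhv": "http://www.w3.org/1999/xhtml/vocab#"}


def expand_curie(curie, prefixes):
    """Expand a prefixed CURIE by concatenating the IRI bound to its
    prefix with the reference part."""
    prefix, colon, reference = curie.partition(":")
    if not colon:
        raise ValueError("this sketch only handles prefixed CURIEs")
    return prefixes[prefix] + reference


print(expand_curie("xhv:banner", PREFIXES))
# prints: http://www.w3.org/1999/xhtml/vocab#banner
```

The result is a single string, which is what an RDF processor would
work with.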

> (2) That said, we think the namespace document could be improved by
> the addition of some more information.  A document date would be
> helpful, and the identity of those responsible for the text of the
> document, and for the namespace, could be stated more explicitly.
> (From the fact that "The XHTML specifications are developed by the W3C
> XHTML 2 Working Group as part of the W3C HTML Activity", it may be
> thought to follow that it is the XHTML 2 Working Group which is
> responsible both for the namespace document and for the namespace.
> But at least this reader thought it might usefully be clearer; there
> are cases where more than one group is involved.)  It might also be
> desirable to provide hyperlinks from the namespace document into the
> main documentation for the XHTML vocabulary (possibly in multiple
> versions).

Thanks for this.  We are updating the document to hopefully make this
clearer.  For avoidance of doubt, however, note that this document is
THE definition of these terms.  Also, as stated above, this is not a
namespace document.  It is an XHTML+RDFa document that provides prose
and machine readable definitions for terms defined and used in XHTML
family attribute values.  Think "ontology" or "taxonomy" - whichever
word resonates with you to mean "dictionary of terms and their mappings
to fundamental datatypes."

> (3) The namespace document also needs a reference to a namespace
> change policy.  At least, that is our reading of the following passage
> from the document "URIs for W3C Namespaces" (13 July 2005, rev. 25
> April 2006) at http://www.w3.org/2005/07/13-nsuri :
>
>     The TAG finding titled The Disposition of Names in an XML
>     Namespace explains how the use of a particular namespace may
>     evolve over time. At the W3C, it is important for a group to state
>     clearly its expectations for how use of the namespaces it controls
>     will or will not change over time. Groups SHOULD document those
>     expectations in [or clearly linked from] the Namespace Document.

We agree that the change policy should be clear.  We will include the
policy in the document itself.  We believe that the policy is that the
term collection will never get smaller, but may expand as additional
basic terms are defined through the normal evolution of the associated
specifications.

> (4) Our first concern is with the spec's reliance on CURIEs, which are
> not as well defined and not as well integrated with other XML
> technologies as one might wish.  That is perhaps a comment better
> raised against CURIEs themselves than against this specification.  It
> suffices here to notice that if the role attribute were defined as a
> list of QNames, existing XSD-based technology would provide convenient
> access to the namespace names of the individual tokens in the value of
> the 'role' attribute; this is not the case with CURIES.
>
> We note that as far as we can tell from the namespace document,
> everything currently defined in the XHTML namespace is in fact an
> NCName, so that QNames could be used in lieu of CURIES, without loss
> of functionality as regards the items in the XHTML namespace -- and,
> for software working with standard schema-aware infrastructure, some
> substantial gain in functionality.

The XHTML 2 Working Group is aware that some people in the community
have some concerns about the introduction of CURIEs as a way of
expressing compact URIs.  However, the XHTML 2 Working Group remains
convinced that compact URIs are the correct way to represent attribute
values that are to be interpreted as URIs.  QNames, while a fine
notation, have some restrictions that our constituents found
unacceptable (e.g., the requirement that the reference portion of the
QName be an NCName).  QNames are also only meaningful in the context of
XML languages - some of our constituents want to be able to use XHTML
Role in the context of HTML.  CURIEs can be readily supported in HTML.
Finally, and most importantly, QNames do not expand to URIs.  They map
into a tuple.  The interpretation of a CURIE is always a URI (an IRI
actually, which can always be transformed to a URI).  Since the
point of XHTML Role is to define the "role" of an element in a document,
and those roles are normally defined via RDF, and RDF relationships are
defined using URIs, this direct correspondence between a role value and
its URI is ideal.
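
The difference can be sketched roughly as follows, in Python.  This is
an illustration under stated assumptions: the "ex" prefix and its IRI
are hypothetical, the NCName check is a simplified ASCII-only
approximation of the real production, and both functions handle only
prefixed tokens.

```python
import re

# Simplified, ASCII-only stand-in for the NCName production (assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

# Hypothetical prefix binding, used only for this example.
BINDINGS = {"ex": "http://example.org/terms#"}


def resolve_qname(token, bindings):
    """Resolve a prefixed QName to a (namespace name, local name) pair."""
    prefix, _, local = token.partition(":")
    if not NCNAME.match(local):
        raise ValueError("local part is not an NCName: %r" % local)
    return (bindings[prefix], local)


def resolve_curie(token, bindings):
    """Resolve a prefixed CURIE to a single IRI string."""
    prefix, _, reference = token.partition(":")
    return bindings[prefix] + reference


print(resolve_qname("ex:banner", BINDINGS))
# prints: ('http://example.org/terms#', 'banner')
print(resolve_curie("ex:2008/report", BINDINGS))
# prints: http://example.org/terms#2008/report
```

Passing "ex:2008/report" to resolve_qname raises ValueError in this
sketch, because "2008/report" is not an NCName.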

Note that the XHTML 2 Working Group, in conjunction with the Web
Accessibility Initiative and the Semantic Web Deployment Group, is
using CURIEs in several specifications for all the reasons stated
above.  Note also that the CURIE specification is a Rec-track document
that has already completed last call and will soon transition to
Candidate Recommendation.

> (5) If it's desired to provide the better validation and easier access
> to the namespace binding which would be provided by using the
> xsd:QName type, but nevertheless not to rule out the use of CURIEs
> which are not QNames, then we suggest the best way to define the role
> attribute right now would be to define (1) a union of QName and CURIE
> (in that order), and (2) a list of values from that union, and to make
> the latter the type of the role attribute.  That would ensure that
> XSD-aware software would provide access to the namespace names when
> possible, and leave the task to the application only when necessary.

We have an XML Schema definition for the datatype xh11d:CURIE that we
believe addresses this concern.  You can see that definition in the
CURIE specification at
http://www.w3.org/MarkUp/2008/ED-curie-20080617/#s_schema (or the latest
version of same).  We hope that these data type definitions address your
concern here.  Basically the Schema definition is an expansion of the
QName schema definition.

> (6) A second concern is that we are unable to locate an XSD definition
> of the datatype xh11d:CURIEs.  In general, the documentation for the
> namespace <http://www.w3.org/1999/xhtml/datatypes/> falls, we regret
> to say, somewhat short of the standard you set with your namespace
> document for <http://www.w3.org/1999/xhtml/vocab/>.
>
> Because we have not been able to find the XSD definition, we have not
> been able to evaluate the XSD implementation of the role attribute.
> What's present in appendix B.1 looks fine as far as it goes, but the
> utility of the module really depends on the definition of the datatype
> xh11d:CURIEs.
>
> If you can point us to the XSD schema document which contains the
> definition of that datatype, we will be happy to review it.

Thanks for pointing out that the datatype document is not up to scratch
- we have neglected that for quite some time.  I (Shane) have taken an
action to update it.  In the interim, please look at the datatype
definitions in XHTML Modularization
(http://www.w3.org/TR/2008/PR-xhtml-modularization-20080611/schema_module_defs.html#a_module_XHTML_Datatypes)
- those are definitive.  Or of course at the definition for just the
CURIE datatypes using the reference already provided.

> (7) We note that unprefixed names in the value of the 'role' attribute
> effectively default to the XHTML namespace.  The first paragraph of
> section 3 says in part:
>
>     Any non-qualified value MUST be interpreted as being from the
>     XHTML vocabulary at http://www.w3.org/1999/xhtml/vocab#.
>
> We are of mixed mind about this; the longer we have thought about the
> matter, the less certain we are that we have understood just what the
> sentence is intended to say, and the more likely it seems that it
> touches on important fundamental design issues for the use of
> namespaces in XML.
>
> On the one hand, this rule seems parallel to the rule in XPath 2.0 and
> related specifications that there is a separate default namespace for
> function names, which means that a call to count(), for example, need
> not be qualified even if a default namespace is declared in the
> context in which it occurs.  Since the XML Query and XSL Working
> Groups provided a specialized default namespace in this way, it would
> seem inconsistent to object to your making a somewhat similar rule for
> the role attribute.
>
> On the other hand, the specialized rule for the default function
> namespace was forced upon XPath 2.0 by the requirement for
> compatibility with XPath 1.0, and might well have been avoided had
> compatibility not made it necessary.  Do similar compatibility issues
> arise for the role attribute?
>
> The biggest problem is just that the existing xsd:QName datatype, and
> datatypes constructed from it in the usual ways, already provide a
> rule for deciding how to interpret unprefixed names in a context
> where namespace prefixes (e.g. as part of QNames or CURIES) can
> appear: they are assigned to the default namespace.  There is no
> general-purpose mechanism for changing the default namespace just for
> the value of a single attribute.
>
> Either the idiom you seem to be proposing is a good one, and the
> definer of an attribute or element or type should be able, as a
> general principle, to specify what namespace bindings should
> apply, or at least what the default namespace should be, in values of
> that attribute or element or type, or else the idiom is not a good
> one, no general mechanism is needed, and you need to be persuaded that
> the idiom you seem to be proposing is not a good idea.
>
> For a mixture of technical and aesthetic reasons, we lean toward the
> latter view.  The aesthetic reasons are simple: there are already too
> many different rules for interpreting unprefixed names (in the
> default namespace, for elements and QName values; in no namespace, for
> attribute names; in the default function namespace, for function
> calls), and adding new ones will not make the world a better place.
> The technical reasons are also simple: we see no prospect of being
> able to support this kind of mechanism in the generic XML tool stack;
> it raises too many issues, and introduces too many incompatibilities
> with the existing XML infrastructure.

I think the fundamental disconnect here is that you are conflating
CURIEs and QNames.  And that is surely our fault - the XHTML Role
specification assumes a knowledge of CURIEs and what they are.  The
CURIE spec is referenced normatively, of course.  But that doesn't mean
most people will have read or understood it.  We will add some text
to help clarify what a CURIE is.

However, to respond to your concern:  CURIEs are not QNames.  While
CURIEs permit the (re)use of xmlns declarations to define prefix
mappings, CURIEs do not in general take advantage of the "XML Namespace"
infrastructure.  In particular, CURIEs do not recognize the concept of
the "default" XML Namespace as being any sort of a CURIE mapping at
all.  Instead, the CURIE specification indicates that grammars are
permitted to define their rules for default prefix mapping.  In XHTML
Family documents, we have declared that unqualified (unprefixed) CURIEs
must be interpreted as being from the XHTML Vocabulary - a basic set of
terms.  So to respond to your first point above, the role values do not
default to the XHTML namespace.  They default to the XHTML Vocabulary.

To get back to your point about the general idiom:  The working group
has considered a number of ways to deal with the problem of unprefixed
values.  Relying upon the default XML namespace was felt to be
inappropriate for a number of reasons - but mostly for reasons of
content portability (so-called cutting and pasting).  We also have a
strong requirement that it be possible to use basic terms without
specifying a prefix - this is associated with ease of use.  Our solution
to these two requirements was to resolve that unprefixed CURIEs *can* be
mapped to some pre-defined prefix.  In the case of the XHTML family, we
have declared that prefix to be the XHTML Vocabulary URI.
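
As a rough sketch of that rule in Python (illustrative only, not the
normative algorithm): the function name expand_role and the "ex" prefix
are assumptions for this example, and a token with a colon but an empty
prefix is not handled.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_role(value, prefixes, default=XHTML_VOCAB):
    """Expand each whitespace-separated token of a role value to an IRI.

    Prefixed tokens use the supplied prefix mappings.  Unprefixed tokens
    use the fixed default, never the document's default XML namespace.
    """
    iris = []
    for token in value.split():
        prefix, colon, reference = token.partition(":")
        if colon:
            iris.append(prefixes[prefix] + reference)
        else:
            iris.append(default + token)
    return iris


# "ex" is a hypothetical prefix used only for this example.
print(expand_role("banner ex:sidebar", {"ex": "http://example.org/roles#"}))
# prints: ['http://www.w3.org/1999/xhtml/vocab#banner', 'http://example.org/roles#sidebar']
```

Because the default is fixed rather than taken from the surrounding
xmlns declarations, a fragment containing role="banner" expands the same
way wherever it is pasted.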

> (8) On yet another hand, we note with a mixture of relief and anxiety
> that we were obliged to speak above about "the idiom you seem to be
> proposing" -- it's not entirely clear what you are proposing, and so
> it's possible that we have misinterpreted it.
>
> Since it is usual for XHTML documents to use
> <http://www.w3.org/1999/xhtml> as the default namespace, it will
> normally be the case that unprefixed names in the value of the role
> attribute are assigned by the usual rules for interpreting namespace
> prefixes to the XHTML namespace.  If the remark
>
>     Any non-qualified value MUST be interpreted as being from the
>     XHTML vocabulary at http://www.w3.org/1999/xhtml/vocab#.
>
> refers only to this fact, then we suggest only that it be rephrased.
>
> If, on the other hand, it is intended to mean that any token in the
> value which has no namespace prefix and no colon is to be assigned to
> the XHTML namespace, then we think that this is inconsistent with the
> normal rules of namespaces (see point (7) above).

Again - CURIEs are not QNames.  They must never be interpreted as such.
A CURIE's lexical space is [ [ prefix ] : ] reference.  Its value space
is IRI.  For purposes of processing, there is no "namespace".  If you
are doing lexical space processing, then the CURIE is a token.  It won't
mean much, but if you want to look at it as a token go ahead.  In order
to do real processing on a CURIE (comparison, dereferencing, whatever)
you need to look at the value space.  In the value space, there are no
prefixes.  There are IRIs.
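
A small Python sketch of why comparison belongs in the value space.
The three prefix tables stand in for three hypothetical documents, and
curie_value is a name made up for this illustration; it handles
prefixed tokens only.

```python
def curie_value(token, prefixes):
    """Map a prefixed CURIE from its lexical form to its IRI value."""
    prefix, _, reference = token.partition(":")
    return prefixes[prefix] + reference


# Prefix mappings from three hypothetical documents.
doc_a = {"xhv": "http://www.w3.org/1999/xhtml/vocab#"}
doc_b = {"v": "http://www.w3.org/1999/xhtml/vocab#"}
doc_c = {"xhv": "http://example.org/other#"}

# Different tokens, same value.
print("xhv:banner" == "v:banner")
# prints: False
print(curie_value("xhv:banner", doc_a) == curie_value("v:banner", doc_b))
# prints: True

# Same token, different values.
print(curie_value("xhv:banner", doc_a) == curie_value("xhv:banner", doc_c))
# prints: False
```

Comparing the tokens themselves gives the wrong answer in both
directions, which is why the token alone "won't mean much".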

> (9) In point (8) we took the sentence
>
>     Any non-qualified value MUST be interpreted as being from the
>     XHTML vocabulary at http://www.w3.org/1999/xhtml/vocab#.
>
> to be referring to tokens in the value which lack namespace prefix and
> colon.
>
> Strictly speaking, though, the usual terminology for such tokens calls
> them "unprefixed", not "unqualified" -- unprefixed names are in fact
> namespace-qualified if they are assigned to the default namespace.
>
> So a third interpretation of the sentence is possible, namely that it
> is intended to be read as speaking not about unprefixed tokens but
> about unqualified tokens, and as saying about them, in effect, that if
> the normal rules for resolving namespace prefixes leave a token
> unqualified, then (since the namespace rules have NOT assigned the
> token to a namespace) an application-specific rule associated with the
> role attribute specifies that they should be interpreted as belonging
> to the XHTML namespace.
>
> This interpretation relies crucially on the subtle point that
> unqualified names are not assigned by the namespace specification to a
> magic or default or anonymous namespace, and similarly are NOT said by
> the Namespaces Rec to be in no namespace at all (although this
> paraphrase is frequently encountered); they are simply not assigned to
> a namespace by the Namespace Rec.  There seems no reason that they
> could not be assigned to a namespace by an application-level
> convention.  But we note that this area is rife with confusion, and we
> suggest that if you intend this interpretation, you explain it very
> carefully.
>
> No matter which interpretation of this passage you intend, it probably
> should be recast to make the meaning clearer.  As indicated above, the
> interpretation outlined in comment (7) would cause us grave
> misgivings; that in (8) no misgivings at all; that in (9) would give
> food for thought.
>
> In any case, we believe that this point in your design requires
> careful coordination between the XHTML Working Group and the Working
> Groups in the XML Activity, and we invite you to a dialog about the
> relevant issues.

I don't think you need to read too much into the sentence.  I have
changed the wording in the current development draft so that we do not
use the term "qualified" but instead talk about "prefixed" and
"unprefixed".  At issue here is the fundamental theory of CURIE
operation.  All of your points above seem to want to treat CURIEs as
QNames.  CURIEs are not QNames.  There are no "normal rules" for
processing such items.  There are rules as defined in the CURIE
specification.

The issue of the "default" prefix for CURIEs is a difficult one, and the
working group has debated this for many months.  In the context of RDFa,
we cheated.  I suspect that, in the interest of reducing confusion, we
should cheat here as well. In RDFa Syntax, we define a collection of
"reserved values" for @rel and @rev. We declare that there is no support
for unprefixed CURIEs, and instead define the datatype for @rel and @rev
to be ( Reserved Word | CURIE )+ or something like that.  We further say
that when encountering reserved words, they must be interpreted as being
from the XHTML Vocabulary.  I do not think that this addresses the
fundamental problem though.  That seems to be a confusion between CURIEs
and QNames.
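
For illustration, the "( Reserved Word | CURIE )+" approach might be
sketched like this in Python.  Several things here are assumptions: the
RESERVED set is a small subset chosen for the example, not the full
list from RDFa Syntax; the "ex" prefix is hypothetical; and silently
skipping unrecognized tokens is a choice made for this sketch.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Small illustrative subset of reserved values (assumption, not the full list).
RESERVED = {"next", "prev", "license"}


def interpret_rel(value, prefixes):
    """Interpret a rel/rev style value as a list of IRIs.

    Reserved words map into the XHTML Vocabulary, prefixed CURIEs are
    expanded through the prefix mappings, and any other token is skipped
    because unprefixed CURIEs are not supported here.
    """
    iris = []
    for token in value.split():
        prefix, colon, reference = token.partition(":")
        if colon and prefix in prefixes:
            iris.append(prefixes[prefix] + reference)
        elif not colon and token in RESERVED:
            iris.append(XHTML_VOCAB + token)
    return iris


# "ex" is a hypothetical prefix; "bogus" is neither reserved nor prefixed.
print(interpret_rel("next ex:related bogus", {"ex": "http://example.org/rel#"}))
# prints: ['http://www.w3.org/1999/xhtml/vocab#next', 'http://example.org/rel#related']
```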

Please review your comments in the context of CURIEs instead of in the
context of QNames and see if that helps to allay your concerns.


-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Wednesday, 1 October 2008 14:07:19 UTC