W3C TAG Response to CURIE Last Call (PR#8055) from noah_mendelsohn@us.ibm.com on 2008-10-07 (public-xhtml2@w3.org from October 2008)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 7 Oct 2008 08:26:56 -0500
To: public-xhtml2@w3.org
CC: xhtml2-issues@mn.aptest.com
Message-Id: <200810071326.m97DQuUd024468@htmlwg.mn.aptest.com>
This note conveys some comments from the TAG's review of your 6 May 2008 
working draft titled:  "CURIE Syntax 1.0" [1].  First of all, we would 
like to make clear that we are overall supportive of the publication of 
this work, and we do not anticipate that dealing with any of the following 
concerns should greatly slow your progress. We do, however, very much 
appreciate your consideration of them.

* The introduction contains the statement:

<current>
"Unfortunately, QNames are unsuitable in most cases because 1) they are 
NOT intended for use in attribute values, and 2) ...".
</current>

Whether or not they were originally intended for such use, QNames are 
routinely used in attribute values, e.g. in XML Schema Documents, where 
their use is required.  We suggest that a better explanation might be 
along the lines of:

<proposed>
"Unfortunately, QNames are unsuitable in most cases because 1) the use of 
QNames as identifiers in attribute values and element content is 
problematic as discussed in [2], and 2) ..."
</proposed>

* The TAG has decided to formally (re)raise a concern that I raised 
privately in a note sent in early August [3], and which the TAG itself 
raised in an earlier round of comments [4].  The concern remains that 
it is inappropriate to allow for use of new CURIE or safe_CURIE syntax in 
languages for which the specifications do not allow for it.  Similarly, it 
is inappropriate to interpret existing syntax (e.g. pref:xxx) as a CURIE 
in cases where the specifications require it be interpreted as a URI. 
Accordingly, we suggest that the text that currently reads:

<current>
"In some cases language designers will want to use both URIs and CURIEs as 
the value of an attribute. For example, in XHTML+RDFa [XHTMLRDFa] the 
about attribute allows a URI to be specified that some metadata is 
"about", but it is also be useful to abbreviate this URI, using the 
compact syntax. However, the problem is that it is not possible for the 
language parser to be completely sure whether it has located a CURIE or a 
URI. For example, a resource could be specified as follows:

        <p rel="foaf:homePage" about="http://www.example.org/home.html">home</p>

There is no way to be sure that this is a normal URI, or a CURIE. 
Therefore the syntax for carrying a CURIE when there is any possibility of 
ambiguity is to enclose the CURIE in square brackets [...]
</current>

Be replaced with:

<proposed>
CURIEs and safe_CURIEs map to IRIs, but neither a CURIE nor a safe_CURIE 
<italic>is</italic> an IRI or URI.  Accordingly, CURIEs and safe_CURIEs 
MUST NOT be used as values for attributes or other content that are 
specified to contain only URIs, IRIs, URI-references, IRI-references, etc. 
Specifications for particular attribute values or other content MAY be 
written to allow either CURIEs or IRIs (or URIs, etc.).  The 
specifications for such languages MUST provide rules for disambiguation 
in situations where the same string could be interpreted as either a CURIE 
or an IRI.  One way to do this is to require that all CURIEs be expressed 
as safe_CURIEs, implying that all unbracketed strings are to be 
interpreted directly as IRIs.
</proposed>
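To make the proposed disambiguation rule concrete, here is a minimal Python sketch of the "all CURIEs are safe_CURIEs" option. The function name classify and its two-way result are purely illustrative assumptions, not part of any specification; a host language could adopt different rules.

```python
import re
from typing import Tuple

# Hypothetical helper: a safe_CURIE is taken to be a CURIE wrapped in
# square brackets, e.g. "[foaf:homePage]".
SAFE_CURIE = re.compile(r"\[(.+)\]")

def classify(value: str) -> Tuple[str, str]:
    """Classify a value under the 'all CURIEs are safe_CURIEs' rule.

    Returns ("curie", inner) for a bracketed value, else ("iri", value).
    """
    match = SAFE_CURIE.fullmatch(value)
    if match:
        return ("curie", match.group(1))
    # Unbracketed strings are interpreted directly as IRIs, even when they
    # happen to look like prefix:reference (e.g. "foaf:homePage").
    return ("iri", value)
```

With this rule, classify("[foaf:homePage]") yields a CURIE, while both "foaf:homePage" and "http://www.example.org/home.html" are treated as IRIs, so the ambiguity in the current example cannot arise.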

* In the introduction, the term "value space" is used in a quite general 
manner to refer to a set of values that are grouped together and thus 
distinct from similar values in other groups. Later, in the syntax 
section, the statement is made: "Note that while the set of IRIs 
represents the lexical space of a CURIE, the value space is the set of 
URIs (IRIs after canonicalization - see [IRI])."  This seems to appeal, 
without reference, to notions intended to be either similar in spirit to, 
or exactly the same as, the similarly named concepts defined for XML 
Schema Datatypes [5,6].  We suggest that, first of all, the inconsistency 
in usage between the Introduction and the Syntax section should be 
resolved.  Secondly, the syntax section should be clearer on whether there 
is an assumption that an XML Schema Datatype for CURIE is being defined 
(as it is eventually in the Appendix), in which case the terms "lexical 
space" and "value space" should probably be made hyperlinks to the XSD 
Recommendation.  If there is no specific assumption of an XSD Datatype in 
the syntax section, then the terms lexical space and value space should 
either be dropped from this section, or clarified.  We would expect that, 
if the terms lexical and value space are retained in this section, the 
lexical space would be the set of strings conforming to the BNF for CURIE, 
SafeCURIE, etc.  If so, those correspondences should be made clear. 

Looking ahead to Appendix A, the types you define there are subtypes of 
xsd:string.  For those types, the correspondence between lexical and value 
space is of necessity 1:1 (i.e., as required by the XSD Recommendation), 
and thus the value space is also of the form pref:xxxxx.  In any case, the 
whole story about datatypes, lexical, and value spaces, needs to be 
clarified, and needs to be made more consistent with XSD where 
appropriate.  On balance we suggest you retain the definitions in Appendix 
A (with the corrections given below), but replace the word/phrase 'value' 
and "value space" in the Introduction with 'name' and "name collection" 
respectively.

* There is a related, and serious, problem in section 3.  The sentence:

<current>
Note that while the set of IRIs represents the lexical space of a CURIE, 
the value space is the set of URIs (IRIs after canonicalization - see 
[IRI])."
</current>

is wrong on two counts, even after we decouple the terminology from XML 
Schema's usage:

      1) The 'lexical space' is a subset of strings,
         as specified by the BNF at the top of
         section 3 (after correction).

      2) The 'value space' is a set of strings intended
         for use in representing IRIs.

Accordingly, and in light of the recommendation below, we suggest you replace the 
paragraph containing the above sentence with something along the following 
lines:

<proposed>
"CURIEs are an abbreviation for strings which are >intended< to represent 
IRIs (see [IRI]), but >checking that intent is not part of CURIE 
conformance<.  The intended IRI is constructed by concatenating the prefix 
binding with the reference part, if any.  There MUST be a prefix binding 
for the prefix (or the default prefix, if the prefix is absent) in scope."
</proposed>
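The expansion described in the proposed text might be sketched in Python as follows. This is illustrative only: expand_curie, CurieError, the bindings dictionary and the ASCII-only prefix check are hypothetical, and real prefix names follow the NCName production, which admits many more characters.

```python
import re
from typing import Dict, Optional

# ASCII-only approximation of an NCName prefix (assumption for this sketch).
PREFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

class CurieError(ValueError):
    """A CURIE whose prefix (or the default prefix) has no binding in scope."""

def expand_curie(curie: str, bindings: Dict[str, str],
                 default: Optional[str] = None) -> str:
    """Concatenate the prefix binding with the reference part.

    bindings maps in-scope prefixes to IRI strings; default is the binding
    used when the prefix is absent. Both are assumed inputs.
    """
    head, sep, tail = curie.partition(":")
    if sep and PREFIX.fullmatch(head):
        prefix, reference = head, tail
        binding = bindings.get(prefix)
    else:
        # No prefix present: the whole string is the reference part.
        prefix, reference = None, curie
        binding = default
    if binding is None:
        raise CurieError(f"no binding in scope for prefix {prefix!r}")
    # The result is only *intended* to be an IRI; that is not checked here.
    return binding + reference
```

For example, with {"foaf": "http://xmlns.com/foaf/0.1/"} in scope, "foaf:homePage" expands to "http://xmlns.com/foaf/0.1/homePage". Consistent with the proposed wording, the result is just a concatenated string and nothing verifies that it is a well-formed IRI.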

Care should be taken to check throughout that the word 'CURIE' is always 
used to refer to strings of the form [prefix :] reference. If a name is 
needed for the IRI which this maps to, perhaps a phrase such as "expanded 
CURIE" should be used, paralleling the term "expanded name" from XML 
namespaces; we are unsure as to whether there is, on balance, a need for 
such a term.

* Section 3 says:

<current>
"A CURIE processor that encounters a value that does not conform the 
constraints defined by this specification and by the host language SHOULD 
ignore that value. A host language MAY require other behavior."
</current>

This seems to make unwarranted assumptions about the host languages, 
whether each such language in fact has a notion of "ignoring" content, and 
if so, whether that is in fact the most appropriate error handling 
strategy.  Accordingly, we recommend instead:

<proposed>
"It is an error if a string required by a host language to be a CURIE or 
SafeCURIE fails to satisfy the constraints defined above.  Error handling 
is implementation-defined."  Or, if you prefer, replace that last sentence 
with "Rules for error reporting and/or recovery should be provided in the 
specification for the host language."
</proposed>
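One way to read the proposed error wording is that a CURIE processor only detects and reports the error, while the host language decides what happens next. The Python sketch below assumes exactly that split; check_curie, CurieSyntaxError, process_attribute and the on_error callback are hypothetical names, and the check itself is deliberately reduced to rejecting the empty string.

```python
from typing import Callable, Optional

class CurieSyntaxError(ValueError):
    """A string required to be a CURIE fails the CURIE constraints."""

def check_curie(value: str) -> str:
    # Placeholder constraint only: the empty string is not a CURIE.
    if value == "":
        raise CurieSyntaxError("the empty string is not a CURIE")
    return value

def process_attribute(
    value: str,
    on_error: Callable[[CurieSyntaxError], Optional[str]],
) -> Optional[str]:
    """Detect the error here; let a host-language policy decide the outcome."""
    try:
        return check_curie(value)
    except CurieSyntaxError as err:
        return on_error(err)
```

Passing lambda err: None reproduces the "ignore" behaviour of the current draft, while a host language that prefers strict failure could supply a handler that re-raises the error.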

The following comments apply to Appendix A, which defines XML Schema 
Datatypes relating to CURIEs:

* The status of Appendix A needs to be clarified -- it's currently 
described as normative, but at the very least the list of types needs 
cross-referencing to the BNF for CURIE and SafeCURIE.

* The syntax in section 3 and the regexps in Appendix A need to be 
brought into line.  We recommend doing this by:

     a) Changing the CURIE production to read:

         curie := [ prefix ':' ] reference

        with a bit of prose saying that the empty
        string is _not_ a CURIE.

     b) Changing the core part of the regexps to read:

         ([\i-[:]][\c-[:]]*:)?.*

     c) Adding a facet to CURIE:

         <xs:minLength value="1"/>

     d) Adding a facet to SafeCURIE:

         <xs:minLength value="3"/>
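As a rough check of the suggested syntax, the regexp from (b) and the facets from (c) and (d) can be approximated in Python. Python's re module supports neither the XSD \i and \c escapes nor character class subtraction, so the name characters below are an ASCII-only approximation, and the SafeCURIE pattern is assumed to be the same core wrapped in literal square brackets.

```python
import re

# ASCII-only approximation of the XSD pattern  ([\i-[:]][\c-[:]]*:)?.*
# XSD '.' excludes \n and \r, which [^\n\r] mirrors here.
CORE = r"([A-Za-z_][A-Za-z0-9_.\-]*:)?[^\n\r]*"
CURIE_RE = re.compile(CORE)
# Assumption: SafeCURIE is the core pattern between literal brackets.
SAFE_CURIE_RE = re.compile(r"\[" + CORE + r"\]")

def is_curie(s: str) -> bool:
    # XSD patterns are implicitly anchored, hence fullmatch; facet (c).
    return len(s) >= 1 and CURIE_RE.fullmatch(s) is not None

def is_safe_curie(s: str) -> bool:
    # Facet (d): minLength 3 rules out the empty bracketed CURIE "[]".
    return len(s) >= 3 and SAFE_CURIE_RE.fullmatch(s) is not None
```

Because the reference part is .*, almost any non-empty single-line string is accepted as a CURIE; the optional group only determines where a prefix, if any, ends (for "foaf:name" it captures "foaf:"). The minLength facets then exclude the empty CURIE and the empty SafeCURIE "[]", both of which the pattern alone would accept.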

Thank you again for your consideration of these comments.

Noah Mendelsohn
for the W3C Technical Architecture Group


[1] http://www.w3.org/TR/2008/WD-curie-20080506/
[2] http://www.w3.org/2001/tag/doc/qnameids.html
[3] http://lists.w3.org/Archives/Public/www-tag/2008Aug/0006.html
[4] http://lists.w3.org/Archives/Public/www-html-editor/2008JanMar/0014.html
[5] http://www.w3.org/TR/2004/PER-xmlschema-2-20040318/#dt-lexical-space
[6] http://www.w3.org/TR/2004/PER-xmlschema-2-20040318/#dt-value-space

P.S. Tracker, this relates to ACTION-170

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Tuesday, 7 October 2008 13:28:10 UTC