Re: TAG comments on: http://www.w3.org/TR/2007/WD-curie-20071126/ "CURIE Syntax 1.0" (PR#8035) from Steven Pemberton on 2008-04-25 (public-xhtml2@w3.org from April 2008)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Fri, 25 Apr 2008 12:43:45 +0200
To: "Stuart Williams" <skw@hp.com>
Cc: xhtml2-issues@mn.aptest.com, "www-html-editor@w3.org" <www-html-editor@w3.org>, public-xhtml2@w3.org
Message-ID: <op.t95xe7xzsmjzpq@acer3010>

Hello Stuart. Thanks for the comments. This email is a reply to the
general comments only. A separate mail will address the specific points.

> With respect to your work on "CURIE Syntax 1.0" [1], the TAG has asked  
> me to post the comments attached below on its behalf.
>
> The TAG reached consensus on the comments it wished to send during their  
> meeting on 27th March 2008 (minutes to be published).
>
> I'd like to thank you for your patience in awaiting our comments.

You just got in under the wire :-) We were just about to go to last call
when your comments arrived.

One thing I should mention right at the start is that this specification
only defines a data type, and doesn't specify how that type should be
used. It is up to language designers to decide where and how to use it.

> The TAG appreciates that the XHTML 2 WG is attempting to address a  
> frequently expressed need with the CURIE design.  Aside from the  
> relatively minor comments given at the end, which we hope you can  
> address to improve the way the spec. reads, we have some overall  
> concerns which we invite you to consider.
>
> [Note that although most of these comments were written against the 22  
> January 2008 Editors' Draft [1], some were based on the public WD of
> 26 November [2], and may have been overtaken.]

True. Some of your points have already been addressed.

> 1) The spec. as it stands doesn't really make clear what the
>    requirements for CURIEs are.  What _precisely_ is the requirement
>    you are trying to address?

Hmm. The introduction tries to explain that. To summarise: many specs use
QNames to represent URIs, but unfortunately QNames are unable to represent
all URIs, so this spec fixes that. We will sharpen the introduction to
make this clearer.

> 2) The overlap with existing usage of the 'xxx:yyy' pattern in
>    XML-based languages is troubling.  It would be helpful if you could
>    at least explain the background which has led you to reject all
>    suggestions that a different separator character, or XML entity
>    syntax, should be used.

It is true that there was a long discussion on this. We settled on the
current syntax because (a) it looks like what it is extending, so future
specs could use it without invalidating current content; (b) the xxx:yyy
syntax already has mind-share, making it easy for people to understand
intuitively; and (c) the syntax is already in widespread use in other
software, such as wikis. To be honest, though, we don't think the
discussion should be added to the spec.

> 3) The fact that you feel compelled to provide for potential confusion
>    in contexts where URIs are expected in XML languages is very
>    troubling, if we read it as implying that CURIEs are intended for
>    use in existing XML languages in places where only URIs are allowed
>    today. We can't tell whether this is actually your intention,
>    because the spec. is equivocal on this point. In section 5.2 [1]
>    the (existing) 'href' attribute of XHTML is mentioned in the prose
>    (worrying), but the _examples_ which follow only show CURIEs in the
>    (presumably proposed for XHTML2 or HTML5 or . . .) 'resource'
>    attribute (OK).

It is true that the clash between the syntax of QNames and URIs is
regrettable. However, everyone now seems so used to QName-like syntax
for representing URIs, especially in Semantic Web contexts (just look at
Turtle, SPARQL, and N3), that designing a different syntax seemed like
asking for trouble.

But note that CURIEs are only the syntactic space. The value space of
CURIEs is URIs. CURIEs are not intended to be sent over the wire for
dereferencing.
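
To make the distinction concrete, here is a minimal sketch in Python of
mapping a CURIE (syntactic space) to a URI (value space). It is only an
illustration: the function name expand_curie and the PREFIXES table are
made up for this mail, the "xxx" binding is simply the one implied by the
TAG's own example, and how prefixes get bound is up to the host language.

```python
# Hypothetical prefix bindings, for illustration only. A host language
# decides how prefixes are actually declared.
PREFIXES = {
    "xxx": "http://www.example.com/feeds/thursday.xml#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes):
    """Map a 'prefix:reference' CURIE to the URI it stands for.

    Simplified: assumes a prefix is always present and bound.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"no prefix separator in {curie!r}")
    if prefix not in prefixes:
        raise ValueError(f"unbound prefix {prefix!r}")
    # The value is plain concatenation of the bound URI and the reference.
    return prefixes[prefix] + reference


if __name__ == "__main__":
    print(expand_curie("xxx:37b", PREFIXES))
    print(expand_curie("foaf:knows", PREFIXES))
```

Only the expanded string would ever be dereferenced; the short form stays
in the document.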

Compare with IRIs. I can't send "http://www.élève.fr/" over the wire, but
it should be OK to allow an href to contain that, and let the computer do
the hard work of transliterating it into the necessary escapes.
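
As a rough illustration of that "hard work", the following Python sketch
converts an IRI into an ASCII-only URI using only the standard library.
The function name iri_to_uri is made up for this mail, and the sketch
makes simplifying assumptions: it uses Python's built-in "idna" codec for
the host name, percent-encodes the rest as UTF-8, and ignores any
user-info part.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Return an ASCII-only URI for the given IRI (simplified sketch)."""
    parts = urlsplit(iri)
    # Host names are converted label by label with the built-in IDNA codec.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Everything else is percent-encoded as UTF-8; existing escapes are kept.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://www.élève.fr/"))
```

The author writes the readable form; only the converted form goes over
the wire.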

>    In this connection we find the prose about CURIEs in the current
>    RDFA spec. [2] troubling. The implication that CURIEs can be used
>    in existing URI-only contexts is made explicit in one of the
>    examples therein [3]:
>
>      <link about="[_:a]" rel="foaf:knows" href="[_:b]" />

That is a different spec (which anyway doesn't contain that example  
anymore).

>    and more generally there by the fact that the DTD for XHTML+RDFa
>    defines several of its _new_ attributes, e.g. 'resource' and
>    'about', as containing URI references.

Why is that a problem? They are new attributes, without any legacy at all.

Clearly in contexts where people use URIs, for consistency (and good
engineering reasons) they will also want to be able to use CURIEs. So we  
chose a syntax that unambiguously delineates them from URIs, but still  
looks like something recognisable.

However, and this is pretty much the crux of the matter, we only provide
the datatype in this spec. How language designers use it is up to them.

But suppose a language *did* allow safe CURIEs in attributes that allow
for dereferencing. The worst that would happen is that you would get a 404
until the software was updated to expand CURIEs. In the short term authors
would choose to use full URIs until the software could do something with
the short form.
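
For illustration, here is a minimal Python sketch of how updated software
might treat an attribute that accepts either a URI or a safe CURIE (the
[xxx:yyy] form). The function name resolve_uri_or_safe_curie and the
prefix table are hypothetical, and a real host language would define its
own rules; the point is only that the brackets remove any ambiguity.

```python
def resolve_uri_or_safe_curie(value, prefixes):
    """Expand '[prefix:reference]'; pass anything else through as a URI."""
    if value.startswith("[") and value.endswith("]"):
        prefix, sep, reference = value[1:-1].partition(":")
        if not sep or prefix not in prefixes:
            raise ValueError(f"cannot expand safe CURIE {value!r}")
        return prefixes[prefix] + reference
    # No brackets: the value is an ordinary URI, even if its scheme
    # happens to look like a bound prefix.
    return value


if __name__ == "__main__":
    # Hypothetical bindings, for illustration only.
    prefixes = {
        "xxx": "http://www.example.com/feeds/thursday.xml#",
        "mailto": "http://example.org/not-the-scheme#",
    }
    print(resolve_uri_or_safe_curie("[xxx:37b]", prefixes))
    print(resolve_uri_or_safe_curie("mailto:someone@example.com", prefixes))
```

Software that has not been updated would simply treat the bracketed value
as an ordinary (relative) URI, which is where the 404 comes from.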

>    One can imagine an alternative proposal which made clear that it
>    was only addressing the need for an abbreviated URI format in
>    non-XML languages, or new XML languages, or new contexts within old
>    XML languages, where _only_ such abbreviated forms are
>    allowed. That is, a position taken _against_ any possibility the
>    CURIEs might be used where URIs are called for in XML languages
>    today. It would though have to acknowledge the possible negative
>    consequences of success in going down this path, namely that
>    ordinary users will not understand that 'safe' CURIEs
>    ([xxx:yyy]-form CURIEs) are not a universal alternative to URIs,
>    and will start using them in existing languages where URIs are
>    expected, causing tools to break and users to be frustrated.
>    All of this adds up to saying: please consider _very_ carefully
>    whether the use cases/candidate requirements you have for the
>    'safe' CURIE, i.e. a CURIE that can be used in an XML language
>    where a URI can also be used, are really compelling. We note in
>    this regard that we are aware of no requests for an analogous
>    form for QNames.

The only reason they are in there is because we had compelling use cases
for them, which are more or less the same reasons that Turtle uses them
(well, uses QNames). You want to be able to talk about things that are
commonly written in the short form, and it is unreasonable to require an
author to expand them. You can deduce the need for them from
http://www.w3.org/TR/xhtml-rdfa-scenarios, in particular
http://www.w3.org/TR/xhtml-rdfa-scenarios/#use-case-9 .

> 4) Have you considered that if you get what you've asked for, you
>    won't have (everything) you need?  That is, have you considered
>    that being able to write xxx:37b and have that treated as
>    "http://www.example.com/feeds/thursday.xml#37b" will _not_ make
>    that a useable URI?  '37b' is not an NCName, so the URI is not a
>    valid shorthand XPointer.  '37b' is not an XML Name, so is not a
>    valid value for an ID-typed attribute, and so cannot be an anchor
>    in a valid XML document.

Sure. We spent a lot of time investigating how to define the syntax of
CURIEs so that the expansion was a valid URI, but it is provably not
possible. So all you can say is "The expansion must be a valid URI".

In fact whether a URI is valid or not is in many cases a dynamic property,
such as the example you gave, since it depends on the media type of the
returned resource. You don't know whether "#37b" is OK until you have the
media type of "http://www.example.com/feeds/thursday.xml", which itself  
may be a result of content negotiation. There is nothing wrong with the  
URI itself.
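
A small Python sketch of the two separate questions: whether the URI
parses as a URI, and whether its fragment could serve as an XML ID. The
names here are made up for this mail, and the pattern is only an
ASCII-only approximation of NCName (the real production allows many
non-ASCII characters as well).

```python
import re
from urllib.parse import urlsplit

# ASCII-only approximation of the XML NCName production (an assumption
# made to keep the sketch short).
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def fragment_is_ascii_ncname(uri):
    """True if the URI's fragment looks like an (ASCII) NCName."""
    fragment = urlsplit(uri).fragment
    return bool(ASCII_NCNAME.match(fragment))


if __name__ == "__main__":
    uri = "http://www.example.com/feeds/thursday.xml#37b"
    # The URI splits cleanly into its components...
    print(urlsplit(uri).fragment)
    # ...but the fragment starts with a digit, so it is not an NCName.
    print(fragment_is_ascii_ncname(uri))
```

The second check only matters once you know the resource is served as
XML; for another media type the same fragment may be perfectly usable.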

>    You may say that this is not your problem, but by allowing, even
>    encouraging, the use of CURIEs of this form, you are encouraging
>    people to deploy broken data.

I don't follow your reasoning, I'm afraid.

Best wishes,

Steven Pemberton
For the XHTML2 WG
Received on Friday, 25 April 2008 10:44:23 UTC