Re: quick ping - ISSUE-104 from Shane McCarron on 2008-08-30 (public-rdf-in-xhtml-tf@w3.org from August 2008)

From: Shane McCarron <shane@aptest.com>
Date: Sat, 30 Aug 2008 17:05:26 -0500
To: Jonathan Rees <jar@creativecommons.org>
CC: Ben Adida <ben@adida.net>, public-rdf-in-xhtml-tf@w3.org, Noah Mendelsohn <noah_mendelsohn@us.ibm.com>
Message-ID: <48B9C426.4070606@aptest.com>
Jonathan,

Thanks for your thoughtful and thought-provoking reply. Rather than 
attempt to interleave my reply inline with your comments, I have tried 
to walk through a logical sequence below to describe where all the 
components in this puzzle come from and show how they fit together.  
This is my personal reply, not a formal reply from the working
group.  However, since we are trying to get a Proposed Recommendation
(PR) out the door very soon, I wanted to try to close this loop.

First, a couple of terms:

/lexical space/ - the space of potential input or source values 
associated with something [4].  In this case, we are discussing 
attributes.  The /lexical space/ associated with an attribute is the 
collection of valid literal values for its datatype.

/value space/ - the collection of unique values that can be expressed 
via the lexical space [5].  It is possible that there are multiple 
values from the lexical space that map to the same value in the value 
space. The value space for an attribute is something that is used when 
processing the data, not when interpreting the source. More on this later.
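To make the distinction concrete, here is a small sketch using XML
Schema's xsd:boolean. Its lexical space is the four literals "true",
"false", "1" and "0", while its value space has just two values. The
names below are hypothetical and only illustrate several lexical forms
mapping to one value; whitespace normalization is ignored.

```python
# Hypothetical illustration: lexical-to-value mapping for xsd:boolean.
# Lexical space: the literals "true", "false", "1", "0".
# Value space: two truth values, modelled here as Python bools.
# (Whitespace collapsing, which XML Schema applies first, is ignored.)
BOOLEAN_LEXICAL_TO_VALUE = {
    "true": True,
    "1": True,
    "false": False,
    "0": False,
}


def boolean_value(lexical: str) -> bool:
    """Map a literal from the lexical space to its value, or raise ValueError."""
    try:
        return BOOLEAN_LEXICAL_TO_VALUE[lexical]
    except KeyError:
        raise ValueError(
            f"{lexical!r} is not in the lexical space of xsd:boolean"
        ) from None


# Two distinct lexical forms, one value in the value space.
assert boolean_value("true") == boolean_value("1")
```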

Assuming we agree on those terms...  The next thing of interest is
anyURI vs. URI vs. IRI:

anyURI is an XML Schema-defined datatype.  The lexical space of anyURI
is the complete collection of URIs as defined in RFC 3986 (previously
RFCs 2396 and 2732) [1].  We reference anyURI only in the context of our
lexical space, in that we use it in the example XML Schema definition of
our datatypes.  However, when XHTML family modules (and this is one) use
a datatype of "URI", they do so as defined in [6], so to that extent we
are, by normative reference, using anyURI in the definition of the
URIorSafeCURIE datatype. So, for purposes of discussion, let's assume
that the XHTML (and therefore RDFa) term "URI" == the XML Schema term
"anyURI".

IRI is defined by RFC 3987.  The lexical space of IRIs is richer than
that of URIs (because IRIs permit essentially any Unicode character),
but there is a direct mapping from IRI to URI, so agents that need to
send IRIs over the wire can do so in a portable and backward-compatible
fashion.  More to the point, all URIs are included in the lexical space
of IRIs.  Lexically, the set of URIs (or of anyURI values) is a subset
of the set of IRIs.
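As a rough illustration of that IRI-to-URI mapping, here is a
simplified sketch of the conversion described in RFC 3987 (section
3.1): each non-ASCII character is encoded as UTF-8 and each resulting
byte is percent-encoded, while ASCII characters are copied as-is. The
function name is hypothetical, and the sketch skips IDNA handling of
host names and does not validate its input.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified sketch of the RFC 3987 IRI-to-URI mapping.

    Assumptions of this sketch: the input is already a valid IRI,
    and the optional IDNA conversion of the host is not performed.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters in a valid IRI are already legal URI characters.
            out.append(ch)
        else:
            # Non-ASCII: UTF-8 encode, then percent-encode every byte.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)
```

For example, iri_to_uri("http://example.org/résumé") yields
"http://example.org/r%C3%A9sum%C3%A9", while a plain URI such as
"http://example.org/a?b=c" maps to itself, which is the sense in which
every URI is already an IRI.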

All of the relevant standards are cited normatively by both the RDFa and
CURIE specifications.  Neither CURIE nor RDFa attempts to define either
the lexical space or the value space for these items, as that would be
inappropriate.  We instead import those definitions from the relevant
base specifications.

What we *do* define are the relevant spaces for CURIEs. We declare the
value space to be identical to that of IRIs - citing its RFC as the 
normative reference.  We also define the lexical space for both the 
CURIE and SafeCURIE datatypes - in other words, the literal characters 
that are permitted to be used in the source form of the datatype.  
Sticking to the RDFa specification for the moment, since that one has 
the shortest fuse, this is done by declaring the datatypes in section 
9.1 (Datatypes) and referencing the syntactic productions in section 7. 
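To show roughly what a lexical-space check for these datatypes looks
like, here is a sketch that assumes productions of the shape
curie := [ [ prefix ] ':' ] reference and safe_curie := '[' curie ']',
with prefix an NCName. The regular expressions are simplifications
invented for this sketch: NCName is restricted to ASCII, and the
reference is any run of characters other than whitespace and square
brackets rather than the full irelative-ref grammar of RFC 3987.

```python
import re

# Simplified NCName (assumption): ASCII letter or underscore, then
# letters, digits, '.', '-' or '_'. The real production allows much more.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
# Simplified reference (assumption): stands in for RFC 3987 irelative-ref.
REFERENCE = r"[^\s\[\]]*"

# curie := [ [ prefix ] ':' ] reference
CURIE_RE = re.compile(rf"(?:(?P<prefix>{NCNAME})?:)?(?P<reference>{REFERENCE})")
# safe_curie := '[' curie ']'
SAFE_CURIE_RE = re.compile(rf"\[(?:(?P<prefix>{NCNAME})?:)?(?P<reference>{REFERENCE})\]")


def is_curie(s: str) -> bool:
    """True if s is in the (simplified) lexical space of CURIE."""
    return CURIE_RE.fullmatch(s) is not None


def is_safe_curie(s: str) -> bool:
    """True if s is in the (simplified) lexical space of SafeCURIE."""
    return SAFE_CURIE_RE.fullmatch(s) is not None
```

Note that a string like "http://example.org/" also passes is_curie,
with prefix "http" and reference "//example.org/". That ambiguity is
why an attribute that accepts both URIs and CURIEs needs the bracketed
SafeCURIE form to tell them apart.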

I know this is a lot to take in, but we are pretty confident that our 
definition of the lexical space is complete in that we define or import 
all the relevant productions.

As to value space, first - let's ignore the stuff in Appendix B.  XML 
Schema definitions are for lexical space syntax checking - they are not 
relevant to the value space.  We have asserted (in normative section 7 
on CURIE syntax) that the value space of CURIEs is the same as that of 
IRIs.  As stated above, that space is defined by the (normatively 
referenced) IRI RFC.

With all of that in mind: you have raised a question about the value
space of URIorSafeCURIE. The datatype URIorSafeCURIE has a production
that says "a URI or a SafeCURIE", where both of those are already well
defined.  That's about the lexical space.  After processing, regardless
of whether the input value was a URI or a SafeCURIE, the resulting
"value space" value is an IRI.  So, by definition, the value space for
all possible input values of attributes with a datatype URIorSafeCURIE
is IRI.  In fact, for all of the datatypes defined in normative section
9 and informative Appendix B, the value space is either IRIs or (for
the list-valued datatypes) lists of IRIs.
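A minimal sketch of that lexical-to-value mapping for URIorSafeCURIE
follows, assuming a processor has already collected prefix mappings and
a base IRI. The function, the PREFIXES table and the example base are
hypothetical. A bracketed value is expanded as a SafeCURIE by
concatenating the prefix's IRI with the reference; anything else is
resolved as a (possibly relative) URI. Unprefixed and default-prefix
CURIEs are deliberately not handled here.

```python
from urllib.parse import urljoin

# Hypothetical prefix mappings, standing in for the xmlns:* declarations
# an RDFa processor would have in scope.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def uri_or_safe_curie_value(attr: str, base: str, prefixes: dict) -> str:
    """Sketch: map a URIorSafeCURIE lexical value to a single IRI string."""
    if attr.startswith("[") and attr.endswith("]"):
        # SafeCURIE: strip the brackets and split prefix from reference.
        prefix, sep, reference = attr[1:-1].partition(":")
        if not sep:
            # Simplification: CURIEs without a ':' are rejected here.
            raise ValueError(f"no prefix separator in SafeCURIE {attr!r}")
        if prefix not in prefixes:
            raise ValueError(f"unmapped prefix {prefix!r}")
        return prefixes[prefix] + reference
    # Otherwise a URI, resolved against the base IRI.
    return urljoin(base, attr)
```

With base "http://example.org/doc", both "[dc:title]" and
"http://purl.org/dc/elements/1.1/title" map to the same value,
"http://purl.org/dc/elements/1.1/title", while "other.html" resolves to
"http://example.org/other.html". Different lexical forms, one value
space.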

As to your comment that the value spaces of anyURI and IRI are not the
same, we disagree.  We believe they are explicitly the same in the
latest XML Schema Datatypes working draft [1], and even in XML Schema
Datatypes 1.0 [2], since IRIs map to URIs isomorphically as defined in
[3].  Since we have stated explicitly that the value space for all
CURIEs is the same as that of IRIs, we are confident there is no
conflict, nor any potential conflict.

In the CURIE specification, we could add some of the above logic if you 
feel it would help future readers analyze the requirements for 
supporting CURIEs.  I do not believe that at this point modifying the 
RDFa Syntax specification would add any clarity.

Thanks again for your comments.  I hope my explanation clarifies how 
this works and demonstrates that our definitions are as complete as they 
can be without treading on the toes of other specifications that we 
already incorporate via normative reference.


[1] http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/#anyURI
[2] http://www.w3.org/TR/xmlschema-2/#anyURI
[3] http://www.ietf.org/rfc/rfc3987.txt
[4] http://www.w3.org/TR/xmlschema-2/#lexical-space
[5] http://www.w3.org/TR/xmlschema-2/#value-space
[6] http://www.w3.org/TR/xhtml-modularization/abstraction.html#dt_URI

Jonathan Rees wrote:
>
>
> Do you mean for the value space of CURIE to be different from the 
> value space of xsd:anyURI, as this implies? Or do you mean for them to 
> be the same?
>
>> This is exactly the same as the draft CURIE specification[2], and is 
>> in the same place to help ensure that the definitions are 
>> consistent.  At the present time, we believe there are no conflicts 
>> between these two specifications with regard to the definition of 
>> CURIEs and their use.  I hope that this resolves your comment in 
>> issue 104 to your satisfaction.
>>
>> To address the underlying question you seem to be posing...  a CURIE 
>> is a syntactic short-hand for an IRI.  So the value space for the two
>> datatypes you reference, CURIE and URIorSafeCURIE, is exactly the
>> same: the set of IRIs.
>
> This may be true, but as far as I can tell the CURIE draft does not 
> say this - and we're not talking about what's true, we're talking 
> about what the document should say. URIorSafeCURIE and CURIE are 
> completely different syntactic beasts, so if their value spaces happen 
> to be the same, the document needs to say this somewhere; there's no 
> way anyone could know this. You can't just leave it to people to draw 
> conclusions.
>
> If the value spaces of URIorSafeCURIE and xsd:anyURI are different, 
> that would imply that any language extension that expanded an 
> attribute value type from anyURI to URIorSafeCURIE would be in big 
> trouble, because it would result in an incompatible change in the 
> lexical to value space mapping. I'm no expert at this stuff but I was 
> under the impression that the RDFa extension of XHTML was one of these
> extensions.
>
> If the value spaces are to be the same for the three types, with 
> compatible mappings (i.e., the URIorSafeCURIE lexical-to-value mapping
> an extension of the anyURI lexical-to-value mapping and CURIEs mapped 
> in the same way for both CURIE and URIorSafeCURIE), your documents 
> have to come out and say so, since otherwise it will be an awful mess 
> for anyone coming along later trying to figure it out. You can't just 
> say "IRI" and expect anyone to know what you mean - are these subsets 
> of the string type, or abstract types, or what? How is the lexical 
> form mapped to the value? I don't know what the value space of anyURI 
> is - my cynical self tells me it might not be URIs - but I think you 
> owe it to the rest of us to find out what it is, cite the applicable 
> standards (RFC whatever and/or XML Schema whatever), and take a stand 
> on whether there are two value spaces or one.
>
> I also still think you need to be much more explicit in Appendix A of 
> the CURIE draft, which is where I would expect the general reader to 
> go to look for this information. The informative XML Schema 
> definitions may be better than nothing (I'm not sure, if they're just 
> informative) but do not explain what's going on in any humanly useful 
> way, and while they may imply things about the value spaces and 
> mappings (do they? I don't know), they don't really explain where 
> these regular expressions come from (RFCs?) or what they mean, and as 
> far as I can tell they don't say anything about the mappings.
>
> So no, the issue is not resolved to my satisfaction.
>
> Jonathan
>

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Saturday, 30 August 2008 22:14:11 UTC