- From: Shane McCarron <shane@aptest.com>
- Date: Sat, 30 Aug 2008 17:05:26 -0500
- To: Jonathan Rees <jar@creativecommons.org>
- CC: Ben Adida <ben@adida.net>, public-rdf-in-xhtml-tf@w3.org, Noah Mendelsohn <noah_mendelsohn@us.ibm.com>
Jonathan,

Thanks for your thoughtful and thought-provoking reply. Rather than interleave my reply with your comments, I have tried to walk through a logical sequence below that describes where all the components in this puzzle come from and shows how they fit together. This is my personal reply, not a formal reply from the working group. However, since we are trying to get a PR out the door very soon, I wanted to try to close this loop.

First, a couple of terms:

/lexical space/ - the space of potential input or source values associated with something [4]. In this case, we are discussing attributes. The /lexical space/ associated with an attribute is the collection of valid literal values for its datatype.

/value space/ - the collection of unique values that can be expressed via the lexical space [5]. Multiple values from the lexical space may map to the same value in the value space. The value space for an attribute is something that is used when processing the data, not when interpreting the source. More on this later.

Assuming we agree on those terms...

The next thing of interest is anyURI vs. URI vs. IRI:

anyURI is an XML Schema-defined datatype. The lexical space of anyURI is the complete collection of URIs as defined in RFC 3986 (previously 2396 / 2732) [1]. We only reference anyURI in the context of our lexical space, in that we use it in the example XML Schema definition of our datatypes. However, when XHTML family modules (and this is one) use a datatype of "URI", they do so as defined in [6], so to that extent we are, by normative reference, using anyURI to define the URIorSafeCURIE datatype. So, for purposes of discussion, let's assume that the XHTML (and therefore RDFa) term "URI" == the XML Schema term "anyURI".

IRI is defined by RFC 3987.
The lexical space of IRIs is richer than that of URIs (because IRIs allow essentially all Unicode characters), but there is a direct mapping from IRI to URI, so agents that need to send IRIs over the wire can do so in a portable and backward-compatible fashion. More to the point, all URIs are included in the lexical space of IRIs. Lexically, the set of URIs (or anyURIs) is a subset of the set of IRIs.

All of the relevant standards are cited normatively by both the RDFa and CURIE specifications. Neither CURIE nor RDFa attempts to define the lexical space or the value space for these items, as that would be inappropriate. We instead import those definitions from the relevant base specifications.

What we *do* define are the relevant spaces for CURIEs. We declare the value space to be identical to that of IRIs, citing the IRI RFC as the normative reference. We also define the lexical space for both the CURIE and SafeCURIE datatypes - in other words, the literal characters that are permitted in the source form of the datatype. Sticking to the RDFa specification for the moment, since that one has the shortest fuse, this is done by declaring the datatypes in section 9.1 (Datatypes) and referencing the syntactic productions in section 7. I know this is a lot to take in, but we are pretty confident that our definition of the lexical space is complete, in that we define or import all the relevant productions.

As to value space, first - let's ignore the stuff in Appendix B. XML Schema definitions are for lexical space syntax checking - they are not relevant to the value space. We have asserted (in normative section 7 on CURIE syntax) that the value space of CURIEs is the same as that of IRIs. As stated above, that space is defined by the (normatively referenced) IRI RFC.

With all of that in mind:

You have raised a question about the value space of URIorSafeCURIE.
The datatype URIorSafeCURIE has a production that says "a URI or a SafeCURIE", where both of those are already well defined. That's about the lexical space. After processing, regardless of whether the input value was a URI or a SafeCURIE, the resulting "value space" value is an IRI. So, by definition, the value space for all possible input values of attributes with a datatype of URIorSafeCURIE is IRI. In fact, for all of the datatypes defined in normative section 9 and informative Appendix B, the value space is either IRI or IRIs.

As to your comment that the value spaces of anyURI and IRI are not the same, we disagree. We believe they are explicitly the same in the latest XML Schema Datatypes working draft [1], and even in XML Schema Datatypes 1.0 [2], since IRIs map to URIs isomorphically as defined in [3]. Since we have stated explicitly that the value space for all CURIEs is the same as that of IRIs, we are confident there is no conflict, nor any potential conflict.

In the CURIE specification, we could add some of the above logic if you feel it would help future readers analyze the requirements for supporting CURIEs. I do not believe that modifying the RDFa Syntax specification at this point would add any clarity.

Thanks again for your comments. I hope my explanation clarifies how this works and demonstrates that our definitions are as complete as they can be without treading on the toes of other specifications that we already incorporate via normative reference.

[1] http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/#anyURI
[2] http://www.w3.org/TR/xmlschema-2/#anyURI
[3] http://www.ietf.org/rfc/rfc3987.txt
[4] http://www.w3.org/TR/xmlschema-2/#lexical-space
[5] http://www.w3.org/TR/xmlschema-2/#value-space
[6] http://www.w3.org/TR/xhtml-modularization/abstraction.html#dt_URI

Jonathan Rees wrote:
>
>
> Do you mean for the value space of CURIE to be different from the
> value space of xsd:anyURI, as this implies? Or do you mean for them to
> be the same?
>
>> This is exactly the same as the draft CURIE specification[2], and is
>> in the same place to help ensure that the definitions are
>> consistent. At the present time, we believe there are no conflicts
>> between these two specifications with regard to the definition of
>> CURIEs and their use. I hope that this resolves your comment in
>> issue 104 to your satisfaction.
>>
>> To address the underlying question you seem to be posing... a CURIE
>> is a syntactic short-hand for an IRI. So the value space for the two
>> datatypes you reference, CURIE and URIorSafeCURIE, are exactly the
>> same. The set of IRIs.
>
> This may be true, but as far as I can tell the CURIE draft does not
> say this - and we're not talking about what's true, we're talking
> about what the document should say. URIorSafeCURIE and CURIE are
> completely different syntactic beasts, so if their value spaces happen
> to be the same, the document needs to say this somewhere; there's no
> way anyone could know this. You can't just leave it to people to draw
> conclusions.
>
> If the value spaces of URIorSafeCURIE and xsd:anyURI are different,
> that would imply that any language extension that expanded an
> attribute value type from anyURI to URIorSafeCURIE would be in big
> trouble, because it would result in an incompatible change in the
> lexical to value space mapping. I'm no expert at this stuff but I was
> under impression that the RDFa extension of XHTML was one of these
> extensions.
>
> If the value spaces are to be the same for the three types, with
> compatible mappings (i.e. the URIosSafeCURIE lexical-to-value mapping
> an extension of the anyURI lexical-to-value mapping and CURIEs mapped
> in the same way for both CURIE and URIorSafeCURIE), your documents
> have to come out and say so, since otherwise it will be an awful mess
> for anyone coming along later trying to figure it out.
> You can't just
> say "IRI" and expect anyone to know what you mean - are these subsets
> of the string type, or abstract types, or what? How is the lexical
> form mapped to the value? I don't know what the value space of anyURI
> is - my cynical self tells me it might not be URIs - but I think you
> owe it to the rest of us to find out what it is, cite the applicable
> standards (RFC whatever and/or XML Schema whatever), and take a stand
> on whether there are two value spaces or one.
>
> I also still think you need to be much more explicit in Appendix A of
> the CURIE draft, which is where I would expect the general reader to
> go to look for this information. The informative XML Schema
> definitions may be better than nothing (I'm not sure, if they're just
> informative) but do not explain what's going on in any humanly useful
> way, and while they may imply things about the value spaces and
> mappings (do they? I don't know), they don't really explain where
> these regular expressions come from (RFCs?) or what they mean, and as
> far as I can tell they don't say anything about the mappings.
>
> So no, the issue is not resolved to my satisfaction.
>
> Jonathan
>

--
Shane P. McCarron          Phone: +1 763 786-8160 x120
Managing Director            Fax: +1 763 786-8180
ApTest Minnesota            Inet: shane@aptest.com
Received on Saturday, 30 August 2008 22:14:11 UTC
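
Illustration (not part of the original message or of either specification): the thread turns on how a lexical URIorSafeCURIE value maps to a value-space IRI, and on the IRI-to-URI mapping of RFC 3987. The Python sketch below is one way to picture that reading under stated assumptions. The function names and the prefix table are hypothetical, the CURIE handling requires a declared prefix, relative-reference resolution against a base is omitted, and the IRI-to-URI step is reduced to percent-encoding non-ASCII characters.

```python
from typing import Dict


def uri_or_safe_curie_to_iri(value: str, prefixes: Dict[str, str]) -> str:
    """Map a lexical URIorSafeCURIE value to an IRI (hypothetical helper).

    Assumption: a SafeCURIE is written as "[prefix:reference]" and the
    prefix must appear in the supplied prefix table. Anything else is
    treated as a URI, which is lexically already an IRI.
    """
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # SafeCURIE branch: drop the brackets and split at the first colon.
        prefix, sep, reference = value[1:-1].partition(":")
        if not sep or prefix not in prefixes:
            raise ValueError(f"cannot expand CURIE: {value!r}")
        # The value is the prefix IRI concatenated with the reference.
        return prefixes[prefix] + reference
    # URI branch: every URI is included in the lexical space of IRIs.
    return value


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping: percent-encode the UTF-8 octets of
    each non-ASCII character and leave ASCII characters untouched."""
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in iri
    )


# Example prefix table; the mapping itself is an assumption for the demo.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Both lexical forms land on the same value-space IRI.
from_curie = uri_or_safe_curie_to_iri("[dc:title]", PREFIXES)
from_uri = uri_or_safe_curie_to_iri("http://purl.org/dc/elements/1.1/title", PREFIXES)
print(from_curie == from_uri)
print(iri_to_uri("http://example.org/caf\u00e9"))
```

In this reading, two different lexical forms (a SafeCURIE and a plain URI) resolve to one value, which is the sense in which the reply says the value space of URIorSafeCURIE is IRI.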